Research
The ultimate project objective is to enable tele-immersive meetings or entertainment (e.g., online gaming) anytime, anywhere, and on any device. This ambitious goal makes scalability, interoperability, and globally optimal performance the key properties desired of the developed system and algorithms. More specifically, the targeted solution should cope with the various system dynamics of real-life applications while achieving the best possible performance and functionality, thereby promoting and advancing the tele-immersive user experience. Examples of these system dynamics include:
- Versatile sensory inputs: ranging from a single cell-phone webcam, stereo cameras, and color-plus-depth sensors to hybrid multi-camera systems, and even the latest programmable FCam.
- Varying computing power: the underlying devices range from smartphones and tablets all the way to GPU-powered, multi-core desktops and FPGAs.
- Heterogeneous network QoS provisioning: e.g., broadband vs. 3G connections, dedicated lines vs. the public Internet.
- Diverse output and interaction capabilities: e.g., the small yet touch-enabled screen of an iPhone vs. an HD monitor connected to a powerful desktop.
Focusing mainly on low-cost, commodity capturing and computing setups, the ITEM project pursues research in three pillar areas, whose interconnections are actively explored and interwoven. These three main research areas and some example subtopics are listed below.
- Computer vision and image understanding
  - Efficient camera calibration for various sensors
  - Low-resolution, noisy depth video enhancement
  - Stereo matching, optical flow estimation, Structure-from-Motion (SfM)
  - Video object cutout/matting, object tracking
  - 3D reconstruction of objects and environments
- Video/data representation, compression, and communication
  - Object-based video coding and delivery
  - View synthesis-driven color-plus-depth video coding
  - Multi-view, 3D video coding and delivery
  - Multi-point video conferencing systems
- Graphics and human-computer interaction
  - Free-viewpoint image synthesis
  - Computational photography, image relighting
  - Non-photorealistic rendering (NPR)
  - Interactive video object manipulation
  - Gesture-controlled multimedia content navigation
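To give a concrete flavor of one subtopic above, depth video enhancement, the sketch below shows a common baseline: filling sensor dropouts in a noisy depth stream with a per-pixel temporal median. This is an illustrative example only, not the project's actual algorithm; the function name and the convention that a depth value of 0 marks an invalid reading are assumptions.

```python
import numpy as np

def temporal_median_depth(frames, invalid=0):
    """Denoise a stack of depth frames by a per-pixel temporal median.

    frames: (T, H, W) array of depth values; `invalid` marks sensor dropouts
    (assumed here to be encoded as 0, as many commodity depth sensors do).
    Returns an (H, W) depth map where each pixel is the median of its valid
    samples over time; pixels invalid in every frame remain marked invalid.
    """
    stack = np.asarray(frames, dtype=np.float64)
    # Exclude dropout readings from the statistic.
    masked = np.where(stack == invalid, np.nan, stack)
    out = np.nanmedian(masked, axis=0)
    # Restore the invalid marker where no frame had a valid sample.
    return np.where(np.isnan(out), invalid, out)
```

For example, a pixel that reads 0 (dropout) in one frame and 4 in two others is recovered as 4, while transient speckle noise in a single frame is suppressed by the median. Real systems typically combine such temporal filtering with spatial, color-guided upsampling.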