Sifei Liu

Nvidia Graphics Pvt Ltd

Research Scientist

Sifei Liu is a research scientist at NVIDIA Research in Santa Clara, US. She received her PhD from the Department of EECS at the University of California, Merced, where she was advised by Prof. Ming-Hsuan Yang. Before that, she obtained her master’s degree in ECE from the University of Science and Technology of China (USTC), under the supervision of Prof. Stan Z. Li and Prof. Bin Li, and her bachelor’s degree in control science from North China Electric Power University (NCEPU). Her research interests are in computer vision (low-level vision, semantic segmentation, 3D scene understanding), deep learning (graph-structured networks, self-supervised learning), and the combination of the two. She has also interned at Baidu IDL, the Multimedia Lab at CUHK, and NVIDIA Research. She received the Baidu Graduate Fellowship in 2013 and the NVIDIA Pioneering Research Award in 2017.

Research Abstract:

Reasoning about relations in the visual world plays a significant role in visual understanding tasks, from low-level vision (e.g., relations among image pixels) to high-level vision (e.g., relations between objects). Although Convolutional Neural Networks (ConvNets) have proven powerful at capturing abstract concepts from visual elements, they remain deficient at explicitly modeling the relations between those elements. My research goal is to supply this missing mechanism by developing deep learning modules that explicitly reason about relations. For low- to mid-level vision tasks, e.g., semantic segmentation, such a module models the semantic pairwise relations between pixels. For high-level reasoning, e.g., affordance prediction, it exploits the interactions between objects.

To model the relations between pixels, I developed a novel graph-structured network, Spatial Propagation Networks (SPNs). An SPN is a differentiable building block that mimics a linear diffusion process and can be flexibly embedded into any type of neural network. The module has benefited a series of tasks, including color, depth, and semantic map propagation, in a data-driven manner across spatial, temporal, and 3D domains and for both regularly and arbitrarily structured data. To model human-object interactions for affordance and functional reasoning, my colleagues and I proposed parallel-path generative models that produce interactions that are both semantically plausible and geometrically correct from a single image. This work could enable content-creation applications such as automatic layout design and character placement.
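As a rough illustration of what "mimics a linear diffusion process" can mean, the Python sketch below runs one left-to-right pass of a linear propagation over a 2D grid. Each pixel blends its own input with the already-propagated values of up to three neighbors in the previous column, using per-pixel weights. The function names, the three-neighbor connectivity, and the rule that rescales weights so their absolute values sum to at most 1 are all assumptions made for this sketch; it is not the published SPN implementation.

```python
from typing import List

Grid = List[List[float]]


def normalize_weights(w: List[float]) -> List[float]:
    """Rescale neighbor weights so their absolute values sum to at most 1.

    Assumption of this sketch: bounding sum(|w|) keeps the recurrence from
    amplifying values as it propagates across columns.
    """
    total = sum(abs(v) for v in w)
    if total <= 1.0:
        return list(w)
    return [v / total for v in w]


def propagate_left_to_right(x: Grid, p: List[List[List[float]]]) -> Grid:
    """One left-to-right pass of a three-neighbor linear propagation.

    x[i][t]: input value at row i, column t.
    p[i][t]: three weights linking pixel (i, t) to rows i-1, i, i+1 of
             column t-1 (p[i][0] is unused).
    Returns h with h[i][0] = x[i][0] and, for t >= 1,
        h[i][t] = (1 - sum_k w_k) * x[i][t] + sum_k w_k * h[i+k-1][t-1],
    where both sums run only over neighbors that lie inside the grid.
    """
    rows, cols = len(x), len(x[0])
    h = [[0.0] * cols for _ in range(rows)]
    # The first column has no predecessor, so it copies the input.
    for i in range(rows):
        h[i][0] = x[i][0]
    for t in range(1, cols):
        for i in range(rows):
            w = normalize_weights(p[i][t])
            acc = 0.0   # weighted sum of propagated neighbor values
            gate = 0.0  # total weight actually applied to neighbors
            for k, offset in enumerate((-1, 0, 1)):
                j = i + offset
                if 0 <= j < rows:
                    acc += w[k] * h[j][t - 1]
                    gate += w[k]
            # Remaining weight (1 - gate) stays on the pixel's own input.
            h[i][t] = (1.0 - gate) * x[i][t] + acc
    return h


# Usage: a single non-zero value on the left diffuses to the right.
x = [[1.0, 0.0, 0.0]]
p = [[[0.0, 0.5, 0.0]] * 3]
print(propagate_left_to_right(x, p))
```

Because each output is an affine blend whose weights sum to 1, a constant input field passes through unchanged, while a sparse input (such as a few known depth or color values) spreads along the directions with large weights. In a learned setting the weights would be predicted by a network rather than fixed by hand as they are here, and this sketch scans in only one direction.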