Towards Joint Understanding of Images and Language

Speaker: Svetlana Lazebnik, University of Illinois Urbana-Champaign

Abstract: From robotics to human-computer interaction, numerous real-world tasks can benefit from practical systems that can identify objects in scenes based on language and understand language grounded in visual context. This presentation will focus on my group’s work on developing systems for jointly modeling images and language. I will talk about methods for learning cross-modal embeddings for text-to-image and image-to-text search, and about the challenging task of grounding, or localizing, textual mentions of entities in an image. I will also discuss a large-scale data collection effort carried out in collaboration with Julia Hockenmaier in support of these projects, as well as cutting-edge applications such as automatic image description and visual question answering.