University of Illinois at Urbana-Champaign
Ismini Lourentzou is a CS PhD candidate at the University of Illinois at Urbana-Champaign, where she is advised by Prof. ChengXiang Zhai. She received a Bachelor's degree in Computer Science from the Athens University of Economics and Business and a Bachelor's degree in Business Administration from the Technological Educational Institute of Athens. Before joining UIUC, she worked in the banking sector for nearly a decade, holding a variety of positions at the National Bank of Greece. She also worked as a technical reviewer for technology-related publishing companies.
Her research interests lie at the intersection of Machine Learning, Natural Language Processing and Information Retrieval, with a focus on Active and Semi-supervised Learning, Deep Learning and Information Extraction. Her work targets interdisciplinary domains with societal dimensions, such as metabolic engineering, health informatics, education and sociology. She was a research intern at Microsoft Research and IBM Research, and will join IBM Research this fall. She has received a Microsoft Azure Research Award and an Outstanding Teaching Assistant Award, and has been a departmental nominee for both the IBM and MSR Fellowships.
Deep Learning has been applied with tremendous success in many domains, with models trained on millions of annotated instances. Yet only a small portion of the available data is labeled, clean and structured, and data quality is a critical factor in the success of machine learning. Moreover, domain knowledge and human intervention are key components of strong predictive performance. However, encoding such information in a learned model is often non-trivial and costly. My research focuses on bridging the gap between machine learning and human knowledge in real-world scenarios. To do so, I develop methods that minimize annotation effort and can quickly extract information from, or understand, natural language in domain-specific tasks.
I am particularly interested in designing robust Active and Semi-supervised Learning algorithms. Towards this goal, I have proposed an iterative elimination algorithm that learns a combination of active learning acquisition functions on the fly. My work also addresses the challenge of producing good-quality proxy labels by calibrating the confidence of a semi-supervised algorithm with an auxiliary active learning model. This calibration technique improves robustness, achieves high accuracy and reduces the noise in proxy labels.
My work also targets bottlenecks related to informal writing. Most Information Extraction tools struggle with the noisy, informal nature of social media text because of its high rate of out-of-vocabulary (OOV) words. To address this, I have designed a hybrid sequence-to-sequence text normalization model that can serve as a pre-processing step, allowing any NLP application to adapt to social media data. Finally, my work improves word representations for rare and emerging terms.