Research Projects

Complex Detection in Biological Networks via Diversified Dense Subgraph Mining

Undergraduate Research Assistant, supervised by Prof Jiawei Han and Prof Jian Peng (UIUC, CS) & Prof Xiuli Ma (Peking University)


Protein-protein interaction (PPI) networks, which provide a comprehensive landscape of protein interaction patterns, enable us to explore biological processes and cellular components at multiple resolutions. Many biological processes require a number of proteins to work together: these proteins interact densely with each other, forming large molecular machines or cellular building blocks.

Identification of such densely interconnected clusters or protein complexes from PPI networks enables us to obtain a better understanding of the hierarchy and organization of biological processes and cellular components.

In this project, we introduce a novel approximate algorithm to efficiently enumerate putative protein complexes from biological networks. The key insight of our algorithm is that we only need to find a small subset of subgraphs that cover as many proteins as possible while having minimal overlap among themselves. We formulate the problem as finding a diverse set of dense subgraphs and develop highly effective pruning techniques to guarantee efficiency. To handle large networks, we adopt a divide-and-conquer approach that speeds up the algorithm through distributed computation.
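The selection idea above (cover many proteins, keep overlap small) can be illustrated with a simple greedy sketch. This is not the algorithm from the RECOMB paper: it assumes the candidate dense subgraphs have already been enumerated, and the names `density`, `marginal_gain`, `select_diverse`, `min_density` and `overlap_penalty` are hypothetical. Each round picks the candidate that adds the most uncovered proteins minus a penalty for proteins that are already covered.

```python
from typing import FrozenSet, List, Set

Node = str
Edge = FrozenSet[Node]  # an undirected edge stored as a 2-element frozenset


def density(nodes: FrozenSet[Node], edges: Set[Edge]) -> float:
    """Edge density of the induced subgraph: internal edges / C(|nodes|, 2)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(1 for e in edges if e <= nodes)  # edge lies inside the subgraph
    return internal / (n * (n - 1) / 2)


def marginal_gain(cand: FrozenSet[Node], covered: Set[Node], overlap_penalty: float) -> float:
    """Newly covered proteins minus a penalty for proteins already covered."""
    return len(cand - covered) - overlap_penalty * len(cand & covered)


def select_diverse(
    candidates: List[FrozenSet[Node]],
    edges: Set[Edge],
    k: int,
    min_density: float = 0.5,
    overlap_penalty: float = 1.0,
) -> List[FrozenSet[Node]]:
    """Greedily pick up to k dense, mutually diverse subgraphs (hypothetical heuristic)."""
    # Hypothetical pruning step: discard candidates that are not dense enough.
    pool = [c for c in candidates if density(c, edges) >= min_density]
    selected: List[FrozenSet[Node]] = []
    covered: Set[Node] = set()
    while pool and len(selected) < k:
        best = max(pool, key=lambda c: marginal_gain(c, covered, overlap_penalty))
        if marginal_gain(best, covered, overlap_penalty) <= 0:
            break  # no remaining candidate adds enough new coverage
        selected.append(best)
        covered |= best
        pool.remove(best)
    return selected
```

For example, with two triangles A-B-C and D-E-F joined by the edge C-D, and candidates {A,B,C}, {D,E,F}, {B,C,D} and {A,B,C,D}, the sketch first selects {A,B,C,D} and then {D,E,F}; the remaining candidates only add overlap and are skipped.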

Comparing against existing clustering and dense-subgraph-based algorithms on several human and yeast PPI networks, we demonstrate that our method detects more putative protein complexes and achieves better prediction accuracy. For more details, please refer to the RECOMB paper (RECOMB ’16).

Real-time Local Event Detection in Geo-tagged Tweet Stream

Undergraduate Research Assistant, supervised by Prof J. Han (UIUC, CS)

The real-time discovery of local events (e.g., protests, crimes, disasters, sports games) is of great importance to various applications, such as crime monitoring, disaster alarming, and activity recommendation. While this task seemed nearly impossible years ago due to the lack of timely and reliable data sources, the recent explosive growth in geo-tagged tweet data has opened new opportunities. Nevertheless, extracting high-quality local events from the geo-tagged tweet stream in real time remains a challenging and largely unsolved task.

We propose a two-step method that achieves effective, real-time local event detection in the geo-tagged tweet stream. The first step identifies several pivot tweets in the query window to form candidate events. The pivot tweets are identified based on: (1) a novel authority concept that captures the geo-topic correlations among tweets; and (2) an authority ascent process that finds authority maxima. The second step ranks all candidates by spatiotemporal burstiness: the method continuously summarizes the tweet stream and compares each candidate against the summaries in a reference window to quantify its spatiotemporal burstiness. Finally, the method features an updating module that finds new pivots at little time cost when the query window shifts.

As such, it is capable of monitoring the continuous stream in real time. We used crowdsourcing to evaluate it on a real-life data set containing millions of geo-tagged tweets. The results show that our method significantly outperforms state-of-the-art methods in precision and is orders of magnitude faster.
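The authority ascent step described above can be pictured as hill climbing on a tweet neighborhood graph. The sketch below assumes each tweet already has an authority score and a list of geo-topic neighbors; how the method actually computes authority is not reproduced, and `authority_ascent` and `candidate_events` are hypothetical names. Each tweet repeatedly moves to its highest-authority neighbor until no neighbor has higher authority, and the local maxima reached act as pivots.

```python
from collections import defaultdict
from typing import Dict, List


def authority_ascent(
    authority: Dict[str, float], neighbors: Dict[str, List[str]]
) -> Dict[str, str]:
    """Map each tweet to the authority maximum (pivot) it climbs to."""

    def climb(tweet: str) -> str:
        current = tweet
        while True:
            higher = [n for n in neighbors.get(current, []) if authority[n] > authority[current]]
            if not higher:
                return current  # local authority maximum
            # Authority strictly increases at each step, so the loop terminates.
            current = max(higher, key=lambda n: authority[n])

    return {t: climb(t) for t in authority}


def candidate_events(
    authority: Dict[str, float], neighbors: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Group tweets by the pivot they ascend to; each group is one candidate event."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for tweet, pivot in authority_ascent(authority, neighbors).items():
        groups[pivot].append(tweet)
    return {pivot: sorted(members) for pivot, members in groups.items()}
```

With five tweets forming two neighborhoods (t1-t2-t3 and t4-t5), the tweets climb to t3 and t5 respectively, yielding two candidate events. The burstiness ranking and the incremental update module are outside the scope of this sketch.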

Paper submitted to SIGIR (under review)

Going Beyond Positives and Negatives in Sentiment Analysis

Undergraduate Research Assistant, supervised by Prof K. Ganesen

  • Crawled user review data and conducted NLP analysis of semantically rich sentiment components (complaints and praises) in the reviews
  • Trained various classifiers, including SVM, Logistic Regression and Naïve Bayes, with feature selection
  • Implemented a Chrome extension to highlight semantically rich reviews on websites
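To illustrate one of the classifiers listed above, here is a minimal multinomial Naïve Bayes sketch with Laplace smoothing in plain Python. It assumes reviews are already tokenized and labeled with hypothetical `complaint` / `praise` classes; it omits feature selection and is not the project's actual code, and `train_nb` and `predict_nb` are hypothetical names.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def train_nb(docs: List[List[str]], labels: List[str], alpha: float = 1.0) -> Dict:
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    class_counts = Counter(labels)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors, likelihoods = {}, {}
    for label, count in class_counts.items():
        priors[label] = math.log(count / len(labels))
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        # Counter returns 0 for unseen words, so smoothing covers them.
        likelihoods[label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return {"priors": priors, "likelihoods": likelihoods}


def predict_nb(model: Dict, tokens: List[str]) -> str:
    """Return the class with the highest posterior log score; unknown words are ignored."""

    def score(label: str) -> float:
        lik = model["likelihoods"][label]
        return model["priors"][label] + sum(lik[w] for w in tokens if w in lik)

    return max(model["priors"], key=score)
```

Working in log space avoids numerical underflow when a review contains many tokens.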

Text Entity Extraction and Matching in Semi-supervised Learning with Constraints

Senior Thesis, supervised by Prof Kevin Chang (UIUC, CS Department)

  • Studied natural language processing topics including Information Extraction, Text Classification Models, and Named Entity Recognition, with an in-depth investigation of the state-of-the-art sequential discriminative probabilistic graphical model: Conditional Random Fields (CRF).
  • Conducted experiments and analysis using the Stanford Named Entity Recognizer (NER) library.
  • Participated in designing the automatic constraint-learning framework.

City-scale Transportation System Resilience and Outlier Detection

Undergraduate Research Assistant, supervised by Prof Daniel Work (UIUC, CEE Department)

  • Studied SVD, PCA and SPCA with EM to compute efficiently on a high-dimensional traffic dataset.
  • Wrote Python code to process high-volume, high-dimensional GPS data and generate analyses.
  • Learned the principles of outlier and abnormal event detection from diagrams and animations of the city traffic dynamic system.
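The EM approach to PCA mentioned above can be sketched for a single component in plain Python, together with a reconstruction-residual score as one simple way to flag outliers. This is an illustrative assumption, not the project's pipeline: rows stand for hypothetical observations (for example, per-time-slot traffic features), and `em_pca_1d` and `residual_scores` are hypothetical names. With one component, the E-step projects each centered point onto the current direction and the M-step re-fits the direction by least squares.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def center(rows: List[Vector]) -> Tuple[List[Vector], Vector]:
    """Subtract the per-feature mean from every row."""
    d = len(rows[0])
    means = [sum(r[i] for r in rows) / len(rows) for i in range(d)]
    return [[r[i] - means[i] for i in range(d)] for r in rows], means


def em_pca_1d(rows: List[Vector], iters: int = 100, seed: int = 0) -> Tuple[Vector, Vector]:
    """Leading principal direction via an EM iteration for single-component PCA.

    Assumes the data is not all identical (otherwise the updates divide by zero).
    """
    Y, means = center(rows)
    rng = random.Random(seed)
    c = [rng.uniform(-1.0, 1.0) for _ in Y[0]]
    for _ in range(iters):
        # E-step: latent coordinate of each point given the current direction c.
        cc = dot(c, c)
        x = [dot(c, y) / cc for y in Y]
        # M-step: least-squares direction that reconstructs Y from the coordinates.
        xx = sum(v * v for v in x)
        c = [sum(x[j] * Y[j][i] for j in range(len(Y))) / xx for i in range(len(c))]
    norm = math.sqrt(dot(c, c))
    return [v / norm for v in c], means


def residual_scores(rows: List[Vector], direction: Vector, means: Vector) -> List[float]:
    """Distance of each point from the fitted principal line; large values suggest outliers."""
    scores = []
    for r in rows:
        y = [v - m for v, m in zip(r, means)]
        proj = dot(direction, y)
        resid = [yi - proj * ui for yi, ui in zip(y, direction)]
        scores.append(math.sqrt(dot(resid, resid)))
    return scores
```

For a single component this iteration behaves like power iteration on the scatter matrix, so it converges quickly when the leading eigenvalue dominates. For instance, on nine points along the line y = x plus one point at (4, -4), the off-line point receives the largest residual score.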