My research, under the guidance of Professor Tandy Warnow, focuses on estimating phylogenetic relationships from sequence data (mostly DNA or RNA).

I have been awarded a fellowship with the CompGen initiative for the 2016-2017 academic year to work on new analytical methods for metagenomics data. This work is being co-advised by Professor Rebecca Stumpf, and we are applying many of these methods to non-human primate microbiome data.

Some of the specific projects I am working on include:

  • Experimenting with software developed by Prof. Warnow’s research team, including PASTA and SEPP (and its cousins TIPP and UPP), to find ways to improve performance, running time, or both. PASTA is a method built on SATé [1] for co-estimating multiple sequence alignments and phylogenetic trees through a divide-and-conquer heuristic, which allows it to scale to very large data sets. UPP is another alignment method for large data sets, but it works quite differently and is robust to fragmentary sequence data.
  • Working on methods for large-scale metagenomics analysis in a high-performance computing environment.
  • Regression techniques for compositional data in microbial ecology. This is related to metagenomics but is more broadly applicable, and is generally used for post-processing and biostatistical analysis of the output.
  • Some of my talented colleagues are working on the general problem of estimating species trees from multi-locus data under various model conditions, including in the presence of horizontal gene transfer (HGT). ASTRAL, a method developed in the group, performs well on this problem, and we are exploring its performance under various model conditions along with potential improvements. I have been working on a couple of theorems related to the statistical consistency of these methods under particular model conditions.
  • I am also interested in the applications of this research to historical linguistics. In particular, I am hoping to estimate likelihoods of the five phylogenetic networks described in this paper about the Indo-European languages.
  • I have been working for a while on an interface for drawing my own phylogenetic trees in Python, which I call the Phylostrator. One could argue that far too many phylogeny viewers are already available, but few of them give you full control over what gets drawn on every pixel of the image, and few of them are scriptable in Python. The Phylostrator is the unholy marriage of PyCairo and DendroPy, with a GUI built in wxPython that does a few specific things well. It also has a module for drawing alignments with a phylogeny alongside. So if you’re a biologist and a control freak with an artistic side, give it a shot!
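
As background for the compositional-data item above: microbiome abundance tables carry only relative information, so a common first step before regression is a log-ratio transform. The sketch below shows the centered log-ratio (CLR) transform in plain Python. It is a generic illustration, not necessarily the method used in this work, and the function name `clr` and the pseudocount of 0.5 used to handle zero counts are assumptions made for the example.

```python
import math


def clr(composition, pseudocount=0.5):
    """Centered log-ratio transform of one sample of taxon counts.

    Assumption: zeros are handled by adding a fixed pseudocount to every
    entry; other zero-replacement strategies exist.
    """
    if not composition:
        raise ValueError("composition must be non-empty")
    # Shift all counts so that log() is defined for zero counts.
    shifted = [x + pseudocount for x in composition]
    logs = [math.log(x) for x in shifted]
    # Subtracting the mean log is the same as dividing by the geometric mean.
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


if __name__ == "__main__":
    counts = [10, 0, 30, 60]  # hypothetical counts for four taxa in one sample
    print([round(v, 3) for v in clr(counts)])
```

CLR values sum to zero within each sample, which is why they are usually fed into regression models designed to respect that constraint rather than used as if they were independent features.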

Finally, I have a pet research interest that is not strictly related to my work: algae. Specifically, I am interested in the genotypes and phenotypes of algal strains that have the potential to be efficient biofuel producers.


[1] Liu, K., T.J. Warnow, M.T. Holder, S. Nelesen, J. Yu, A. Stamatakis, and C.R. Linder. “SATe-II: Very Fast and Accurate Simultaneous Estimation of Multiple Sequence Alignments and Phylogenetic Trees.” Systematic Biology (2012) 61(1):90-106