What is C3?

C3 is a new correlation clustering method for cancer mutation analysis and for inferring cancer driver genes.
To expand our ability to analyze combinatorial patterns of cancer alterations, we developed a rigorous methodology for discovering cancer mutation patterns based on a new, constrained form of correlation clustering. Our algorithm, named C3 (Cancer Correlation Clustering), leverages three principles: mutual exclusivity of mutations, patient coverage, and driver network concentration. To test C3, we performed a detailed analysis of TCGA breast cancer and glioblastoma data and showed that our algorithm outperforms the state-of-the-art CoMEt method in discovering mutually exclusive gene modules and identifying biologically relevant driver genes. The proposed agnostic clustering method is a unique tool for efficiently and reliably identifying mutation patterns and driver pathways in large-scale cancer genomics studies, and it may also be applied to other clustering problems on biological graphs.
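
To make the mutual exclusivity and patient coverage principles more concrete, here is a minimal sketch of how a binary patient-by-gene mutation matrix might be reduced to pairwise gene scores. The helper name `pairwise_scores` and both scoring formulas are hypothetical illustrations chosen for readability; they are not the weight definitions used by C3.

```python
from itertools import combinations


def pairwise_scores(mutations):
    """Hypothetical pairwise gene scores (illustration only, not C3's weights).

    mutations: one row per patient, one 0/1 entry per gene (1 = mutated).
    Returns two symmetric gene-by-gene matrices (lists of lists):
      excl[i][j]: among patients with a mutation in gene i or j, the
                  fraction mutated in exactly one of the two genes;
      cov[i][j]:  fraction of all patients mutated in gene i or j.
    Diagonal entries are left at 0.0.
    """
    n_patients = len(mutations)
    n_genes = len(mutations[0])
    excl = [[0.0] * n_genes for _ in range(n_genes)]
    cov = [[0.0] * n_genes for _ in range(n_genes)]
    for i, j in combinations(range(n_genes), 2):
        either = sum(1 for row in mutations if row[i] or row[j])
        exactly_one = sum(1 for row in mutations if bool(row[i]) != bool(row[j]))
        e = exactly_one / either if either else 0.0
        c = either / n_patients
        excl[i][j] = excl[j][i] = e
        cov[i][j] = cov[j][i] = c
    return excl, cov


# Toy data: 4 patients x 3 genes.
M = [[1, 0, 0],
     [0, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
excl, cov = pairwise_scores(M)
print(round(excl[0][1], 3), cov[0][1])  # 0.667 0.75
print(excl[0][2], cov[0][2])            # 1.0 0.75
```

Genes 0 and 1 are co-mutated in one patient, so their exclusivity score is lower than that of the perfectly exclusive pairs involving gene 2, while all three pairs cover the same fraction of patients.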

Why use C3?

Cancer Correlation Clustering (C3) directly tackles the problem of integrating diverse sources of evidence about driver pattern behavior, and it eliminates the computational bottlenecks associated with large cluster sizes or large numbers of clusters. C3 uses a new agnostic optimization framework, specifically developed and rigorously analyzed for the driver discovery task, in which patient data are converted into a simple set of weights for the objective function; incorporating a new data source changes only the weights, not the algorithm. Beyond this flexibility, C3 has low computational cost and allows relevant problem constraints to be added while retaining good theoretical performance guarantees. The algorithm also outperforms CoMEt in three out of four evaluation criteria; which three depends on which weights are “emphasized” in the optimization problem, so tuning the weights lets one choose which features to improve. The user may choose which features and constraints to include (our analysis used coverage, mutual exclusivity, expression data, and network pathway information). The weights can also be chosen to suit many other computational biology problems that involve optimization on graphs.
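
The idea that patient data become weights, and that tuning the weights shifts what the clustering emphasizes, can be sketched as follows under loudly stated assumptions. Each feature is assumed to be a symmetric gene-by-gene similarity in [0, 1]; features are mixed with user-chosen `emphasis` coefficients, and the mixed similarity is split around a hypothetical `threshold` into the two correlation clustering costs (separating similar genes versus merging dissimilar ones). The clustering step is the generic randomized pivot heuristic (KwikCluster-style), used here only as a stand-in: C3 itself relies on its own constrained formulation, which this sketch does not reproduce. All function names are hypothetical.

```python
import random


def combine(features, emphasis, threshold=0.5):
    """Mix similarity matrices with user-chosen emphasis (hypothetical scheme).

    features: list of symmetric n x n similarity matrices with entries in [0, 1].
    emphasis: one non-negative coefficient per feature (not all zero).
    Returns (w_plus, w_minus): w_plus[i][j] is the cost of separating genes
    i and j, w_minus[i][j] the cost of placing them in the same cluster.
    """
    n = len(features[0])
    total = sum(emphasis)
    w_plus = [[0.0] * n for _ in range(n)]
    w_minus = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            s = sum(a * f[i][j] for a, f in zip(emphasis, features)) / total
            w_plus[i][j] = max(s - threshold, 0.0)
            w_minus[i][j] = max(threshold - s, 0.0)
    return w_plus, w_minus


def disagreement_cost(labels, w_plus, w_minus):
    """Correlation clustering objective: total cost of violated preferences."""
    cost = 0.0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            cost += w_minus[i][j] if labels[i] == labels[j] else w_plus[i][j]
    return cost


def pivot_cluster(w_plus, w_minus, seed=0):
    """Generic randomized pivot heuristic (KwikCluster-style), not C3 itself."""
    rng = random.Random(seed)
    remaining = list(range(len(w_plus)))
    rng.shuffle(remaining)
    labels = [None] * len(w_plus)
    cluster_id = 0
    while remaining:
        pivot = remaining.pop(0)
        labels[pivot] = cluster_id
        rest = []
        for v in remaining:
            # Join the pivot's cluster when separating would cost more than merging.
            if w_plus[pivot][v] > w_minus[pivot][v]:
                labels[v] = cluster_id
            else:
                rest.append(v)
        remaining = rest
        cluster_id += 1
    return labels


# Two toy features over 4 genes; exclusivity is emphasized twice as much as coverage.
S_excl = [[0, .9, .1, .1], [.9, 0, .1, .1], [.1, .1, 0, .8], [.1, .1, .8, 0]]
S_cov = [[0, .6, .4, .4], [.6, 0, .4, .4], [.4, .4, 0, .6], [.4, .4, .6, 0]]
w_plus, w_minus = combine([S_excl, S_cov], emphasis=[2.0, 1.0])
labels = pivot_cluster(w_plus, w_minus)
print(labels, disagreement_cost(labels, w_plus, w_minus))
```

With these toy inputs, genes 0 and 1 end up in one cluster and genes 2 and 3 in another, at zero disagreement cost, whatever the pivot order. Splitting the mixed similarity around a threshold gives each pair at most one non-zero cost, which is the usual signed-graph view of correlation clustering; changing `emphasis` changes the mixed similarities and can therefore change the clustering, without touching the clustering code.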

Cancer Driver Gene Cloud - Image retrieved from IntOGen

How do I download C3?

C3 is available on GitHub.


“A new correlation clustering method for cancer mutation analysis”.
J. Hou, A. Emad, G. Puleo, J. Ma, and O. Milenkovic.
Bioinformatics 31(7) (2016).


Jian Ma (jianma@cs.cmu.edu) and Olgica Milenkovic (milenkov@illinois.edu)
