TY - JOUR
T1 - Utilizing somatic mutation data from numerous studies for cancer research
T2 - Proof of concept and applications
AU - Amar, D.
AU - Izraeli, S.
AU - Shamir, R.
N1 - Funding Information: This study was supported in part by the Israel Science Foundation (grant 317/13), an IDEA grant from the Dotan Center in Hemato-Oncology, and the Israeli Center of Research Excellence (I-CORE), Gene Regulation in Complex Human Disease, Center No 41/11. DA is grateful to the Azrieli Foundation for the award of an Azrieli Fellowship. DA was also supported in part by fellowships from the Edmond J. Safra Center for Bioinformatics at Tel Aviv University. Part of the work was done while DA and RS were visiting the Simons Institute for the Theory of Computing.
PY - 2017/6/15
Y1 - 2017/6/15
N2 - Large cancer projects measure somatic mutations in thousands of samples, gradually assembling a catalog of recurring mutations in cancer. Many methods analyze these data jointly with auxiliary information with the aim of identifying subtype-specific results. Here, we show that somatic gene mutations alone can reliably and specifically predict cancer subtypes. Interpretation of the classifiers provides useful insights for several biomedical applications. We analyze the COSMIC database, which collects somatic mutations from The Cancer Genome Atlas (TCGA) as well as from many smaller-scale studies. We use multi-label classification techniques and the Disease Ontology hierarchy to identify cancer subtype-specific biomarkers. Cancer subtype classifiers based on TCGA and the smaller studies have comparable performance, and the smaller studies add substantial value in terms of validation, coverage of additional subtypes, and improved classification. The gene sets of the classifiers contribute in three ways. First, we refine the associations of genes to cancer subtypes and identify novel compelling candidate driver genes. Second, using our classifiers we successfully predict the primary site of metastatic samples. Third, we provide novel hypotheses regarding detection of subtype-specific synthetic lethality interactions. From the perspective of the cancer research community, our results suggest that curation efforts, such as COSMIC, have great added and complementary value even in the era of large international cancer projects.
AB - Large cancer projects measure somatic mutations in thousands of samples, gradually assembling a catalog of recurring mutations in cancer. Many methods analyze these data jointly with auxiliary information with the aim of identifying subtype-specific results. Here, we show that somatic gene mutations alone can reliably and specifically predict cancer subtypes. Interpretation of the classifiers provides useful insights for several biomedical applications. We analyze the COSMIC database, which collects somatic mutations from The Cancer Genome Atlas (TCGA) as well as from many smaller-scale studies. We use multi-label classification techniques and the Disease Ontology hierarchy to identify cancer subtype-specific biomarkers. Cancer subtype classifiers based on TCGA and the smaller studies have comparable performance, and the smaller studies add substantial value in terms of validation, coverage of additional subtypes, and improved classification. The gene sets of the classifiers contribute in three ways. First, we refine the associations of genes to cancer subtypes and identify novel compelling candidate driver genes. Second, using our classifiers we successfully predict the primary site of metastatic samples. Third, we provide novel hypotheses regarding detection of subtype-specific synthetic lethality interactions. From the perspective of the cancer research community, our results suggest that curation efforts, such as COSMIC, have great added and complementary value even in the era of large international cancer projects.
UR - http://www.scopus.com/inward/record.url?scp=85009753147&partnerID=8YFLogxK
U2 - 10.1038/onc.2016.489
DO - 10.1038/onc.2016.489
M3 - Article
C2 - 28092680
SN - 0950-9232
VL - 36
SP - 3375
EP - 3383
JO - Oncogene
JF - Oncogene
IS - 24
ER -