TY - JOUR
T1 - Do two machine-learning based prognostic signatures for breast cancer capture the same biological processes?
AU - Drier, Yotam
AU - Domany, Eytan
N1 - Leir Charitable Foundation; Weizmann-Mario Negri collaborative research grant; German Research Foundation (DIP). This research was supported by the Leir Charitable Foundation, a Weizmann-Mario Negri collaborative research grant and by a grant from the German Research Foundation (DIP). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
PY - 2011/3
Y1 - 2011/3
N2 - It is well documented that there is very little, if any, overlap between the genes of different prognostic signatures for early-discovery breast cancer. This apparent discrepancy has been attributed to the limits of simple machine-learning identification and ranking techniques, and the biological relevance and meaning of the prognostic gene lists were questioned. Subsequently, proponents of the prognostic gene lists claimed that different lists do capture similar underlying biological processes and pathways. The present study scrutinizes the validity of this claim for two important gene lists that are the focus of current large-scale validation efforts. We performed careful enrichment analysis, controlling the effects of multiple testing in a manner that takes into account the nested, dependent structure of gene ontologies. In contradiction to several previous publications, we find that the only biological process or pathway for which statistically significant concordance can be claimed is cell proliferation, a process whose relevance and prognostic value were well known long before gene expression profiling. The claims reported by others, of wider concordance between the biological processes captured by the two prognostic signatures studied, were found either to lack statistical rigor or to be based on addressing some other question.
UR - http://www.scopus.com/inward/record.url?scp=79952687473&partnerID=8YFLogxK
U2 - https://doi.org/10.1371/journal.pone.0017795
DO - https://doi.org/10.1371/journal.pone.0017795
M3 - Article
C2 - 21423753
SN - 1932-6203
VL - 6
JO - PLoS ONE
JF - PLoS ONE
IS - 3
M1 - e17795
ER -