TY - GEN
T1 - Tensor composition analysis detects cell-type specific associations in epigenetic studies
AU - Rahmani, Elior
AU - Schweiger, Regev
AU - Rosset, Saharon
AU - Sankararaman, Sriram
AU - Halperin, Eran
N1 - Publisher Copyright: © Springer International Publishing AG, part of Springer Nature 2018.
PY - 2018
Y1 - 2018
N2 - Identifying cell-type specific associations of genes with disease and mapping known associations to particular cell types is key to understanding disease etiology. While developments in technologies for profiling genomic features such as gene expression and DNA methylation have led to the availability of large-scale tissue-specific genomic data, prohibitive costs drastically restrict the collection of cell-type specific genomic data. This, in turn, limits the identification of disease-related genes and cell types. It is therefore desirable to develop new approaches for detecting cell-type specific associations between phenotypes and tissue-specific genomic data. We suggest a new matrix factorization formulation, which allows us to deconvolve a two-dimensional input (observations by features) into a three-dimensional output. Traditional matrix factorization formulations essentially take as input a multiple-source heterogeneous matrix of observations and output a matrix of source-specific weights and a matrix of source-specific features. We generalize this approach by assuming that source-specific features are unique for each observation rather than shared across all observations, and we propose Tensor Composition Analysis (TCA), a method for estimating observation- and source-specific values based on the model. We apply our model in the context of epigenetic association studies, where DNA methylation data measured from a heterogeneous tissue are often used, and we show that TCA allows us to extract cell-type specific methylation levels from two-dimensional tissue-specific methylation data. We further derive a statistical test for detecting cell-type specific effects of methylation on phenotypes based on the TCA model, and using a simulation study we demonstrate its potential and limitations. Finally, using five large whole-blood methylation datasets, we demonstrate that our model allows the detection of novel replicating cell-type specific associations without collecting cost-prohibitive cell-type specific data, thus suggesting an exciting new opportunity to unveil more of the hidden signals in genomic association studies, with potential design implications for future data collection efforts.
UR - http://www.scopus.com/inward/record.url?scp=85046140251&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9783319899282
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 274
EP - 275
BT - Research in Computational Molecular Biology - 22nd Annual International Conference, RECOMB 2018, Proceedings
A2 - Raphael, Benjamin J.
T2 - 22nd International Conference on Research in Computational Molecular Biology, RECOMB 2018
Y2 - 21 April 2018 through 24 April 2018
ER -