TY - JOUR
T1 - Arpeggio
T2 - Harmonic compression of ChIP-seq data reveals protein-chromatin interaction signatures
AU - Stanton, Kelly Patrick
AU - Parisi, Fabio
AU - Strino, Francesco
AU - Rabin, Neta
AU - Asp, Patrik
AU - Kluger, Yuval
N1 - Funding Information: National Institutes of Health [T15 LM07056 to K.S.] and [CA-158167 to Y.K.]; Yale Cancer Center translational research pilot funds (to F.P. and F.S.); American Cancer Society Award [M130572 to F.S.]; the American-Italian Cancer Foundation [Post-Doctoral Research Fellowship to F.S.]; the Peter T. Rowley Breast Cancer Research Projects funded by the New York State Department of Health [FAU 0812160900 Y.K.]. Funding for open access charge: The Peter T. Rowley Breast Cancer Research Projects funded by the New York State Department of Health [FAU 0812160900 Y.K.].
PY - 2013/9
Y1 - 2013/9
N2 - Researchers generating new genome-wide data in an exploratory sequencing study can gain biological insights by comparing their data with well-annotated data sets possessing similar genomic patterns. Data compression techniques are needed for efficient comparisons of a new genomic experiment with large repositories of publicly available profiles. Furthermore, data representations that allow comparisons of genomic signals from different platforms and across species enhance our ability to leverage these large repositories. Here, we present a signal processing approach that characterizes protein-chromatin interaction patterns at length scales of several kilobases. This allows us to efficiently compare numerous chromatin-immunoprecipitation sequencing (ChIP-seq) data sets consisting of many types of DNA-binding proteins collected from a variety of cells, conditions and organisms. Importantly, these interaction patterns broadly reflect the biological properties of the binding events. To generate these profiles, termed Arpeggio profiles, we applied harmonic deconvolution techniques to the autocorrelation profiles of the ChIP-seq signals. We used 806 publicly available ChIP-seq experiments and showed that Arpeggio profiles with similar spectral densities shared biological properties. Arpeggio profiles of ChIP-seq data sets revealed characteristics that are not easily detected by standard peak finders. They also allowed us to relate sequencing data sets from different genomes, experimental platforms and protocols. Arpeggio is freely available at http://sourceforge.net/p/arpeggio/wiki/Home/.
AB - Researchers generating new genome-wide data in an exploratory sequencing study can gain biological insights by comparing their data with well-annotated data sets possessing similar genomic patterns. Data compression techniques are needed for efficient comparisons of a new genomic experiment with large repositories of publicly available profiles. Furthermore, data representations that allow comparisons of genomic signals from different platforms and across species enhance our ability to leverage these large repositories. Here, we present a signal processing approach that characterizes protein-chromatin interaction patterns at length scales of several kilobases. This allows us to efficiently compare numerous chromatin-immunoprecipitation sequencing (ChIP-seq) data sets consisting of many types of DNA-binding proteins collected from a variety of cells, conditions and organisms. Importantly, these interaction patterns broadly reflect the biological properties of the binding events. To generate these profiles, termed Arpeggio profiles, we applied harmonic deconvolution techniques to the autocorrelation profiles of the ChIP-seq signals. We used 806 publicly available ChIP-seq experiments and showed that Arpeggio profiles with similar spectral densities shared biological properties. Arpeggio profiles of ChIP-seq data sets revealed characteristics that are not easily detected by standard peak finders. They also allowed us to relate sequencing data sets from different genomes, experimental platforms and protocols. Arpeggio is freely available at http://sourceforge.net/p/arpeggio/wiki/Home/.
UR - http://www.scopus.com/inward/record.url?scp=84886821445&partnerID=8YFLogxK
U2 - https://doi.org/10.1093/nar/gkt627
DO - https://doi.org/10.1093/nar/gkt627
M3 - Article
SN - 0305-1048
VL - 41
SP - e161
JO - Nucleic acids research
JF - Nucleic acids research
IS - 16
ER -