Identifying Regulatory Elements via Deep Learning

Mira Barshai, Eitamar Tripto, Yaron Orenstein

Research output: Contribution to journal / Review article / peer-review

Abstract

Deep neural networks have been revolutionizing the field of machine learning over the past several years. They have been applied with great success in many domains of the biomedical data sciences, where they outperform extant methods by a large margin. Their ability to detect local image features and model the interactions between them makes them highly applicable to regulatory genomics, where, instead of images, the networks analyze DNA and RNA sequences together with additional epigenomic data. In this review, we survey the successes of deep learning in regulatory genomics. We first describe the fundamental building blocks of deep neural networks, the popular architectures used in regulatory genomics, and how these networks are trained on molecular sequence data. We then review key methods in several gene regulation domains. We start with the pioneering method DeepBind and its successors, which were developed to predict protein-DNA binding. We then review methods developed to predict and model epigenetic information, such as histone marks and nucleosome occupancy. Next, we review methods that predict protein-RNA binding, which poses the unique challenge of incorporating RNA structure information. Finally, we give our overall view of the strengths and weaknesses of deep neural networks and the prospects for future developments.
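The core idea behind convolutional models for sequence data, starting with DeepBind, can be illustrated with a small sketch. A DNA sequence is one-hot encoded as an L x 4 matrix. A convolutional filter (a "motif detector") slides along it, a ReLU keeps only positive scores, global max pooling keeps the best match, and a linear layer maps the pooled scores to a binding prediction. This is a minimal pure-Python illustration loosely modelled on a DeepBind-style architecture, not the DeepBind implementation. The function names (`one_hot`, `motif_filter`, `scan`, `predict`) and the hand-set GATA filter are hypothetical. In a real model these weights are learned from data.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
ALPHABET = "ACGT"


def one_hot(seq: str) -> Matrix:
    """Encode a DNA sequence as an L x 4 matrix (columns A, C, G, T).

    Assumption for this sketch: any non-ACGT symbol (e.g. N) becomes an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq.upper()]


def motif_filter(motif: str, match: float = 1.0, mismatch: float = -1.0) -> Matrix:
    """Hypothetical hand-built k x 4 filter that rewards matches to `motif`."""
    return [[match if base == b else mismatch for b in ALPHABET] for base in motif]


def scan(x: Matrix, weights: Matrix, bias: float) -> List[float]:
    """1D convolution (valid positions only) of one filter over x, followed by ReLU."""
    k = len(weights)
    scores = []
    for i in range(len(x) - k + 1):
        s = bias
        for j in range(k):
            for c in range(4):
                s += weights[j][c] * x[i + j][c]
        scores.append(max(0.0, s))  # ReLU keeps only positive motif matches
    return scores


def predict(
    seq: str,
    filters: Sequence[Tuple[Matrix, float]],
    out_weights: Sequence[float],
    out_bias: float,
) -> float:
    """Convolution -> ReLU -> global max pooling -> linear output."""
    x = one_hot(seq)
    # Global max pooling: one number per filter (0.0 if the sequence is too short).
    pooled = [max(scan(x, w, b), default=0.0) for w, b in filters]
    return sum(p * v for p, v in zip(pooled, out_weights)) + out_bias


if __name__ == "__main__":
    gata = motif_filter("GATA")
    # Bias -3 means only an exact 4/4 match gives a positive score (4 - 3 = 1).
    print(predict("TTGATATT", [(gata, -3.0)], [2.0], 0.0))
    print(predict("TTTTTTTT", [(gata, -3.0)], [2.0], 0.0))
```

With the bias set to -3, any window with even one mismatch scores at most -1 and is zeroed by the ReLU. The first call therefore returns 2.0 (one exact GATA site), and the second returns 0.0. Max pooling makes the prediction depend on whether the motif occurs, not on where it occurs. This position invariance is one reason convolutional architectures suit regulatory sequences.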

Original language: American English
Pages (from-to): 315-338
Number of pages: 24
Journal: Annual Review of Biomedical Data Science
Volume: 3
State: Published - 20 Jul 2020

Keywords

  • deep learning
  • gene regulation
  • motif finding
  • regulatory genomics

All Science Journal Classification (ASJC) codes

  • Biomedical Engineering
  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Genetics
  • Cancer Research
