The Little Known Universe of Short Proteins in Insects: A Machine Learning Approach

Research output: Chapter in Book/Report/Conference proceeding, Chapter, peer-review

Abstract

Modern genomics and proteomics technologies are producing immense quantities of sequenced proteins. The only feasible way to assign functions to this flood of sequences is to apply state-of-the-art computational methods for automated functional annotation. We illustrate the significance of machine learning tools in identifying and annotating short bioactive proteins and peptides from insect genomes. Over 500,000 full-length insect proteins are currently archived in databases, of which ~15% are short proteins. Most of these short sequences remain uncharacterized. We developed a platform to systematically identify the functional class of short toxin-like peptides in metazoa. We present data from eight representative genomes (140,000 proteins) that cover the main phylogenetic branches of Hexapoda. The platform is a trained machine-learning predictor that identified ~800 toxin-like candidates, 250 of them with high confidence. The candidates include ion channel inhibitors, protease inhibitors, antimicrobial peptides, and components of the innate immune system. Our systematic approach can be extended to new genomes and to other biological classes of proteins. Using similar methods, we illustrate the successful identification of previously overlooked neuropeptide precursors. The systematic discovery of insect neuropeptides and short toxin-like proteins opens the way to new strategies for pest control and for manipulating insect behavior. We discuss these overlooked secreted short peptides with respect to their evolution and potential applications in biotechnology.
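By the abstract's own figures, roughly 75,000 of the ~500,000 archived insect proteins (~15%) fall into the short class. Restricting a proteome to that short subset, and computing a simple sequence feature on it, can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's platform: the 250-residue cutoff, the function names, and the cysteine-fraction feature are hypothetical choices for the example, since the abstract does not give the chapter's length definition or feature set.

```python
from typing import Dict, List, Tuple

# Hypothetical length cutoff (amino acids) for calling a protein "short";
# the chapter's actual definition is not stated in the abstract.
SHORT_MAX_LEN = 250


def is_short(seq: str, max_len: int = SHORT_MAX_LEN) -> bool:
    """Return True if the protein sequence has at most max_len residues."""
    return len(seq) <= max_len


def cysteine_fraction(seq: str) -> float:
    """Fraction of residues that are cysteine (C); an illustrative feature only."""
    return seq.upper().count("C") / len(seq) if seq else 0.0


def short_protein_summary(
    proteins: Dict[str, str], max_len: int = SHORT_MAX_LEN
) -> Tuple[List[str], float]:
    """Return the IDs of short proteins and their fraction of the whole set."""
    short_ids = [pid for pid, seq in proteins.items() if is_short(seq, max_len)]
    fraction = len(short_ids) / len(proteins) if proteins else 0.0
    return short_ids, fraction


if __name__ == "__main__":
    # Toy sequences, not real insect proteins.
    toy = {
        "p1": "MKC" * 10,        # 30 residues
        "p2": "M" + "A" * 399,   # 400 residues
        "p3": "GCCSDPRC",        # 8 residues
        "p4": "A" * 300,         # 300 residues
    }
    ids, frac = short_protein_summary(toy)
    print(ids, frac)  # prints ['p1', 'p3'] 0.5
    for pid in ids:
        print(pid, round(cysteine_fraction(toy[pid]), 3))
```

In a real pipeline, the short subset would then be passed to a trained classifier; that step is deliberately left out here because the abstract does not describe the predictor's features or model.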
Original language: American English
Title of host publication: Short Views on Insect Genomics and Proteomics, Vol. 1: Insect Genomics
Subtitle of host publication: Insect Genomics
Editors: Chandrasekar Raman, Marian R. Goldsmith, Tolulope A. Agunbiade
Place of publication: Cham
Pages: 177-202
Number of pages: 26
Volume: 1
ISBN (electronic): 978-3-319-24235-4
State: Published, 2015

Publication series

Name: Entomology in Focus
Publisher: Springer
Volume: 3
