TY - JOUR
T1 - NeuroPID
T2 - A classifier of neuropeptide precursors
AU - Karsenty, Solange
AU - Rappoport, Nadav
AU - Ofer, Dan
AU - Zair, Adva
AU - Linial, Michal
PY - 2014/7/1
Y1 - 2014/7/1
N2 - Neuropeptides (NPs) are short secreted peptides produced in neurons. NPs act by activating signaling cascades governing broad functions such as metabolism, sensation and behavior throughout the animal kingdom. NPs are the products of multistep processing of longer proteins, the NP precursors (NPPs). We present NeuroPID (Neuropeptide Precursor Identifier), an online machine-learning tool that identifies metazoan NPPs. NeuroPID was trained on 1418 NPPs annotated as such by UniProtKB. A large number of sequence-based features were extracted for each sequence with the goal of capturing the biophysical and informational-statistical properties that distinguish NPPs from other proteins. Training several machine-learning models, including support vector machines and ensemble decision trees, led to high accuracy (89-94%) and precision (90-93%) in cross-validation tests. For inputs of thousands of unseen sequences, the tool provides a ranked list of high quality predictions based on the results of four machine-learning classifiers. The output reveals many uncharacterized NPPs and secreted cell modulators that are rich in potential cleavage sites. NeuroPID is a discovery and a prediction tool that can be used to identify NPPs from unannotated transcriptomes and mass spectrometry experiments. NeuroPID predicted sequences are attractive targets for investigating behavior, physiology and cell modulation. The NeuroPID web tool is available at http://neuropid.cs.huji.ac.il.
AB - Neuropeptides (NPs) are short secreted peptides produced in neurons. NPs act by activating signaling cascades governing broad functions such as metabolism, sensation and behavior throughout the animal kingdom. NPs are the products of multistep processing of longer proteins, the NP precursors (NPPs). We present NeuroPID (Neuropeptide Precursor Identifier), an online machine-learning tool that identifies metazoan NPPs. NeuroPID was trained on 1418 NPPs annotated as such by UniProtKB. A large number of sequence-based features were extracted for each sequence with the goal of capturing the biophysical and informational-statistical properties that distinguish NPPs from other proteins. Training several machine-learning models, including support vector machines and ensemble decision trees, led to high accuracy (89-94%) and precision (90-93%) in cross-validation tests. For inputs of thousands of unseen sequences, the tool provides a ranked list of high quality predictions based on the results of four machine-learning classifiers. The output reveals many uncharacterized NPPs and secreted cell modulators that are rich in potential cleavage sites. NeuroPID is a discovery and a prediction tool that can be used to identify NPPs from unannotated transcriptomes and mass spectrometry experiments. NeuroPID predicted sequences are attractive targets for investigating behavior, physiology and cell modulation. The NeuroPID web tool is available at http://neuropid.cs.huji.ac.il.
UR - http://www.scopus.com/inward/record.url?scp=84904788159&partnerID=8YFLogxK
U2 - https://doi.org/10.1093/nar/gku363
DO - https://doi.org/10.1093/nar/gku363
M3 - Article
C2 - 24792159
SN - 0305-1048
VL - 42
SP - W182-W186
JO - Nucleic acids research
JF - Nucleic acids research
IS - W1
ER -