A machine learning approach to identify hydrogenosomal proteins in Trichomonas vaginalis

David Burstein, Sven B. Gould, Verena Zimorski, Thorsten Kloesges, Fuat Kiosse, Peter Major, William F. Martin, Tal Pupko, Tal Dagan

Research output: Contribution to journal › Article › peer-review


The protozoan parasite Trichomonas vaginalis is the causative agent of trichomoniasis, the most widespread nonviral sexually transmitted disease in humans. It possesses hydrogenosomes, anaerobic mitochondria that generate H₂, CO₂, and acetate from pyruvate while converting ADP to ATP via substrate-level phosphorylation. T. vaginalis hydrogenosomes lack a genome and translation machinery; hence, they import all their proteins from the cytosol. To date, however, only 30 imported proteins have been shown to localize to the organelle. A total of 226 nuclear-encoded proteins inferred from the genome sequence harbor a characteristic short N-terminal presequence, reminiscent of mitochondrial targeting peptides, which is thought to mediate hydrogenosomal targeting. Recent studies suggest, however, that the presequences might be less important than previously thought. We sought to identify new hydrogenosomal proteins within the 59,672 annotated open reading frames (ORFs) of T. vaginalis, independent of the N-terminal targeting signal, using a machine learning approach. Our training set included 57 gene and protein features determined for all 30 known hydrogenosomal proteins and 576 nonhydrogenosomal proteins. Several classifiers were trained on this set to yield an import score for all proteins encoded by T. vaginalis ORFs, predicting the likelihood of hydrogenosomal localization. The machine learning results were tested through immunofluorescence assay and immunodetection in isolated cell fractions of 14 protein predictions using hemagglutinin constructs expressed under the homologous SCSα promoter in transiently transformed T. vaginalis cells. Localization of 6 of the 10 top predicted hydrogenosome-localized proteins was confirmed, and two of these were found to lack an obvious N-terminal targeting signal.
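
The scoring step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifiers or the 57 features, so everything below is an assumption for illustration only: a Gaussian naive Bayes model stands in for the "several classifiers", the feature vectors are random synthetic numbers (not T. vaginalis data), and the names make_synthetic, fit_gaussian, and import_score are hypothetical. Only the set sizes (30 positives, 576 negatives, 57 features) come from the abstract.

```python
import math
import random

# Set sizes taken from the abstract; everything else is illustrative.
N_POS, N_NEG, N_FEATURES = 30, 576, 57


def make_synthetic(n, shift, rng):
    """Random stand-in feature vectors. NOT real T. vaginalis data."""
    return [[rng.gauss(shift, 1.0) for _ in range(N_FEATURES)] for _ in range(n)]


def fit_gaussian(rows):
    """Estimate a per-feature mean and variance for one class."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(col) / n for col in cols]
    # A small floor keeps the variance strictly positive.
    variances = [
        sum((x - m) ** 2 for x in col) / n + 1e-6 for col, m in zip(cols, means)
    ]
    return means, variances


def log_likelihood(x, means, variances):
    """Log density of x under independent per-feature Gaussians."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def import_score(x, pos_model, neg_model, prior_pos):
    """Posterior probability of the positive class (assumed score definition)."""
    log_pos = math.log(prior_pos) + log_likelihood(x, *pos_model)
    log_neg = math.log(1.0 - prior_pos) + log_likelihood(x, *neg_model)
    diff = log_neg - log_pos
    if diff > 700:  # math.exp would overflow; the posterior is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


rng = random.Random(0)
positives = make_synthetic(N_POS, 0.5, rng)  # synthetic "known hydrogenosomal"
negatives = make_synthetic(N_NEG, 0.0, rng)  # synthetic "nonhydrogenosomal"

pos_model = fit_gaussian(positives)
neg_model = fit_gaussian(negatives)
prior_pos = N_POS / (N_POS + N_NEG)  # reflects the class imbalance of the set

# Score a handful of synthetic candidates and rank them, highest score first.
candidates = make_synthetic(5, 0.5, rng) + make_synthetic(5, 0.0, rng)
scores = [import_score(x, pos_model, neg_model, prior_pos) for x in candidates]
ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

for i in ranked:
    print(f"candidate {i}: score {scores[i]:.3f}")
```

In the study itself, every protein encoded by an annotated ORF received a score and the top-ranked predictions were taken forward to experimental localization; the sketch mirrors only that rank-by-score idea, not the published pipeline.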

Original language: English
Pages (from-to): 217-228
Number of pages: 12
Journal: Eukaryotic Cell
Issue number: 2
State: Published - Feb 2012

All Science Journal Classification (ASJC) codes

  • Molecular Biology
  • Microbiology

