Synthetic DNA barcodes identify singlets in scRNA-seq datasets and evaluate doublet algorithms

Ziyang Zhang, Madeline E. Melzer, Keerthana M. Arun, Hanxiao Sun, Carl Johan Eriksson, Itai Fabian, Sagi Shaashua, Karun Kiani, Yaara Oren, Yogesh Goyal

Research output: Contribution to journal › Article › peer-review

Abstract

Single-cell RNA sequencing (scRNA-seq) datasets contain true single cells, or singlets, in addition to cells that coalesce during the protocol, or doublets. Identifying singlets with high fidelity in scRNA-seq is necessary to avoid false negative and false positive discoveries. Although several methodologies have been proposed, they are typically tested on highly heterogeneous datasets and lack a priori knowledge of true singlets. Here, we leveraged datasets with synthetically introduced DNA barcodes for a hitherto unexplored application: to extract ground-truth singlets. We demonstrated the feasibility of our framework, “singletCode,” to evaluate existing doublet detection methods across a range of contexts. We also leveraged our ground-truth singlets to train a proof-of-concept machine learning classifier, which outperformed other doublet detection algorithms. Our integrative framework can identify ground-truth singlets and enable robust doublet detection in non-barcoded datasets.

Original language: English
Article number: 100592
Journal: Cell Genomics
Volume: 4
Issue number: 7
State: Published - 10 Jul 2024

Keywords

  • barcoding
  • benchmarking
  • doublet detection
  • lineage tracing
  • machine learning
  • scRNA-seq
  • single-cell genomics
  • singletCode
  • singlets

All Science Journal Classification (ASJC) codes

  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Genetics
