Ensembled Transferred Embeddings

Yonatan Hadar, Erez Shmueli

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

Deep learning has become a very popular method for text classification in recent years, due to its ability to improve on the accuracy of previous state-of-the-art methods on several benchmarks. However, these improvements required hundreds of thousands to millions of labeled training examples, which in many cases can be very time-consuming and/or expensive to acquire. This problem is especially significant in domain-specific text classification tasks, where pretrained embeddings and models are not optimal. To cope with this problem, we propose a novel learning framework, Ensembled Transferred Embeddings (ETE), which relies on two key ideas: (1) labeling a relatively small sample of the target dataset in a semi-automatic process, and (2) leveraging other datasets from related domains or related tasks that are large-scale and labeled, to extract “transferable embeddings”. Evaluation of ETE on a large-scale, real-world item categorization dataset provided to us by PayPal shows that it significantly outperforms traditional as well as state-of-the-art item categorization methods.
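The abstract's second idea — extracting "transferable embeddings" from related labeled datasets and ensembling classifiers built on them — can be illustrated with a minimal sketch. Everything below is a toy stand-in, not the chapter's actual method: the random projections play the role of embeddings transferred from pretrained source models, a simple nearest-centroid rule stands in for the per-embedding classifier, and the semi-automatic labeling step is replaced by a small synthetically labeled sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a small labeled target sample: 2-class points in a
# 20-dimensional "text feature" space, with the label carried by the
# first two coordinates.
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hypothetical "transferred embeddings": each source model is stood in
# for by a fixed random projection. A real ETE-style pipeline would use
# embeddings produced by models pretrained on related large-scale
# labeled datasets instead.
projections = [rng.normal(size=(20, 8)) for _ in range(3)]

def nearest_centroid_scores(E, labels, Eq):
    """Score query embeddings by how much closer they lie to the
    class-1 centroid than to the class-0 centroid (>0 favors class 1)."""
    c0 = E[labels == 0].mean(axis=0)
    c1 = E[labels == 1].mean(axis=0)
    d0 = np.linalg.norm(Eq - c0, axis=1)
    d1 = np.linalg.norm(Eq - c1, axis=1)
    return d0 - d1

# Ensemble step: average the per-embedding scores over all transferred
# embeddings, then threshold to get the final prediction.
Xq = rng.normal(size=(100, 20))
yq = (Xq[:, 0] + Xq[:, 1] > 0).astype(int)
scores = np.mean(
    [nearest_centroid_scores(X @ P, y, Xq @ P) for P in projections],
    axis=0,
)
pred = (scores > 0).astype(int)
accuracy = (pred == yq).mean()
```

Even with crude random projections, averaging scores across several embedding spaces recovers the class signal well above chance on this toy data, which is the intuition behind ensembling multiple transferred representations rather than committing to a single one.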

Original language: English
Title of host publication: Machine Learning for Data Science Handbook
Subtitle of host publication: Data Mining and Knowledge Discovery Handbook, Third Edition
Pages: 587-606
Number of pages: 20
ISBN (Electronic): 9783031246289
DOIs
State: Published - 1 Jan 2023

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Mathematics
