In defense of word embedding for generic text representation

Guy Lev, Benjamin Klein, Lior Wolf

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Statistical methods have shown a remarkable ability to capture semantics. The word2vec method is a frequently cited method for capturing meaningful semantic relations between words from a large text corpus. It has the advantage of not requiring any tagging during training. The prevailing view, however, is that it cannot capture the semantics of word sequences and is virtually useless for most purposes unless combined with heavy machinery. This paper challenges that view by showing that augmenting the word2vec representation with one of a few pooling techniques yields results that surpass or are comparable to the best algorithms in the literature. This improved performance is justified by theory and verified by extensive experiments on well-studied NLP benchmarks (this work is inspired by [10]).
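To make the idea of "word2vec plus pooling" concrete, the sketch below shows how a word sequence can be mapped to a single fixed-length vector by pooling its pre-trained word embeddings. The abstract does not specify which pooling operators the paper evaluates, so mean and max pooling are used here purely as illustrative stand-ins; the embedding table, its 300-dimensional size, and the function names are assumptions for the example, not details taken from the paper.

```python
import numpy as np

# Hypothetical lookup table mapping words to pre-trained word2vec vectors.
# In practice these vectors would come from a word2vec model trained on a
# large corpus; random vectors are used here only to keep the sketch self-contained.
embeddings = {
    "statistical": np.random.rand(300),
    "methods": np.random.rand(300),
    "capture": np.random.rand(300),
    "semantics": np.random.rand(300),
}

def mean_pool(words, embeddings):
    """Represent a word sequence by the average of its word vectors."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    return np.mean(vectors, axis=0)

def max_pool(words, embeddings):
    """Represent a word sequence by the element-wise maximum of its word vectors."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    return np.max(vectors, axis=0)

sentence = ["statistical", "methods", "capture", "semantics"]
print(mean_pool(sentence, embeddings).shape)  # (300,)
print(max_pool(sentence, embeddings).shape)   # (300,)
```

Either pooled vector can then be fed to a standard classifier or similarity measure, which is the sense in which a simple pooling step turns per-word embeddings into a generic text representation.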

Original language: English
Title of host publication: Natural Language Processing and Information Systems - 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015, Proceedings
Editors: Siegfried Handschuh, André Freitas, Elisabeth Métais, Chris Biemann, Farid Meziane
Pages: 35-50
Number of pages: 16
DOIs
State: Published - 2015
Event: 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015 - Passau, Germany
Duration: 17 Jun 2015 - 19 Jun 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9103

Conference

Conference: 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015
Country/Territory: Germany
City: Passau
Period: 17/06/15 - 19/06/15

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
