TY - GEN
T1 - In defense of word embedding for generic text representation
AU - Lev, Guy
AU - Klein, Benjamin
AU - Wolf, Lior
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
AB - Statistical methods have shown a remarkable ability to capture semantics. The word2vec method is a frequently cited method for capturing meaningful semantic relations between words from a large text corpus. It has the advantage of not requiring any tagging while training. The prevailing view is, however, that it lacks the ability to capture semantics of word sequences and is virtually useless for most purposes, unless combined with heavy machinery. This paper challenges that view by showing that augmenting the word2vec representation with one of a few pooling techniques yields results surpassing or comparable to the best algorithms in the literature. This improved performance is justified by theory and verified by extensive experiments on well-studied NLP benchmarks (this work is inspired by [10]).
UR - http://www.scopus.com/inward/record.url?scp=84948844165&partnerID=8YFLogxK
U2 - https://doi.org/10.1007/978-3-319-19581-0_3
DO - https://doi.org/10.1007/978-3-319-19581-0_3
M3 - Conference contribution
SN - 9783319195803
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 35
EP - 50
BT - Natural Language Processing and Information Systems - 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015, Proceedings
A2 - Handschuh, Siegfried
A2 - Freitas, André
A2 - Métais, Elisabeth
A2 - Biemann, Chris
A2 - Meziane, Farid
T2 - 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015
Y2 - 17 June 2015 through 19 June 2015
ER -