Coming to your senses: On controls and evaluation sets in polysemy research

Haim Dubossarsky, Eitan Grossman, Daphna Weinshall

Research output: Contribution to conferencePaperpeer-review

Abstract

The point of departure of this article is the claim that sense-specific vectors provide an advantage over normal vectors due to the polysemy that they presumably represent. This claim is based on performance gains observed in gold standard evaluation tests such as word similarity tasks. We demonstrate that this claim, at least as it is instantiated in prior art, is unfounded in two ways. Furthermore, we provide empirical data and an analytic discussion that may account for the previously reported improved performance. First, we show that ground-truth polysemy degrades performance in word similarity tasks. Therefore word similarity tasks are not suitable as an evaluation test for polysemy representation. Second, random assignment of words to senses is shown to improve performance in the same task. This and additional results point to the conclusion that performance gains as reported in previous work may be an artifact of random sense assignment, which is equivalent to sub-sampling and multiple estimation of word vector representations. Theoretical analysis shows that this may on its own be beneficial for the estimation of word similarity, by reducing the bias in the estimation of the cosine distance.

Original languageEnglish
Pages1732-1740
Number of pages9
StatePublished - 2018
Event2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 - Brussels, Belgium
Duration: 31 Oct 20184 Nov 2018

Conference

Conference2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
Country/TerritoryBelgium
CityBrussels
Period31/10/184/11/18

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Fingerprint

Dive into the research topics of 'Coming to your senses: On controls and evaluation sets in polysemy research'. Together they form a unique fingerprint.

Cite this