Improving reliability of word similarity evaluation by redesigning annotation task and performance measure

Oded Avraham, Yoav Goldberg

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We suggest a new method for creating and using gold-standard datasets for word similarity evaluation. Our goal is to improve the reliability of the evaluation, and we do this by redesigning the annotation task to achieve higher inter-rater agreement, and by defining a performance measure which takes the reliability of each annotation decision in the dataset into account.
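
The abstract leaves the measure's exact form to the paper itself. As a purely illustrative sketch (the function name, data layout, and weighting scheme below are assumptions, not the authors' published formulation), a reliability-weighted evaluation over pairwise annotation decisions might weight each decision by its inter-rater agreement:

    from typing import Dict, List, Tuple

    def weighted_pair_accuracy(
        model_sim: Dict[Tuple[str, str], float],
        decisions: List[Tuple[str, str, str, float]],
    ) -> float:
        """Toy reliability-weighted accuracy (hypothetical, not the paper's exact measure).

        Each decision is (target, preferred, other, reliability): annotators
        judged `preferred` as more similar to `target` than `other`, with
        `reliability` in (0, 1] reflecting inter-rater agreement on that
        single decision. The model is credited when its similarity scores
        order the pair the same way, and every decision contributes its
        reliability as weight.
        """
        correct = total = 0.0
        for target, preferred, other, reliability in decisions:
            if model_sim[(target, preferred)] > model_sim[(target, other)]:
                correct += reliability
            total += reliability
        return correct / total if total else 0.0

    # Toy usage: the model matches the high-agreement decision but not the
    # low-agreement one, so it loses only a small amount of weighted credit.
    sims = {("cat", "dog"): 0.8, ("cat", "car"): 0.3, ("cat", "kitten"): 0.9}
    gold = [("cat", "kitten", "car", 1.0), ("cat", "dog", "kitten", 0.4)]
    print(weighted_pair_accuracy(sims, gold))  # 1.0 / 1.4 ≈ 0.714

Under this kind of weighting, decisions on which annotators themselves disagreed contribute little either way, so a model is penalized less for contradicting them.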

Original language: American English
Title of host publication: Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, RepEval 2016 at the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016
Publisher: Association for Computational Linguistics (ACL)
Pages: 106-110
Number of pages: 5
ISBN (Electronic): 9781945626142
State: Published - 1 Jan 2016
Externally published: Yes
Event: 1st Workshop on Evaluating Vector-Space Representations for NLP, RepEval 2016 at the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Berlin, Germany
Duration: 7 Aug 2016 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics

Conference

Conference: 1st Workshop on Evaluating Vector-Space Representations for NLP, RepEval 2016 at the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016
Country/Territory: Germany
City: Berlin
Period: 7/08/16 → …

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
