Evaluation Guidelines to Deal with Implicit Phenomena to Assess Factuality in Data-to-Text Generation

Roy Eisenstadt, Michael Elhadad

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Data-to-text generation systems are trained on large datasets, such as WebNLG, RotoWire, E2E or DART. Beyond traditional token-overlap evaluation metrics (BLEU or METEOR), a key concern faced by recent generators is to control the factuality of the generated text with respect to the input data specification. We report on our experience when developing an automatic factuality evaluation system for data-to-text generation that we are testing on WebNLG and E2E data. We aim to prepare gold data annotated manually to identify cases where the text communicates more information than is warranted based on the input data (extra) or fails to communicate data that is part of the input (missing). While analyzing reference (data, text) samples, we encountered a range of systematic uncertainties that are related to cases of implicit phenomena in text, and to the nature of the non-linguistic knowledge we expect to be involved when assessing factuality. We derive from our experience a set of evaluation guidelines to reach high inter-annotator agreement on such cases.

Original language: American English
Title of host publication: UNIMPLICIT 2021 - 1st Workshop on Understanding Implicit and Underspecified Language, Proceedings of the Workshop
Editors: Michael Roth, Reut Tsarfaty, Yoav Goldberg
Publisher: Association for Computational Linguistics (ACL)
Pages: 20-27
Number of pages: 8
ISBN (Electronic): 9781954085763
State: Published - 1 Jan 2021
Event: 1st Workshop on Understanding Implicit and Underspecified Language, UNIMPLICIT 2021 - Virtual, Bangkok, Thailand
Duration: 5 Aug 2021 to 6 Aug 2021

Publication series

Name: UNIMPLICIT 2021 - 1st Workshop on Understanding Implicit and Underspecified Language, Proceedings of the Workshop

Conference

Conference: 1st Workshop on Understanding Implicit and Underspecified Language, UNIMPLICIT 2021
Country/Territory: Thailand
City: Virtual, Bangkok
Period: 5/08/21 to 6/08/21

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Software
  • Linguistics and Language
