DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs

Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these systems, showing that there is much work left to be done. We introduce a new English reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs. In this crowdsourced, adversarially-created, 96k-question benchmark, a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs than what was necessary for prior datasets. We apply state-of-the-art methods from both the reading comprehension and semantic parsing literatures on this dataset and show that the best systems only achieve 32.7% F1 on our generalized accuracy metric, while expert human performance is 96.4%. We additionally present a new model that combines reading comprehension methods with simple numerical reasoning to achieve 47.0% F1.

Original language: English
Title of host publication: Long and Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 2368-2378
Number of pages: 11
ISBN (Electronic): 9781950737130
State: Published - 2019
Externally published: Yes
Event: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019 - Minneapolis, United States
Duration: 2 Jun 2019 → 7 Jun 2019

Publication series

Name: NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
Volume: 1

Conference

Conference: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019
Country/Territory: United States
City: Minneapolis
Period: 2/06/19 → 7/06/19

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computer Science Applications
  • Linguistics and Language
