A dataset of peer reviews (PeerRead): Collection, insights and NLP applications

Dongyeop Kang, Waleed Ammar, Bhavana Dalvi Mishra, Madeleine Van Zuylen, Sebastian Kohlmeier, Eduard Hovy, Roy Schwartz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

Peer reviewing is a central component in the scientific publishing process. We present the first public dataset of scientific peer reviews available for research purposes (PeerRead v1), providing an opportunity to study this important artifact. The dataset consists of 14.7K paper drafts and the corresponding accept/reject decisions in top-tier venues including ACL, NIPS and ICLR. The dataset also includes 10.7K textual peer reviews written by experts for a subset of the papers. We describe the data collection process and report interesting phenomena observed in the peer reviews. We also propose two novel NLP tasks based on this dataset and provide simple baseline models. In the first task, we show that simple models can predict whether a paper is accepted with up to 21% error reduction compared to the majority baseline. In the second task, we predict the numerical scores of review aspects and show that simple models can outperform the mean baseline for aspects with high variance such as 'originality' and 'impact'.
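The abstract compares acceptance prediction against a majority baseline and aspect-score prediction against a mean baseline. The sketch below illustrates those two reference points in Python under stated assumptions: it reads "error reduction" as the relative reduction in classification error, it uses root-mean-square error (RMSE) as the metric for the score task, and every function name and the toy data are hypothetical illustrations, not taken from PeerRead or its released code.

```python
import math
from collections import Counter
from statistics import mean


def majority_baseline_error(labels):
    """Error rate of always predicting the most frequent label (e.g. 'reject')."""
    if not labels:
        raise ValueError("labels must be non-empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return 1 - most_common_count / len(labels)


def error_rate(predictions, labels):
    """Fraction of predictions that disagree with the gold labels."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)


def relative_error_reduction(baseline_error, model_error):
    """Assumed reading of 'error reduction': (E_base - E_model) / E_base."""
    if baseline_error == 0:
        raise ValueError("baseline error is zero; reduction is undefined")
    return (baseline_error - model_error) / baseline_error


def mean_baseline_rmse(train_scores, test_scores):
    """RMSE of always predicting the training-set mean aspect score."""
    m = mean(train_scores)
    squared = [(s - m) ** 2 for s in test_scores]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = accept, 0 = reject (not PeerRead numbers).
    gold = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
    predicted = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    base = majority_baseline_error(gold)
    model = error_rate(predicted, gold)
    print(round(relative_error_reduction(base, model), 3))
    print(mean_baseline_rmse([2, 4], [1, 5]))
```

A high-variance aspect makes the constant mean prediction costly, which is one way to see why a learned model has more room to beat the mean baseline on aspects such as 'originality' and 'impact'.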

Original language: English
Title of host publication: Long Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 1647-1661
Number of pages: 15
ISBN (Electronic): 9781948087278
State: Published - 2018
Externally published: Yes
Event: 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018 - New Orleans, United States
Duration: 1 Jun 2018 to 6 Jun 2018

Publication series

Name: NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
Volume: 1

Conference

Conference: 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018
Country/Territory: United States
City: New Orleans
Period: 1/06/18 to 6/06/18

All Science Journal Classification (ASJC) codes

  • Linguistics and Language
  • Language and Linguistics
  • Computer Science Applications
