TY - GEN
T1 - SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
T2 - 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
AU - Zellers, Rowan
AU - Bisk, Yonatan
AU - Schwartz, Roy
AU - Choi, Yejin
N1 - Publisher Copyright: © 2018 Association for Computational Linguistics
PY - 2018
Y1 - 2018
AB - Given a partial description like “she opened the hood of the car,” humans can reason about the situation and anticipate what might come next (“then, she examined the engine”). In this paper, we introduce the task of grounded commonsense inference, unifying natural language inference and commonsense reasoning. We present Swag, a new dataset with 113k multiple choice questions about a rich spectrum of grounded situations. To address the recurring challenges of the annotation artifacts and human biases found in many existing datasets, we propose Adversarial Filtering (AF), a novel procedure that constructs a de-biased dataset by iteratively training an ensemble of stylistic classifiers, and using them to filter the data. To account for the aggressive adversarial filtering, we use state-of-the-art language models to massively oversample a diverse set of potential counterfactuals. Empirical results demonstrate that while humans can solve the resulting inference problems with high accuracy (88%), various competitive models struggle on our task. We provide comprehensive analysis that indicates significant opportunities for future research.
UR - http://www.scopus.com/inward/record.url?scp=85081747779&partnerID=8YFLogxK
U2 - 10.18653/v1/d18-1009
DO - 10.18653/v1/d18-1009
M3 - Conference contribution
T3 - Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
SP - 93
EP - 104
BT - Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
A2 - Riloff, Ellen
A2 - Chiang, David
A2 - Hockenmaier, Julia
A2 - Tsujii, Jun'ichi
Y2 - 31 October 2018 through 4 November 2018
ER -