TY - GEN
T1 - Improving Compositional Generalization in Semantic Parsing
AU - Oren, Inbar
AU - Herzig, Jonathan
AU - Gupta, Nitish
AU - Gardner, Matt
AU - Berant, Jonathan
N1 - Publisher Copyright: © 2020 Association for Computational Linguistics
PY - 2020
Y1 - 2020
AB - Generalization of models to out-of-distribution (OOD) data has captured tremendous attention recently. Specifically, compositional generalization, i.e., whether a model generalizes to new structures built of components observed during training, has sparked substantial interest. In this work, we investigate compositional generalization in semantic parsing, a natural test-bed for compositional generalization, as output programs are constructed from sub-components. We analyze a wide variety of models and propose multiple extensions to the attention module of the semantic parser, aiming to improve compositional generalization. We find that the following factors improve compositional generalization: (a) using contextual representations, such as ELMo and BERT, (b) informing the decoder what input tokens have previously been attended to, (c) training the decoder attention to agree with pre-computed token alignments, and (d) downsampling examples corresponding to frequent program templates. While we substantially reduce the gap between in-distribution and OOD generalization, performance on OOD compositions is still substantially lower.
UR - http://www.scopus.com/inward/record.url?scp=85106113815&partnerID=8YFLogxK
U2 - 10.18653/v1/2020.findings-emnlp.225
DO - 10.18653/v1/2020.findings-emnlp.225
M3 - Conference contribution
T3 - Findings of the Association for Computational Linguistics: EMNLP 2020
SP - 2482
EP - 2495
BT - Findings of the Association for Computational Linguistics: EMNLP 2020
PB - Association for Computational Linguistics (ACL)
T2 - Findings of the Association for Computational Linguistics: EMNLP 2020
Y2 - 16 November 2020 through 20 November 2020
ER -