TY - GEN
T1 - Dyna-bAbI: unlocking bAbI's potential with dynamic synthetic benchmarking
T2 - 11th Joint Conference on Lexical and Computational Semantics, *SEM 2022
AU - Tamari, Ronen
AU - Richardson, Kyle
AU - Kahlon, Noam
AU - Sar-Shalom, Aviad
AU - Liu, Nelson F.
AU - Tsarfaty, Reut
AU - Shahaf, Dafna
N1 - Publisher Copyright: © 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
AB - While neural language models often perform surprisingly well on natural language understanding (NLU) tasks, their strengths and limitations remain poorly understood. Controlled synthetic tasks are thus an increasingly important resource for diagnosing model behavior. In this work we focus on story understanding, a core competency for NLU systems. However, the main synthetic resource for story understanding, the bAbI benchmark, lacks a systematic mechanism for controllable task generation. We develop Dyna-bAbI, a dynamic framework providing fine-grained control over task generation in bAbI. We demonstrate our ideas by constructing three new tasks requiring compositional generalization, an important evaluation setting absent from the original benchmark. We tested both special-purpose models developed for bAbI and state-of-the-art pre-trained methods, and found that while both approaches solve the original tasks (>99% accuracy), neither succeeded in the compositional generalization setting, indicating the limitations of the original training data. We explored ways to augment the original data, and found that though diversifying training data was far more useful than simply increasing dataset size, it was still insufficient for driving robust compositional generalization (<70% accuracy for complex compositions). Our results underscore the importance of highly controllable task generators for creating robust NLU systems through a virtuous cycle of model and data development.
UR - http://www.scopus.com/inward/record.url?scp=85139176182&partnerID=8YFLogxK
U2 - 10.18653/v1/2022.starsem-1.9
DO - 10.18653/v1/2022.starsem-1.9
M3 - Conference contribution
T3 - *SEM 2022 - 11th Joint Conference on Lexical and Computational Semantics, Proceedings of the Conference
SP - 101
EP - 122
BT - *SEM 2022 - 11th Joint Conference on Lexical and Computational Semantics, Proceedings of the Conference
A2 - Nastase, Vivi
A2 - Pavlick, Ellie
A2 - Pilehvar, Mohammad Taher
A2 - Camacho-Collados, Jose
A2 - Raganato, Alessandro
PB - Association for Computational Linguistics (ACL)
Y2 - 14 July 2022 through 15 July 2022
ER -