TY - GEN
T1 - On the Limitations of Dataset Balancing
T2 - Findings of the Association for Computational Linguistics: NAACL 2022
AU - Schwartz, Roy
AU - Stanovsky, Gabriel
N1 - Publisher Copyright: © Findings of the Association for Computational Linguistics: NAACL 2022 - Findings.
PY - 2022
Y1 - 2022
N2 - Recent work has shown that deep learning models in NLP are highly sensitive to low-level correlations between simple features and specific output labels, leading to overfitting and lack of generalization. To mitigate this problem, a common practice is to balance datasets by adding new instances or by filtering out "easy" instances (Sakaguchi et al., 2020), culminating in a recent proposal to eliminate single-word correlations altogether (Gardner et al., 2021). In this opinion paper, we identify that despite these efforts, increasingly-powerful models keep exploiting ever-smaller spurious correlations, and as a result even balancing all single-word features is insufficient for mitigating all of these correlations. In parallel, a truly balanced dataset may be bound to "throw the baby out with the bathwater" and miss important signal encoding common sense and world knowledge. We highlight several alternatives to dataset balancing, focusing on enhancing datasets with richer contexts, allowing models to abstain and interact with users, and turning from large-scale fine-tuning to zero- or few-shot setups.
AB - Recent work has shown that deep learning models in NLP are highly sensitive to low-level correlations between simple features and specific output labels, leading to overfitting and lack of generalization. To mitigate this problem, a common practice is to balance datasets by adding new instances or by filtering out "easy" instances (Sakaguchi et al., 2020), culminating in a recent proposal to eliminate single-word correlations altogether (Gardner et al., 2021). In this opinion paper, we identify that despite these efforts, increasingly-powerful models keep exploiting ever-smaller spurious correlations, and as a result even balancing all single-word features is insufficient for mitigating all of these correlations. In parallel, a truly balanced dataset may be bound to "throw the baby out with the bathwater" and miss important signal encoding common sense and world knowledge. We highlight several alternatives to dataset balancing, focusing on enhancing datasets with richer contexts, allowing models to abstain and interact with users, and turning from large-scale fine-tuning to zero- or few-shot setups.
UR - http://www.scopus.com/inward/record.url?scp=85132409736&partnerID=8YFLogxK
U2 - https://doi.org/10.18653/v1/2022.findings-naacl.168
DO - 10.18653/v1/2022.findings-naacl.168
M3 - Conference contribution
T3 - Findings of the Association for Computational Linguistics: NAACL 2022 - Findings
SP - 2182
EP - 2194
BT - Findings of the Association for Computational Linguistics: NAACL 2022
PB - Association for Computational Linguistics (ACL)
Y2 - 10 July 2022 through 15 July 2022
ER -