Abstract
We introduce BitFit, a sparse fine-tuning method in which only the bias terms of the model (or a subset of them) are modified. We show that with small-to-medium training data, applying BitFit to pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model. For larger data, the method is competitive with other sparse fine-tuning methods. Besides their practical utility, these findings are relevant to understanding the commonly used process of fine-tuning: they support the hypothesis that fine-tuning is mainly about exposing knowledge induced by language-modeling training, rather than learning new task-specific linguistic knowledge.
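The following is a minimal sketch of the bias-only setup described in the abstract, assuming a PyTorch / Hugging Face Transformers BERT classifier. The exact parameter selection (matching names containing "bias") and the choice to also train the task-specific classification head are illustrative assumptions, not details taken from this page.

```python
# Bias-only fine-tuning sketch (BitFit-style), assuming a Hugging Face
# Transformers BERT classifier. Parameter names containing "bias" are
# treated as the bias terms; training the classifier head is an assumption.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze everything except bias parameters (and, by assumption, the head).
for name, param in model.named_parameters():
    param.requires_grad = "bias" in name or name.startswith("classifier")

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

# Report how small the trainable subset is relative to the full model.
n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in model.parameters())
print(f"trainable: {n_trainable:,} / {n_total:,} "
      f"({100 * n_trainable / n_total:.2f}%)")
```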
Original language | English |
---|---|
Title of host publication | ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Short Papers) |
Editors | Smaranda Muresan, Preslav Nakov, Aline Villavicencio |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1-9 |
Number of pages | 9 |
ISBN (Electronic) | 9781955917223 |
State | Published - 2022 |
Event | 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 - Dublin, Ireland. Duration: 22 May 2022 → 27 May 2022. Internet address: https://aclanthology.org/2022.acl-long.0/
Publication series
Name | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
---|---|
Volume | 2 |
Conference
Conference | 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 |
---|---|
Country/Territory | Ireland |
City | Dublin |
Period | 22/05/22 → 27/05/22 |
Internet address | https://aclanthology.org/2022.acl-long.0/
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Linguistics and Language
- Language and Linguistics