Value-aware Approximate Attention

Ankit Gupta, Jonathan Berant

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. However, all approximations thus far have ignored the contribution of the value vectors to the quality of approximation. In this work, we argue that research efforts should be directed towards approximating the true output of the attention sub-layer, which includes the value vectors. We propose a value-aware objective, and show theoretically and empirically that an optimal approximation of a value-aware objective substantially outperforms an optimal approximation that ignores values, in the context of language modeling. Moreover, we show that the choice of kernel function for computing attention similarity can substantially affect the quality of sparse approximations, where kernel functions that are less skewed are more affected by the value vectors.
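To make the distinction concrete, below is a minimal NumPy sketch (not the paper's implementation) contrasting a value-oblivious objective, which measures how well a sparse matrix approximates the softmax attention weights, with a value-aware objective, which measures the error on the attention sub-layer output after multiplying by the value vectors. The top-k sparsification, the Frobenius-norm error, and the names value_oblivious_error / value_aware_error are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def value_oblivious_error(Q, K, P_hat):
    # error of the approximate attention matrix against the true softmax weights
    P = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return np.linalg.norm(P - P_hat)

def value_aware_error(Q, K, V, P_hat):
    # error of the approximate sub-layer output P_hat @ V against the true output P @ V
    P = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return np.linalg.norm(P @ V - P_hat @ V)

# toy example: a sparse approximation that keeps only the top-k weights per query
rng = np.random.default_rng(0)
n, d, k = 8, 4, 2
Q, K, V = rng.normal(size=(3, n, d))
P = softmax(Q @ K.T / np.sqrt(d))
threshold = np.sort(P, axis=-1)[:, -k][:, None]    # k-th largest weight per row
P_hat = np.where(P >= threshold, P, 0.0)
P_hat = P_hat / P_hat.sum(axis=-1, keepdims=True)  # renormalize the kept weights

print("value-oblivious error:", value_oblivious_error(Q, K, P_hat))
print("value-aware error:    ", value_aware_error(Q, K, V, P_hat))
```

Under a sketch like this, two approximations with the same error on the attention matrix can yield very different errors on the actual output once the value vectors are taken into account, which is the motivation for directing the approximation objective at the sub-layer output.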

Original language: English
Host publication title: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 9567-9574
Number of pages: 8
ISBN (electronic): 9781955917094
Publication status: Published - 2021
Event: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Virtual, Punta Cana, Dominican Republic
Duration: 7 November 2021 – 11 November 2021

Publication series

Name: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Country/Territory: Dominican Republic
City: Virtual, Punta Cana
Duration: 7/11/21 – 11/11/21

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
