In-Context Learning Creates Task Vectors

Roee Hendel, Mor Geva, Amir Globerson

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the “standard” machine learning framework, where one uses a training set S to find a best-fitting function f(x) in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query x and a single “task vector” calculated from the training set. Thus, ICL can be seen as compressing S into a single task vector θ(S) and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.

Original languageEnglish
Title of host publicationFindings of the Association for Computational Linguistics
Subtitle of host publicationEMNLP 2023
PublisherAssociation for Computational Linguistics (ACL)
Pages9318-9333
Number of pages16
ISBN (Electronic)9798891760615
StatePublished - 2023
Event2023 Findings of the Association for Computational Linguistics: EMNLP 2023 - Singapore, Singapore
Duration: 6 Dec 202310 Dec 2023

Publication series

NameFindings of the Association for Computational Linguistics: EMNLP 2023

Conference

Conference2023 Findings of the Association for Computational Linguistics: EMNLP 2023
Country/TerritorySingapore
CitySingapore
Period6/12/2310/12/23

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
  • Language and Linguistics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'In-Context Learning Creates Task Vectors'. Together they form a unique fingerprint.

Cite this