Efficient Long-Text Understanding with Short-Text Models

Maor Ivgi, Uri Shaham, Jonathan Berant

Research output: Contribution to journal › Article › peer-review

Abstract

Transformer-based pretrained language models (LMs) are ubiquitous across natural language understanding, but cannot be applied to long sequences such as stories, scientific articles, and long documents due to their quadratic complexity. While a myriad of efficient transformer variants have been proposed, they are typically based on custom implementations that require expensive pretraining from scratch. In this work, we propose SLED: SLiding-Encoder and Decoder, a simple approach for processing long sequences that reuses and leverages battle-tested short-text pretrained LMs. Specifically, we partition the input into overlapping chunks, encode each with a short-text LM encoder, and use the pretrained decoder to fuse information across chunks (fusion-in-decoder). We illustrate through controlled experiments that SLED offers a viable strategy for long text understanding and evaluate our approach on SCROLLS, a benchmark with seven datasets across a wide range of language understanding tasks. We find that SLED is competitive with specialized models that are up to 50x larger and require a dedicated and expensive pretraining step.
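The chunk-encode-then-fuse recipe described in the abstract can be illustrated with a short sketch. The snippet below is a minimal illustration, assuming a BART-style encoder-decoder from the Hugging Face transformers library; the model name, chunk length, and stride are illustrative placeholders, not the authors' released SLED implementation or its hyperparameters.

    # Minimal sketch of chunked encoding with decoder-side fusion
    # (fusion-in-decoder style). Assumes a short-text pretrained
    # encoder-decoder; all names and sizes here are illustrative.
    import torch
    from transformers import BartTokenizer, BartForConditionalGeneration
    from transformers.modeling_outputs import BaseModelOutput

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

    def encode_in_chunks(text, chunk_len=256, stride=128):
        """Split a long input into overlapping token chunks and encode
        each chunk independently with the short-text encoder."""
        ids = tokenizer(text, return_tensors="pt").input_ids[0]
        encoder = model.get_encoder()
        states, masks = [], []
        for start in range(0, len(ids), stride):
            chunk = ids[start:start + chunk_len].unsqueeze(0)
            mask = torch.ones_like(chunk)
            out = encoder(input_ids=chunk, attention_mask=mask)
            states.append(out.last_hidden_state)
            masks.append(mask)
            if start + chunk_len >= len(ids):
                break
        # Concatenate chunk representations along the sequence axis so the
        # decoder can cross-attend over all of them at once.
        return torch.cat(states, dim=1), torch.cat(masks, dim=1)

    hidden, attention_mask = encode_in_chunks("some very long document ...")
    output_ids = model.generate(
        encoder_outputs=BaseModelOutput(last_hidden_state=hidden),
        attention_mask=attention_mask,
        max_length=64,
    )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Note that this naive sketch concatenates every chunk's representation, so tokens in the overlap appear twice in the decoder's memory, whereas the paper keeps only each chunk's core tokens and treats the overlap as context. The key design point either way is that only the decoder's cross-attention sees the full document, so each encoder pass stays within the short-text model's length limit.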

Original language: American English
Pages (from-to): 284-299
Number of pages: 16
Journal: Transactions of the Association for Computational Linguistics
Volume: 11
DOIs
State: Published - 1 Jan 2023

All Science Journal Classification (ASJC) codes

  • Communication
  • Human-Computer Interaction
  • Linguistics and Language
  • Computer Science Applications
  • Artificial Intelligence
