Testing the limits of natural language models for predicting human language judgements

Tal Golan, Matthew Siegelman, Nikolaus Kriegeskorte, Christopher Baldassano

Research output: Contribution to journal › Article › peer-review


Neural network language models appear to be increasingly aligned with how humans process and generate language, but identifying their weaknesses through adversarial examples is challenging due to the discrete nature of language and the complexity of human language perception. We bypass these limitations by turning the models against each other. We generate controversial sentence pairs where two language models disagree about which sentence is more likely to occur. Considering nine language models (including n-gram models, recurrent neural networks, and transformers), we created hundreds of controversial sentence pairs through synthetic optimization or by selecting sentences from a corpus. Controversial sentence pairs proved highly effective at revealing model failures and identifying models that aligned most closely with human judgements of which sentence is more likely. The most human-consistent model tested was GPT-2, although experiments also revealed substantial shortcomings in its alignment with human perception.
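The selection criterion behind controversial sentence pairs can be sketched with toy unigram models standing in for the paper's nine language models (the function names and the Laplace-smoothed unigram scoring below are illustrative assumptions, not the authors' implementation; the paper additionally synthesizes sentences by optimization rather than only filtering a corpus):

```python
import math
from collections import Counter
from itertools import combinations

def train_unigram(corpus, vocab):
    # Illustrative stand-in for a language model: a Laplace-smoothed
    # unigram model mapping each vocabulary word to its log-probability.
    counts = Counter(word for sentence in corpus for word in sentence.split())
    total = sum(counts.values()) + len(vocab)
    return {word: math.log((counts[word] + 1) / total) for word in vocab}

def log_prob(model, sentence):
    # Sentence log-probability under a unigram model: sum of word log-probs.
    return sum(model[word] for word in sentence.split())

def controversial_pairs(sentences, model_a, model_b):
    # A pair (s1, s2) is "controversial" when the two models disagree
    # about which of the two sentences is more likely.
    pairs = []
    for s1, s2 in combinations(sentences, 2):
        a_prefers_s1 = log_prob(model_a, s1) > log_prob(model_a, s2)
        b_prefers_s1 = log_prob(model_b, s1) > log_prob(model_b, s2)
        if a_prefers_s1 != b_prefers_s1:
            pairs.append((s1, s2))
    return pairs
```

Human judgements on such pairs then adjudicate between the models: whichever model's preference matches the human choice is the more human-consistent one on that pair.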

Original language: American English
Pages (from-to): 952-964
Number of pages: 13
Journal: Nature Machine Intelligence
Issue number: 9
State: Published - 1 Sep 2023

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
  • Artificial Intelligence
