Cultural evolution creates the statistical structure of language

Inbal Arnon, Simon Kirby

Research output: Contribution to journal, Article, peer-reviewed


Human language is unique in its structure: language is made up of parts that can be recombined in a productive way. The parts are not given but have to be discovered by learners exposed to unsegmented wholes. Across languages, the frequency distribution of those parts follows a power law. Both statistical properties—having parts and having them follow a particular distribution—facilitate learning, yet their origin is still poorly understood. Where do the parts come from and why do they follow a particular frequency distribution? Here, we show how these two core properties emerge from the process of cultural evolution with whole-to-part learning. We use an experimental analog of cultural transmission in which participants copy sets of non-linguistic sequences produced by a previous participant: This design allows us to ask if parts will emerge purely under pressure for the system to be learnable, even without meanings to convey. We show that parts emerge from initially unsegmented sequences, that their distribution becomes closer to a power law over generations, and, importantly, that these properties make the sets of sequences more learnable. We argue that these two core statistical properties of language emerge culturally both as a cause and effect of greater learnability.
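To make "follows a power law" concrete: in a rank-frequency power law (Zipf's law), the frequency f(r) of the r-th most frequent part is roughly proportional to r^(-α), so log f(r) is roughly linear in log r. The sketch below is a generic illustration and not the measure used in the article. It counts parts, ranks them by frequency, and estimates α with an ordinary least-squares fit on the log-log rank-frequency data. The helper name `zipf_exponent` and the toy part counts are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def zipf_exponent(parts):
    """Estimate alpha in f(r) ~ C * r**(-alpha) via least squares on log-log data.

    Hypothetical helper for illustration only, not the article's analysis.
    Returns (alpha, r_squared) for the straight-line fit of log f against log r.
    """
    # Frequencies sorted from most to least frequent; rank 1 is the most frequent part.
    freqs = sorted(Counter(parts).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct parts")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    # R^2 near 1 means the points lie close to a straight line on log-log axes.
    # If all frequencies are equal (syy == 0), a flat line fits exactly.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return -slope, r_squared


# Toy counts that follow f(r) = 60 / r exactly, so alpha should be 1.
toy_parts = ["A"] * 60 + ["B"] * 30 + ["C"] * 20 + ["D"] * 15 + ["E"] * 12
alpha, r2 = zipf_exponent(toy_parts)
print(round(alpha, 3), round(r2, 3))  # 1.0 1.0
```

Log-log least squares is a quick heuristic; more careful power-law analyses typically prefer maximum-likelihood estimation, so treat this only as a way to see what "closer to a power law" means numerically.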

Original language: American English
Article number: 5255
Journal: Scientific Reports
Issue number: 1
State: Published, Dec 2024

All Science Journal Classification (ASJC) codes

  • General

