Mini Worldlit: A Dataset of Contemporary Fiction from 13 Countries, Nine Languages, and Five Continents

Andrew Piper, Mathias Iroro Orhero, David Bamman, Emrah Peksoy, Christina Han, Pallavi Rastogi, Jens Bjerring-Hansen, Sebastian Rasmussen, Hoyt Long, Roel Smeets, Itay Marienberg-Milikowsky, Alexandra Stuart, Tom McEnaney, Mads Rosendahl Thomsen

Research output: Contribution to journal › Article › peer-review

Abstract

World literature plays a key role in understanding the global diversity of human storytelling. However, datasets suitable for large-scale cross-cultural analysis remain limited. Responding to the increasing digitization of literary texts and the need for more diverse and multilingual resources, we introduce Mini Worldlit, a manually curated dataset of 1,192 works of contemporary fiction from 13 countries, representing nine languages across five continents. Mini Worldlit applies consistent cross-cultural selection criteria, overseen by scholarly experts, to ensure geographic, linguistic, and stylistic coherence. The dataset provides a foundation for future comparative studies of global literary cultures and offers a template for cross-cultural sampling. Our methodology pairs geographic boundaries with linguistic communities, enabling a structured exploration of world literature. The dataset is designed to facilitate a comparative approach to understanding literature and to support the growing field of multilingual digital humanities.

Original language: American English
Journal: Journal of Open Humanities Data
Volume: 11
State: Published - 1 Jan 2025

Keywords

  • fiction
  • literature
  • multilingualism
  • world literature

All Science Journal Classification (ASJC) codes

  • Information Systems
  • General Arts and Humanities
  • Library and Information Sciences
