TY - JOUR
T1 - Mini Worldlit
T2 - A Dataset of Contemporary Fiction from 13 Countries, Nine Languages, and Five Continents
AU - Piper, Andrew
AU - Orhero, Mathias Iroro
AU - Bamman, David
AU - Peksoy, Emrah
AU - Han, Christina
AU - Rastogi, Pallavi
AU - Bjerring-Hansen, Jens
AU - Rasmussen, Sebastian
AU - Long, Hoyt
AU - Smeets, Roel
AU - Marienberg-Milikowsky, Itay
AU - Stuart, Alexandra
AU - McEnaney, Tom
AU - Thomsen, Mads Rosendahl
N1 - Publisher Copyright: © 2025 The Author(s).
PY - 2025/1/1
Y1 - 2025/1/1
AB - World literature plays a key role in understanding the global diversity of human storytelling. However, datasets suitable for large-scale cross-cultural analysis remain limited. Responding to the increasing digitization of literary texts and the need for more diverse and multilingual resources, we introduce Mini Worldlit, a manually curated dataset of 1,192 works of contemporary fiction from 13 countries, representing nine languages across five continents. Mini Worldlit employs consistent cross-cultural selection criteria, overseen by scholarly experts, to ensure geographic, linguistic, and stylistic coherence. The dataset provides a foundation for future comparative studies of global literary cultures, offering a template for cross-cultural sampling. Our methodology pairs geographic boundaries with linguistic communities, enabling a structured exploration of world literature. This dataset is designed to facilitate a comparative approach to understanding literature and support the growing field of multilingual digital humanities.
KW - fiction
KW - literature
KW - multilingualism
KW - world literature
UR - http://www.scopus.com/inward/record.url?scp=85216978928&partnerID=8YFLogxK
U2 - https://doi.org/10.5334/johd.248
DO - 10.5334/johd.248
M3 - Article
SN - 2059-481X
VL - 11
JO - Journal of Open Humanities Data
JF - Journal of Open Humanities Data
ER -