Historical corpora meet the digital humanities: the Jerusalem Corpus of Emergent Modern Hebrew

Research output: Contribution to journal › Article › peer-review

Abstract

The paper describes the creation of the first open access multi-genre historical corpus of Emergent Modern Hebrew, made possible by the implementation of digital humanities methods in the process of corpus curation, encoding, and dissemination. Corpus contents originate in the Ben-Yehuda Project, an open access repository of Hebrew literature online, and in digital images curated from the collections of the National Library of Israel, a selection of which have been transcribed through a dedicated crowdsourcing task that feeds back into the library’s online catalog. Texts in the corpus are encoded following best practices in the digital humanities, including markup of metadata that enables time-sensitive research on the corpus, linguistic and otherwise. Evaluation of morphological analysis based on Modern Hebrew language models is shown to distinguish between genres in the historical variety, highlighting the importance of ephemeral materials for linguistic research and for potential collaboration with libraries and cultural institutions in the process of corpus creation. We demonstrate the use of the corpus in diachronic linguistic research and suggest ways in which the association it provides between digital images and texts can be used to support automatic language processing and to enhance resources in the digital humanities.

Original language: American English
Pages (from-to): 807-835
Number of pages: 29
Journal: Language Resources and Evaluation
Volume: 53
Issue number: 4
State: Published - 1 Dec 2019

Keywords

  • Citizen science
  • Crowdsourcing
  • Digital humanities
  • Ephemera
  • Hebrew
  • Historical corpora
  • Language change

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Education
  • Linguistics and Language
  • Library and Information Sciences
