Hard and Soft Labeling for Hebrew Paleography: A Case Study

Ahmad Droby, Daria Vasyutinsky Shapira, Irina Rabaev, Berat Kurar Barakat, Jihad El-Sana

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Paleography studies the writing styles of manuscripts and identifies the different styles and modes of scripts. We explore the applicability of hard- and soft-labeling for training deep-learning models to classify Hebrew scripts. In the hard-labeling scheme, each document image has a single label representing its class. The soft-labeling approach instead labels an image with a label vector, where each element is the similarity of the document image to a particular regional writing style or graphical mode. In addition, we introduce a dataset of medieval Hebrew manuscripts that provides complete coverage of the major Hebrew writing styles and modes. A Hebrew paleography expert manually annotated the ground truth for soft-labeling. We compare the soft- and hard-labeling approaches on the presented dataset, then analyze and discuss the findings.
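To make the difference between the two labeling schemes concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class list is hypothetical, and the normalization of expert similarity scores into a probability vector and the use of cross-entropy as the training loss are assumptions made for illustration, since this record does not state them.

```python
import math

# Hypothetical label set for illustration only; the paper's actual
# styles/modes are not listed in this record.
STYLES = ["Ashkenazi", "Sephardic", "Italian"]


def hard_label(style: str) -> list[float]:
    """Hard label: a one-hot vector, so the image belongs to exactly one class."""
    return [1.0 if s == style else 0.0 for s in STYLES]


def soft_label(similarities: dict[str, float]) -> list[float]:
    """Soft label: expert similarity scores per class, normalized to sum to 1.

    Normalizing to a probability vector is an assumption of this sketch.
    """
    raw = [max(similarities.get(s, 0.0), 0.0) for s in STYLES]
    total = sum(raw)
    if total == 0:
        raise ValueError("at least one similarity must be positive")
    return [r / total for r in raw]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over model outputs."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def cross_entropy(target: list[float], logits: list[float]) -> float:
    """Cross-entropy between a target distribution and the model's prediction.

    The same formula serves both schemes: a one-hot target keeps only the
    true-class term, while a soft target weights every class it covers.
    """
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # hypothetical network output for one image
    hard = hard_label("Ashkenazi")                         # [1.0, 0.0, 0.0]
    soft = soft_label({"Ashkenazi": 0.6, "Italian": 0.4})  # [0.6, 0.0, 0.4]
    print("hard-label loss:", cross_entropy(hard, logits))
    print("soft-label loss:", cross_entropy(soft, logits))
```

A hard label is just the special case of a soft label with all its mass on one class, which is why a single loss function can compare the two schemes on the same data.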

Original language: American English
Title of host publication: Document Analysis Systems - 15th IAPR International Workshop, DAS 2022, Proceedings
Editors: Seiichi Uchida, Elisa Barney, Véronique Eglin
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 492-506
Number of pages: 15
ISBN (Print): 9783031065545
State: Published - 1 Jan 2022
Event: 15th IAPR International Workshop on Document Analysis Systems, DAS 2022 - La Rochelle, France
Duration: 22 May 2022 → 25 May 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13237 LNCS

Conference

Conference: 15th IAPR International Workshop on Document Analysis Systems, DAS 2022
Country/Territory: France
City: La Rochelle
Period: 22/05/22 → 25/05/22

Keywords

  • Convolutional neural network
  • Digital paleography
  • Medieval Hebrew manuscripts
  • Script type classification
  • Soft-labeling

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
