VML-HD: The historical Arabic documents dataset for recognition systems

Majeed Kassis, Alaa Abdalhaleem, Ahmad Droby, Reem Alaasam, Jihad El-Sana

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

In this paper we present a new database of handwritten Arabic script. It is based on five books written by different writers, dating from the years 1088-1451. We took 680 pages from these five books and fully annotated them at the sub-word level: on each page we manually drew bounding boxes around the individual sub-words and annotated each one with its sequence of characters. The database contains 121,636 sub-word appearances, consisting of 244,553 characters, drawn from a vocabulary of 1,731 sub-word forms. We describe the database in detail; it is designed for training and testing recognition systems for handwritten Arabic sub-words. The database is available for research purposes, and we encourage researchers to develop and test new methods using it.
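The three headline figures above (sub-word appearances, total characters, and vocabulary size) follow directly from per-sub-word annotations of the kind the abstract describes: a bounding box plus a character sequence. The sketch below shows one way such records could be represented and counted. The record layout (`SubWordAnnotation`, its `page_id`, `bbox` and `chars` fields, the `(x, y, width, height)` box convention) and the rule that a sub-word "form" is identified by its exact character sequence are illustrative assumptions, not the published VML-HD file format.

```python
from dataclasses import dataclass
from typing import Iterable, NamedTuple


@dataclass(frozen=True)
class SubWordAnnotation:
    """One annotated sub-word occurrence (hypothetical record layout)."""
    page_id: str                     # scanned page the sub-word appears on
    bbox: tuple[int, int, int, int]  # (x, y, width, height) in pixels, assumed convention
    chars: tuple[str, ...]           # annotated character sequence, in reading order


class DatasetStats(NamedTuple):
    appearances: int  # number of sub-word occurrences
    characters: int   # total characters summed over all occurrences
    vocabulary: int   # number of distinct sub-word forms


def compute_stats(annotations: Iterable[SubWordAnnotation]) -> DatasetStats:
    """Count occurrences, characters and distinct forms in one pass."""
    appearances = 0
    characters = 0
    forms: set[tuple[str, ...]] = set()
    for ann in annotations:
        appearances += 1
        characters += len(ann.chars)
        # Assumption: a "form" is identified by its exact character sequence.
        forms.add(ann.chars)
    return DatasetStats(appearances, characters, len(forms))


# Tiny illustrative sample (made-up boxes, not real dataset entries).
sample = [
    SubWordAnnotation("p001", (10, 20, 40, 30), ("ب", "ا")),
    SubWordAnnotation("p001", (60, 20, 25, 30), ("ل",)),
    SubWordAnnotation("p002", (12, 18, 41, 29), ("ب", "ا")),
]
print(compute_stats(sample))  # DatasetStats(appearances=3, characters=5, vocabulary=2)
```

Applied to the reported totals, the same counts give an average of roughly 2.01 characters per sub-word (244,553 / 121,636), which reflects how short Arabic sub-words typically are compared with full words.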

Original language: American English
Title of host publication: 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017
Pages: 11-14
Number of pages: 4
ISBN (Electronic): 9781509066285
State: Published, 13 Oct 2017
Event: 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017, Nancy, France
Duration: 3 Apr 2017 to 5 Apr 2017

Publication series

Name: 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017

Conference

Conference: 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017
Country/Territory: France
City: Nancy
Period: 3/04/17 to 5/04/17

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
  • Linguistics and Language
  • Computer Science Applications
