Experiment study on utilizing convolutional neural networks to recognize historical Arabic handwritten text

Reem Alaasam, Berat Kurar, Majeed Kassis, Jihad El-Sana

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deep learning is a form of hierarchical learning; it consists of multiple layers of representations that gradually transform data into high-level concepts. Deep learning has been providing state-of-the-art results for various computer vision problems. However, a typical deep learning algorithm needs a large amount of data to train a deep model and guarantee the model's ability to generalize. Generating large labeled datasets is not easy, and this is one of the main barriers to applying deep learning to many problems. Data augmentation schemes were introduced to overcome this limitation by extending small available labeled datasets. In this work we experiment with extending a small labeled dataset of Arabic continuous subwords by orders of magnitude. The labeled dataset, which consists of handwritten Arabic subwords, is used to synthesize a large labeled collection. The synthesized subwords are based on one or multiple writing styles from the original labeled dataset. We also experiment with generating various printed forms of subwords. We include only the Naskh font, as most Arabic historical manuscripts were written in this type of font. We train several convolutional neural networks using handwritten, printed, and synthesized datasets and obtain encouraging results.
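
As a rough illustration of the general idea of label-preserving data augmentation mentioned in the abstract, the following minimal sketch grows a small labeled set by adding randomly shifted copies of each image. This is not the authors' synthesis method, which the abstract does not detail; the function names `shift_image` and `augment_dataset`, the list-of-rows image representation, and the choice of translation as the perturbation are all assumptions made for illustration.

```python
import random


def shift_image(image, dx, dy, background=0):
    """Translate a 2D image (a list of rows) by (dx, dy), padding with background."""
    height, width = len(image), len(image[0])
    shifted = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            # Pixels moved outside the frame are dropped.
            if 0 <= ny < height and 0 <= nx < width:
                shifted[ny][nx] = image[y][x]
    return shifted


def augment_dataset(samples, copies_per_sample, max_shift=1, seed=0):
    """Return the original (image, label) pairs plus randomly shifted copies.

    Hypothetical helper: each copy keeps the label of its source image.
    """
    rng = random.Random(seed)
    augmented = list(samples)
    for image, label in samples:
        for _ in range(copies_per_sample):
            dx = rng.randint(-max_shift, max_shift)
            dy = rng.randint(-max_shift, max_shift)
            augmented.append((shift_image(image, dx, dy), label))
    return augmented
```

With `copies_per_sample=9`, each labeled image yields ten training samples, so the set grows by one order of magnitude; a realistic pipeline would likely combine several kinds of perturbation rather than translation alone.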

Original language: American English
Title of host publication: 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017
Pages: 124-128
Number of pages: 5
ISBN (Electronic): 9781509066285
State: Published - 13 Oct 2017
Event: 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017 - Nancy, France
Duration: 3 Apr 2017 to 5 Apr 2017

Publication series

Name: 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017

Conference

Conference: 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017
Country/Territory: France
City: Nancy
Period: 3/04/17 to 5/04/17

Keywords

  • Arabic
  • Database
  • Handwritten
  • Text recognition

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
  • Linguistics and Language
  • Computer Science Applications
