Simplifying the reading of historical manuscripts

Abedelkadir Asi, Rafi Cohen, Klara Kedem, Jihad El-Sana

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

Complex document layouts pose significant challenges for document image understanding algorithms. Such layouts place text paragraphs at irregular locations, which makes the text difficult to read. In this paper, we present a robust framework for analyzing historical manuscripts with complex layouts. The framework aims to give historians a convenient reading experience through top-notch algorithms for text localization, classification, and dewarping. We segment text into spatially coherent regions and text lines using texture-based filters, and refine this segmentation with Markov Random Fields (MRFs). A principled technique is presented for dewarping curved text regions using a non-linear geometric transformation. The framework was validated on a subset of a publicly available dataset of historical documents and produced promising results.

Original language: American English
Title of host publication: 13th IAPR International Conference on Document Analysis and Recognition, ICDAR 2015 - Conference Proceedings
Pages: 826-830
Number of pages: 5
ISBN (Electronic): 9781479918058
State: Published - 20 Nov 2015
Event: 13th International Conference on Document Analysis and Recognition, ICDAR 2015 - Nancy, France
Duration: 23 Aug 2015 to 26 Aug 2015

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 2015-November

Conference

Conference: 13th International Conference on Document Analysis and Recognition, ICDAR 2015
Country/Territory: France
City: Nancy
Period: 23/08/15 to 26/08/15

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
