TY - GEN
T1 - Fast key-word searching using 'BoostMap' based embedding
AU - Saabni, Raid
AU - Bronstein, Alexander
PY - 2012
Y1 - 2012
N2 - Dynamic Time Warping (DTW) is a simple but efficient technique for matching sequences with rigid deformation. It is therefore frequently used for matching shapes in general, and shapes of handwritten words in Document Image Analysis tasks in particular. As DTW is computationally expensive, efficient algorithms for fast computation are crucial. Retrieving images from large-scale datasets using DTW suffers from the need for a linear search over all samples in the dataset. Fast approximation algorithms for image retrieval are mostly based on normed spaces where the triangle inequality holds, which is unfortunately not the case with the DTW distance. In this paper we present a novel approach for fast search of handwritten words within large datasets of shapes. The presented approach uses the BoostMap [1] algorithm to embed the feature space with the DTW measure into a Euclidean space, and the Locality-Sensitive Hashing (LSH) algorithm to rank the k-nearest neighbors of a query image. The algorithm first processes and embeds the objects of the large dataset into a normed space. Fast approximation of the k-nearest neighbors using LSH in the embedding space generates the top k ranked samples, which are then examined using the real DTW distance to give final accurate results. We demonstrate our method on a database of 45,800 images of word-parts extracted from the IFN/ENIT database [11] and images collected from 51 different writers. Our method achieves a speedup of 4 orders of magnitude over the exact method, at the cost of only a 2.2% reduction in accuracy.
KW - Adaboost
KW - BoostMap
KW - Dynamic time warping
KW - Embedding
KW - Nearest neighbor
KW - Word searching
UR - http://www.scopus.com/inward/record.url?scp=84874253775&partnerID=8YFLogxK
U2 - https://doi.org/10.1109/ICFHR.2012.204
DO - https://doi.org/10.1109/ICFHR.2012.204
M3 - Conference contribution
SN - 9780769547749
T3 - Proceedings - International Workshop on Frontiers in Handwriting Recognition, IWFHR
SP - 734
EP - 739
BT - Proceedings - 13th International Conference on Frontiers in Handwriting Recognition, ICFHR 2012
T2 - 13th International Conference on Frontiers in Handwriting Recognition, ICFHR 2012
Y2 - 18 September 2012 through 20 September 2012
ER -