TY - GEN
T1 - Adaptive parser-centric text normalization
AU - Zhang, Congle
AU - Baldwin, Tyler
AU - Ho, Howard
AU - Kimelfeld, Benny
AU - Li, Yunyao
PY - 2013
Y1 - 2013
N2 - Text normalization is an important first step towards enabling many Natural Language Processing (NLP) tasks over informal text. While many of these tasks, such as parsing, perform the best over fully grammatically correct text, most existing text normalization approaches narrowly define the task in the word-to-word sense; that is, the task is seen as that of mapping all out-of-vocabulary non-standard words to their in-vocabulary standard forms. In this paper, we take a parser-centric view of normalization that aims to convert raw informal text into grammatically correct text. To understand the real effect of normalization on the parser, we tie normalization performance directly to parser performance. Additionally, we design a customizable framework to address the often overlooked concept of domain adaptability, and illustrate that the system allows for transfer to new domains with a minimal amount of data and effort. Our experimental study over datasets from three domains demonstrates that our approach outperforms not only the state-of-the-art word-to-word normalization techniques, but also manual word-to-word annotations.
UR - http://www.scopus.com/inward/record.url?scp=84904337333&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9781937284503
T3 - ACL 2013 - 51st Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
SP - 1159
EP - 1168
BT - Long Papers
T2 - 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013
Y2 - 4 August 2013 through 9 August 2013
ER -