TY - GEN
T1 - Data-driven morphological analysis and disambiguation for morphologically rich languages and universal dependencies
AU - More, Amir
AU - Tsarfaty, Reut
N1 - Publisher Copyright: © 1963-2018 ACL.
PY - 2016
Y1 - 2016
N2 - Parsing texts into universal dependencies (UD) in realistic scenarios requires infrastructure for morphological analysis and disambiguation (MA&D) of typologically different languages as a first tier. MA&D is particularly challenging in morphologically rich languages (MRLs), where the ambiguous space-delimited tokens ought to be disambiguated with respect to their constituent morphemes. Here we present a novel, language-agnostic, framework for MA&D, based on a transition system with two variants, word-based and morpheme-based, and a dedicated transition to mitigate the biases of variable-length morpheme sequences. Our experiments on a Modern Hebrew case study outperform the state of the art, and we show that the morpheme-based MD consistently outperforms our word-based variant. We further illustrate the utility and multilingual coverage of our framework by morphologically analyzing and disambiguating the large set of languages in the UD treebanks.
UR - http://www.scopus.com/inward/record.url?scp=85054961237&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9784879747020
T3 - COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers
SP - 337
EP - 348
BT - COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016
PB - Association for Computational Linguistics, ACL Anthology
T2 - 26th International Conference on Computational Linguistics, COLING 2016
Y2 - 11 December 2016 through 16 December 2016
ER -