Identification of multi-word expressions by combining multiple linguistic information sources

Yulia Tsvetkov, Shuly Wintner

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

We propose an architecture for expressing various linguistically-motivated features that help identify multi-word expressions in natural language texts. The architecture combines various linguistically-motivated classification features in a Bayesian Network. We introduce novel ways for computing many of these features, and manually define linguistically-motivated interrelationships among them, which the Bayesian network models. Our methodology is almost entirely unsupervised and completely language in dependent; it relies on few language resources and is thus suitable for a large number of languages. Furthermore, unlike much recent work, our approach can identify expressions of various types and syntactic constructions. We demonstrate a significant improvement in identification accuracy, compared with less sophisticated baselines.

Original languageAmerican English
Title of host publicationEMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages836-845
Number of pages10
StatePublished - 2011
EventConference on Empirical Methods in Natural Language Processing, EMNLP 2011 - Edinburgh, United Kingdom
Duration: 27 Jul 201131 Jul 2011

Publication series

NameEMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

ConferenceConference on Empirical Methods in Natural Language Processing, EMNLP 2011
Country/TerritoryUnited Kingdom
CityEdinburgh
Period27/07/1131/07/11

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Fingerprint

Dive into the research topics of 'Identification of multi-word expressions by combining multiple linguistic information sources'. Together they form a unique fingerprint.

Cite this