Identification of Multiword Expressions by Combining Multiple Linguistic Information Sources

Yulia Tsvetkov, Shuly Wintner

Research output: Contribution to journal › Article › peer-review

Abstract

We propose a framework for using multiple sources of linguistic information in the task of identifying multiword expressions in natural language texts. We define various linguistically motivated classification features and introduce novel ways for computing them. We then manually define interrelationships among the features, and express them in a Bayesian network. The result is a powerful classifier that can identify multiword expressions of various types and multiple syntactic constructions in text corpora. Our methodology is unsupervised and language-independent; it requires relatively few language resources and is thus suitable for a large number of languages. We report results on English, French, and Hebrew, and demonstrate a significant improvement in identification accuracy, compared with less sophisticated baselines.

Original language: American English
Pages (from-to): 449-468
Number of pages: 20
Journal: Computational Linguistics
Volume: 40
Issue number: 2
State: Published - Jun 2014

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications
  • Artificial Intelligence
