A diverse Dirichlet process ensemble for unsupervised induction of syntactic categories

Roi Reichart, Gal Elidan, Ari Rappoport

Research output: Contribution to conference › Paper › peer-review

Abstract

We address the problem of unsupervised tagging of phrase structure trees with phrase categories (parse tree nonterminals). Motivated by the inability of a range of direct clustering approaches to improve over the current leading algorithm, we propose a mixture-of-experts approach. In particular, we tackle the difficult challenge of producing a diverse collection of useful tagging experts, which can then be aggregated into a final high-quality tagging. To do so, we use the particular properties of the Dirichlet process mixture model. We evaluate on English, German, and Chinese corpora and demonstrate a substantial and consistent improvement in overall performance over previous work, together with empirical justification of our algorithmic choices.

Original language: English
Pages: 2307-2324
Number of pages: 18
State: Published - 2012
Event: 24th International Conference on Computational Linguistics, COLING 2012 - Mumbai, India
Duration: 8 Dec 2012 to 15 Dec 2012

Conference

Conference: 24th International Conference on Computational Linguistics, COLING 2012
Country/Territory: India
City: Mumbai
Period: 8/12/12 to 15/12/12

Keywords

  • Dirichlet process
  • Ensemble learning
  • Grammar induction
  • Non terminals
  • Unsupervised parsing

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Linguistics and Language
