Abstract
When porting parsers to a new domain, many of the errors involve incorrect attachment of out-of-vocabulary words. Since no annotated data is available for learning the attachment preferences of the target-domain words, we attack this problem with a model of selectional preferences based on domain-specific word classes. Our method uses Latent Dirichlet Allocation (LDA) to learn a domain-specific selectional preference model in the target domain from unannotated data. The model provides features that capture the affinities between pairs of words in the domain. To incorporate these new features into the parsing model, we adopt the co-training approach and retrain the parser with the selectional preference features. We apply this method to adapt Easy First, a fast non-directional parser trained on the WSJ, to the biomedical domain (Genia Treebank). The selectional preference features reduce error by 4.5% over the co-training baseline.
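The abstract only sketches the pipeline, so as a rough illustration, the snippet below shows one way LDA-based selectional preference affinities could be computed with gensim: each "document" is the bag of modifiers observed for a head word in auto-parsed target-domain text, and a head-modifier pair is scored by how well the modifier fits the head's topic mixture. The corpus construction, the `affinity` function, and all names here are hypothetical assumptions, not the authors' actual feature definition.

```python
# A minimal sketch (not the authors' code) of LDA-based
# selectional-preference affinities over domain-specific word classes.
from gensim import corpora, models

# Hypothetical input: for each head word, the bag of modifiers it was
# observed with in unannotated, automatically parsed target-domain text.
modifier_bags = [
    ["protein", "kinase", "receptor"],   # e.g. modifiers of "activates"
    ["gene", "expression", "promoter"],  # e.g. modifiers of "regulates"
]

dictionary = corpora.Dictionary(modifier_bags)
corpus = [dictionary.doc2bow(bag) for bag in modifier_bags]

# Induce domain-specific word classes (topics) with LDA.
lda = models.LdaModel(corpus, num_topics=10, id2word=dictionary, passes=5)

def affinity(head_idx, modifier):
    """Score a head-modifier pair: the probability the LDA model assigns
    to the modifier under the head's inferred topic mixture."""
    head_topics = dict(lda.get_document_topics(corpus[head_idx]))
    mod_id = dictionary.token2id[modifier]
    mod_topics = dict(lda.get_term_topics(mod_id, minimum_probability=0.0))
    return sum(p * mod_topics.get(t, 0.0) for t, p in head_topics.items())
```

Scores like `affinity(0, "kinase")` could then be binned into discrete features and fed to the parser during co-training-style retraining.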
| Original language | American English |
|---|---|
| Title of host publication | Proceedings of ACL 2012 Student Research Workshop |
| Editors | Jackie C. K. Cheung, Jun Hatori, Carlos Henriquez, Ann Irvine |
| Place of Publication | Jeju Island, Korea |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 43-48 |
| Number of pages | 6 |
| Edition | 1st |
| State | Published - Jul 2012 |