Abstract
When porting parsers to a new domain, many errors are related to the incorrect attachment of out-of-vocabulary words. Since no annotated data is available from which to learn the attachment preferences of target-domain words, we attack this problem using a model of selectional preferences based on domain-specific word classes. Our method uses Latent Dirichlet Allocation (LDA) to learn a domain-specific selectional preference model in the target domain from unannotated data. The model provides features that capture the affinities between pairs of words in the domain. To incorporate these new features into the parsing model, we adopt the co-training approach and retrain the parser with the selectional preference features. We apply this method to adapt Easy First, a fast non-directional parser trained on WSJ, to the biomedical domain (Genia Treebank). The selectional preference features reduce error by 4.5% over the co-training baseline.
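The abstract does not spell out how the LDA-based selectional preference model is built, so the following is a minimal sketch under stated assumptions: each head word is treated as a pseudo-document whose tokens are its observed dependents in auto-parsed, unannotated target-domain text, and an affinity score is recovered by marginalizing over the latent word classes. The toy (head, dependent) pairs, the two-topic setting, the use of gensim, and the `affinity` helper are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch of an LDA-based selectional preference model (illustrative only),
# assuming (head, dependent) pairs harvested from unannotated domain text.
from gensim import corpora, models

# Hypothetical dependency pairs from auto-parsed biomedical sentences.
pairs = [("inhibit", "expression"), ("inhibit", "activity"),
         ("activate", "receptor"), ("bind", "receptor"),
         ("bind", "protein"), ("activate", "expression")]

# Treat each head as a pseudo-document whose tokens are its dependents.
docs = {}
for head, dep in pairs:
    docs.setdefault(head, []).append(dep)

heads = sorted(docs)
dictionary = corpora.Dictionary(docs[h] for h in heads)
corpus = [dictionary.doc2bow(docs[h]) for h in heads]

# Learn latent word classes (topics) over the dependent vocabulary.
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                      passes=10, random_state=0)

def affinity(head, dep):
    """Affinity feature: p(dep | head), marginalized over latent classes."""
    theta = dict(lda.get_document_topics(
        dictionary.doc2bow(docs[head]), minimum_probability=0.0))
    dep_id = dictionary.token2id[dep]
    return sum(theta[t] * dict(lda.get_topic_terms(
        t, topn=len(dictionary))).get(dep_id, 0.0)
               for t in theta)

print(affinity("bind", "receptor"))
```

In a co-training setup along the lines the abstract describes, one plausible use of such scores is to discretize them into bins and attach them as features on candidate head-dependent attachments when retraining the parser on the target domain.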
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of ACL 2012 Student Research Workshop |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 43-48 |
| Number of pages | 6 |
| State | Published - 2012 |