TY - GEN
T1 - Symmetric pattern based word embeddings for improved word similarity prediction
AU - Schwartz, Roy
AU - Reichart, Roi
AU - Rappoport, Ari
N1 - Publisher Copyright: © 2015 Association for Computational Linguistics.
PY - 2015
Y1 - 2015
AB - We present a novel word-level vector representation based on symmetric patterns (SPs). To this end, we automatically acquire SPs (e.g., “X and Y”) from a large plain-text corpus and generate vectors in which each coordinate represents the co-occurrence in SPs of the represented word with another word of the vocabulary. Our representation has three advantages over existing alternatives. First, being based on symmetric word relationships, it is highly suitable for word similarity prediction. In particular, on the SimLex999 word similarity dataset, our model achieves a Spearman’s ρ score of 0.517, compared to 0.462 for the state-of-the-art word2vec model. Interestingly, our model performs exceptionally well on verbs, outperforming state-of-the-art baselines by 20.2–41.5%. Second, pattern features can be adapted to the needs of a target NLP application. For example, we show that we can easily control whether the embeddings derived from SPs treat antonym pairs (e.g., (big, small)) as similar or dissimilar, an important distinction for tasks such as word classification and sentiment analysis. Finally, we show that a simple combination of the word similarity scores generated by our method and by word2vec yields greater predictive power than either individual model, scoring as high as 0.563 in Spearman’s ρ on SimLex999. This emphasizes the differences between the signals captured by the two models.
UR - http://www.scopus.com/inward/record.url?scp=85069960285&partnerID=8YFLogxK
U2 - 10.18653/v1/k15-1026
DO - 10.18653/v1/k15-1026
M3 - Conference contribution
T3 - CoNLL 2015 - 19th Conference on Computational Natural Language Learning, Proceedings
SP - 258
EP - 267
BT - CoNLL 2015 - 19th Conference on Computational Natural Language Learning, Proceedings
PB - Association for Computational Linguistics (ACL)
T2 - 19th Conference on Computational Natural Language Learning, CoNLL 2015
Y2 - 30 July 2015 through 31 July 2015
ER -