McPhraSy: Multi-Context Phrase Similarity and Clustering

Amir D.N. Cohen, Hila Gonen, Ori Shapira, Ran Levy, Yoav Goldberg

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Phrase similarity is a key component of many NLP applications. Current phrase similarity methods focus on embedding the phrase itself and use the phrase context only during training of the pretrained model. To better leverage the information in the context, we propose McPhraSy (Multi-context Phrase Similarity), a novel algorithm for estimating the similarity of phrases based on multiple contexts. At inference time, McPhraSy represents each phrase by considering multiple contexts in which it appears and computes the similarity of two phrases by aggregating the pairwise similarities between the contexts of the phrases. Incorporating context during inference enables McPhraSy to outperform current state-of-the-art models on two phrase similarity datasets by up to 13.3%. Finally, we also present a new downstream task that relies on phrase similarity - keyphrase clustering - and create a new benchmark for it in the product reviews domain. We show that McPhraSy surpasses all other baselines for this task.
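The inference-time idea in the abstract (represent each phrase by several contexts, then aggregate the pairwise context similarities) can be illustrated with a minimal sketch. The abstract does not specify the encoder, the similarity measure, or the aggregation function, so everything here is an assumption made for illustration: the function names `cosine` and `multi_context_similarity` are hypothetical, cosine similarity and a plain mean are stand-ins, and the short toy vectors take the place of real contextual phrase embeddings.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def multi_context_similarity(
    contexts_a: Sequence[Vector], contexts_b: Sequence[Vector]
) -> float:
    """Score two phrases from their context embeddings.

    Each argument holds one embedding per context in which the phrase occurs.
    ASSUMPTION: the aggregation is a plain mean over all cross-phrase pairs;
    the paper may use a different aggregation.
    """
    if not contexts_a or not contexts_b:
        raise ValueError("each phrase needs at least one context embedding")
    scores = [cosine(u, v) for u in contexts_a for v in contexts_b]
    return sum(scores) / len(scores)


# Toy 2-d vectors standing in for contextual phrase embeddings (hypothetical data).
phrase_a = [[1.0, 0.0], [0.0, 1.0]]  # phrase A observed in two contexts
phrase_b = [[1.0, 0.0]]              # phrase B observed in one context
score = multi_context_similarity(phrase_a, phrase_b)
```

In this toy case the two pairwise cosines are 1 and 0, so the mean is 0.5. A real system would obtain the vectors from a pretrained encoder applied to each sentence containing the phrase.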

Original language: American English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: EMNLP 2022
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Publisher: Association for Computational Linguistics (ACL)
Pages: 3538-3550
Number of pages: 13
ISBN (Electronic): 9781959429432
State: Published - 1 Jan 2022
Externally published: Yes
Event: 2022 Findings of the Association for Computational Linguistics: EMNLP 2022 - Hybrid, Abu Dhabi, United Arab Emirates
Duration: 7 Dec 2022 → 11 Dec 2022

Publication series

Name: Findings of the Association for Computational Linguistics: EMNLP 2022

Conference

Conference: 2022 Findings of the Association for Computational Linguistics: EMNLP 2022
Country/Territory: United Arab Emirates
City: Hybrid, Abu Dhabi
Period: 7/12/22 → 11/12/22

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
