TY - JOUR
T1 - Cross-lingual unified medical language system entity linking in online health communities
AU - Bitton, Yonatan
AU - Cohen, Raphael
AU - Schifter, Tamar
AU - Bachmat, Eitan
AU - Elhadad, Michael
AU - Elhadad, Noemie
N1 - Publisher Copyright: © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2020/10/1
Y1 - 2020/10/1
N2 - Objective: In Hebrew online health communities, participants commonly write medical terms that appear as transliterated forms of a source term in English. Such transliterations introduce high variability in text and challenge text-analytics methods. To reduce their variability, medical terms must be normalized, such as linking them to Unified Medical Language System (UMLS) concepts. We present a method to identify both transliterated and translated Hebrew medical terms and link them with UMLS entities. Materials and Methods: We investigate the effect of linking terms in Camoni, a popular Israeli online health community in Hebrew. Our method, MDTEL (Medical Deep Transliteration Entity Linking), includes (1) an attention-based recurrent neural network encoder-decoder to transliterate words and mapping UMLS from English to Hebrew, (2) an unsupervised method for creating a transliteration dataset in any language without manually labeled data, and (3) an efficient way to identify and link medical entities in the Hebrew corpus to UMLS concepts, by producing a high-recall list of candidate medical terms in the corpus, and then filtering the candidates to relevant medical terms. Results: We carry out experiments on 3 disease-specific communities: diabetes, multiple sclerosis, and depression. MDTEL tagging and normalizing on Camoni posts achieved 99% accuracy, 92% recall, and 87% precision. When tagging and normalizing terms in queries from the Camoni search logs, UMLS-normalized queries improved search results in 46% of the cases. Conclusions: Cross-lingual UMLS entity linking from Hebrew is possible and improves search performance across communities. Annotated datasets, annotation guidelines, and code are made available online (https://github.com/yonatanbitton/mdtel).
AB - Objective: In Hebrew online health communities, participants commonly write medical terms that appear as transliterated forms of a source term in English. Such transliterations introduce high variability in text and challenge text-analytics methods. To reduce their variability, medical terms must be normalized, such as linking them to Unified Medical Language System (UMLS) concepts. We present a method to identify both transliterated and translated Hebrew medical terms and link them with UMLS entities. Materials and Methods: We investigate the effect of linking terms in Camoni, a popular Israeli online health community in Hebrew. Our method, MDTEL (Medical Deep Transliteration Entity Linking), includes (1) an attention-based recurrent neural network encoder-decoder to transliterate words and mapping UMLS from English to Hebrew, (2) an unsupervised method for creating a transliteration dataset in any language without manually labeled data, and (3) an efficient way to identify and link medical entities in the Hebrew corpus to UMLS concepts, by producing a high-recall list of candidate medical terms in the corpus, and then filtering the candidates to relevant medical terms. Results: We carry out experiments on 3 disease-specific communities: diabetes, multiple sclerosis, and depression. MDTEL tagging and normalizing on Camoni posts achieved 99% accuracy, 92% recall, and 87% precision. When tagging and normalizing terms in queries from the Camoni search logs, UMLS-normalized queries improved search results in 46% of the cases. Conclusions: Cross-lingual UMLS entity linking from Hebrew is possible and improves search performance across communities. Annotated datasets, annotation guidelines, and code are made available online (https://github.com/yonatanbitton/mdtel).
KW - Natural language processing
KW - Online health communities
KW - UMLS
UR - http://www.scopus.com/inward/record.url?scp=85093538885&partnerID=8YFLogxK
U2 - https://doi.org/10.1093/jamia/ocaa150
DO - https://doi.org/10.1093/jamia/ocaa150
M3 - Article
C2 - 32910823
SN - 1067-5027
VL - 27
SP - 1585
EP - 1592
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 10
ER -