Self-normalization properties of language modeling

Jacob Goldberger, Oren Melamud

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Self-normalizing discriminative models approximate the normalized probability of a class without having to compute the partition function. In the context of language modeling, this property is particularly appealing as it may significantly reduce run-times due to large word vocabularies. In this study, we provide a comprehensive investigation of language modeling self-normalization. First, we theoretically analyze the inherent self-normalization properties of Noise Contrastive Estimation (NCE) language models. Then, we compare them empirically to softmax-based approaches, which are self-normalized using explicit regularization, and suggest a hybrid model with compelling properties. Finally, we uncover a surprising negative correlation between self-normalization and perplexity across the board, as well as some regularity in the observed errors, which may potentially be used for improving self-normalization algorithms in the future.
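The core idea in the abstract can be illustrated with a toy sketch (not code from the paper; the vocabulary and scores below are hypothetical): a softmax model must sum exp-scores over the whole vocabulary to get the partition function Z, while a self-normalized model assumes log Z ≈ 0 and treats exp(score) directly as a probability, skipping that sum.

```python
import math

# Hypothetical unnormalized scores s(w) from a language model head.
scores = {"the": 2.0, "cat": 0.5, "sat": -1.0, "mat": -1.5}

# Exact softmax probability: requires the partition function Z,
# a sum over the entire vocabulary.
Z = sum(math.exp(s) for s in scores.values())
p_softmax = {w: math.exp(s) / Z for w, s in scores.items()}

# Self-normalized approximation: use exp(s(w)) directly, with no Z.
# This is only accurate when training (e.g. NCE, or softmax with an
# explicit regularizer on log Z) has pushed log Z toward 0.
p_selfnorm = {w: math.exp(s) for w, s in scores.items()}

# |log Z| measures how far the model is from being self-normalized.
print(f"log Z = {math.log(Z):.3f}")
for w in scores:
    print(f"{w}: softmax={p_softmax[w]:.3f}  self-norm={p_selfnorm[w]:.3f}")
```

For this toy model log Z is far from 0, so the self-normalized "probabilities" do not sum to one; the paper's analysis concerns how NCE training and explicit regularization keep this deviation small across contexts.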

Original language: English
Title of host publication: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings
Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
Publisher: Association for Computational Linguistics (ACL)
Pages: 764-773
Number of pages: 10
ISBN (Electronic): 9781948087506
State: Published - 2018
Event: 27th International Conference on Computational Linguistics, COLING 2018 - Santa Fe, United States
Duration: 20 Aug 2018 - 26 Aug 2018

Publication series

Name: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings

Conference

Conference: 27th International Conference on Computational Linguistics, COLING 2018
Country/Territory: United States
City: Santa Fe
Period: 20/08/18 - 26/08/18

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Linguistics and Language
