Spoken Term Detection Automatically Adjusted for a Given Threshold

Tzeviya Fuchs, Joseph Keshet

Research output: Contribution to journal › Article › peer-review

Abstract

Spoken term detection (STD) is the task of determining whether and where a given word or phrase appears in a given segment of speech. Algorithms for STD often aim to maximize the gap between the scores of positive and negative examples. As such, they focus on ensuring that utterances in which the term appears are ranked higher than utterances in which it does not. However, they do not determine a detection threshold between the two. In this paper, we propose a new approach for setting an absolute detection threshold for all terms by introducing a new calibrated loss function. The advantage of minimizing this loss function during training is that it not only maximizes the relative ranking scores but also adjusts the system to use a fixed threshold, and thus maximizes the detection accuracy rates. We use the new loss function in the structured prediction setting and extend the discriminative keyword spotting algorithm to learn a spoken term detector with a single threshold for all terms. We further demonstrate the effectiveness of the new loss function by training a deep neural Siamese network in a weakly supervised setting for template-based STD, again with a single fixed threshold. Experiments with the TIMIT, Wall Street Journal (WSJ), and Switchboard corpora showed that our approach not only improved the accuracy rates when a fixed threshold was used but also obtained a higher area under the curve (AUC).
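
The contrast the abstract draws between a pure ranking objective and one tied to a fixed threshold can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the hinge form of both losses, the margin of 1, and the threshold of 0.

```python
def ranking_hinge(pos_score, neg_score, margin=1.0):
    """Pairwise ranking loss (illustrative): zero once the positive score
    exceeds the negative score by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)


def calibrated_hinge(pos_score, neg_score, threshold=0.0, margin=1.0):
    """Threshold-anchored loss (hypothetical form, not the paper's definition):
    the positive score is pushed above threshold + margin and the negative
    score below threshold - margin, each penalized separately."""
    pos_term = max(0.0, margin - (pos_score - threshold))
    neg_term = max(0.0, margin + (neg_score - threshold))
    return pos_term + neg_term


def detect(score, threshold=0.0):
    """Fixed-threshold decision: the term is declared present if score > threshold."""
    return score > threshold


# A pair that is ranked correctly but sits entirely above the threshold.
pos, neg = 5.0, 3.0
print(ranking_hinge(pos, neg))     # ranking objective is already satisfied
print(calibrated_hinge(pos, neg))  # threshold-anchored objective is not
print(detect(neg))                 # the negative would be a false alarm
```

In this toy case the ranking loss is zero because the positive outscores the negative, yet a detector with threshold 0 would still fire on the negative example. A loss anchored to the threshold keeps penalizing that pair, which is the kind of behaviour the abstract attributes to a calibrated loss.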

Original language: English
Article number: 8070931
Pages (from-to): 1310-1317
Number of pages: 8
Journal: IEEE Journal on Selected Topics in Signal Processing
Volume: 11
Issue number: 8
State: Published - Dec 2017

Keywords

  • AUC maximization
  • Spoken term detection
  • deep-neural networks
  • keyword spotting
  • structured prediction

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Electrical and Electronic Engineering
