Discriminative articulatory models for spoken term detection in low-resource conversational settings

Rohit Prabhavalkar, Karen Livescu, Eric Fosler-Lussier, Joseph Keshet

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We study spoken term detection (STD) - the task of determining whether and where a given word or phrase appears in a given segment of speech - using articulatory feature-based pronunciation models. The models are motivated by the requirements of STD in low-resource settings, in which it may not be feasible to train a large-vocabulary continuous speech recognition system, as well as by the need to address pronunciation variation in conversational speech. Our STD system is trained to maximize the expected area under the receiver operating characteristic curve, often used to evaluate STD performance. In experimental evaluations on the Switchboard corpus, we find that our approach outperforms a baseline HMM-based system across a number of training set sizes, as well as a discriminative phone-based model in some settings.
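As an illustration of the evaluation quantity named in the abstract, the area under the ROC curve (AUC) can be computed as the fraction of (positive, negative) example pairs in which the positive example receives the higher detection score. The sketch below is not the training procedure from the paper; it only shows the pairwise form of AUC, which is what makes the metric amenable to ranking-style objectives. The function name `pairwise_auc` and the example scores are hypothetical, chosen purely for illustration.

```python
from itertools import product


def pairwise_auc(pos_scores, neg_scores):
    """Return the AUC as the fraction of correctly ranked (positive, negative) pairs.

    A pair is ranked correctly when the positive example scores strictly
    higher than the negative one; ties count as half a correct pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    total = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical detection scores: segments that contain the term (pos)
# and segments that do not (neg).
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
auc = pairwise_auc(pos, neg)
print(f"AUC = {auc:.3f}")
```

Because this form is a sum over pairs, a detector can in principle be trained by penalizing each pair in which a negative segment outscores a positive one. How the paper's system actually defines and optimizes its expected-AUC objective is not specified on this page.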

Original language: English
Title of host publication: 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
Pages: 8287-8291
Number of pages: 5
DOIs
State: Published - 18 Oct 2013
Event: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC, Canada
Duration: 26 May 2013 - 31 May 2013

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Conference

Conference: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Country/Territory: Canada
City: Vancouver, BC
Period: 26/05/13 - 31/05/13

Keywords

  • AUC
  • articulatory features
  • discriminative training
  • spoken term detection
  • structural SVM

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
