Abstract
We study spoken term detection, the task of determining whether and where a given word or phrase appears in a given segment of speech, in the setting of limited training data. This setting is becoming increasingly important as interest grows in porting spoken term detection to multiple low-resource languages and acoustic environments. We propose a discriminative algorithm that aims to maximize the area under the receiver operating characteristic curve (AUC), a measure often used to evaluate the performance of spoken term detection systems. We implement the approach using a set of feature functions based on multilayer perceptron classifiers of phones and articulatory features, and experiment on data drawn from the Switchboard database of conversational telephone speech. Our approach outperforms a baseline HMM-based system by a large margin across a number of training set sizes.
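As an illustration of the objective named in the abstract, the AUC can be read as the fraction of (positive, negative) pairs in which the segment containing the term scores higher than the segment that does not. The sketch below computes that quantity and a generic pairwise hinge surrogate for it. It is a minimal illustration under stated assumptions, not the authors' algorithm: the function names, the example scores, and the choice of a hinge surrogate with a margin of 1 are hypothetical and do not come from the paper.

```python
from itertools import product


def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Function name and inputs are
    illustrative assumptions, not taken from the paper.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


def pairwise_hinge_loss(pos_scores, neg_scores, margin=1.0):
    """Average hinge penalty over all (positive, negative) pairs.

    A generic convex surrogate for 1 - AUC: each pair is penalized unless
    the positive score exceeds the negative score by at least `margin`.
    This is an assumed stand-in, not the paper's training objective.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    total = sum(
        max(0.0, margin - (p - n))
        for p, n in product(pos_scores, neg_scores)
    )
    return total / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical detector scores: segments that contain the term
    # (positives) and segments that do not (negatives).
    positives = [0.9, 0.8, 0.4]
    negatives = [0.7, 0.3, 0.2]
    print("AUC:", pairwise_auc(positives, negatives))
    print("hinge surrogate:", pairwise_hinge_loss(positives, negatives))
```

With a margin of at least 1, the hinge penalty for each pair is never smaller than that pair's ranking error, so lowering the surrogate pushes the pairwise AUC up. That is the general motivation for training a detector on pairs of examples rather than on individual ones.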
Original language | English |
---|---|
Title of host publication | 2012 Symposium on Machine Learning in Speech and Language Processing, MLSLP 2012 |
Pages | 22-25 |
Number of pages | 4 |
State | Published - 2012 |
Externally published | Yes |
Event | 2012 Symposium on Machine Learning in Speech and Language Processing, MLSLP 2012 - Portland, United States; Duration: 14 Sep 2012 → … |
Conference
Conference | 2012 Symposium on Machine Learning in Speech and Language Processing, MLSLP 2012 |
---|---|
Country/Territory | United States |
City | Portland |
Period | 14/09/12 → … |
Keywords
- AUC
- Spoken term detection
- Discriminative training
- Structural SVM
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Human-Computer Interaction
- Computer Vision and Pattern Recognition
- Signal Processing
- Linguistics and Language