Abstract
Open-vocabulary keyword spotting is a crucial and challenging task in automatic speech recognition (ASR) that focuses on detecting user-defined keywords within a spoken utterance. Keyword spotting methods commonly map the audio utterance and the keyword into a joint embedding space to compute an affinity score. In this work, we propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters. These parameters are used to process the auditory input. We provide an extensive evaluation on challenging and diverse multilingual benchmarks and show significant improvements over recent keyword spotting and ASR baselines. Furthermore, we study the effectiveness of our approach on low-resource languages that were unseen during training. The results demonstrate a substantial performance improvement over baseline methods.
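The keyword-conditioned normalization described in the abstract can be illustrated with a minimal sketch of adaptive instance normalization, the technique named in the keywords. Everything here is an assumption made for illustration and is not taken from the paper: the function names `keyword_to_norm_params` and `adaptive_instance_norm`, the single linear projection standing in for the text encoder, the feature shapes, and the toy values.

```python
from statistics import fmean, pstdev


def keyword_to_norm_params(keyword_emb, w_gamma, w_beta):
    """Hypothetical stand-in for a text encoder head: project a keyword
    embedding (length E) to per-channel scale and shift vectors (length D).
    w_gamma and w_beta each hold D rows of E weights."""
    gamma = [sum(k * w for k, w in zip(keyword_emb, row)) for row in w_gamma]
    beta = [sum(k * w for k, w in zip(keyword_emb, row)) for row in w_beta]
    return gamma, beta


def adaptive_instance_norm(audio_feats, gamma, beta, eps=1e-5):
    """Normalize each channel of the audio features over time, then apply
    the keyword-conditioned scale (gamma) and shift (beta).
    audio_feats is a list of T frames, each a list of D channel values."""
    channels = list(zip(*audio_feats))        # D sequences of length T
    means = [fmean(c) for c in channels]      # per-channel mean over time
    stds = [pstdev(c) for c in channels]      # per-channel std over time
    return [
        [
            gamma[d] * (frame[d] - means[d]) / (stds[d] + eps) + beta[d]
            for d in range(len(gamma))
        ]
        for frame in audio_feats
    ]


# Toy inputs (illustrative values only): 3 frames, 2 channels.
audio_feats = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
keyword_emb = [1.0, 0.5]
w_gamma = [[2.0, 0.0], [0.0, 2.0]]
w_beta = [[1.0, 0.0], [0.0, -2.0]]

gamma, beta = keyword_to_norm_params(keyword_emb, w_gamma, w_beta)
out = adaptive_instance_norm(audio_feats, gamma, beta)
```

In this sketch the same audio features yield different outputs for different keywords, because the scale and shift depend only on the keyword embedding. How AdaKWS produces and applies these parameters inside its audio model is not specified in the abstract.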
Original language | English |
---|---|
Pages (from-to) | 11656-11660 |
Number of pages | 5 |
Journal | Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing |
State | Published - 2024 |
Event | 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Republic of Korea. Duration: 14 Apr 2024 → 19 Apr 2024 |
Keywords
- adaptive instance normalization
- open vocabulary
- user-defined keyword spotting
All Science Journal Classification (ASJC) codes
- Software
- Signal Processing
- Electrical and Electronic Engineering