Open-Vocabulary Keyword-Spotting with Adaptive Instance Normalization

Aviv Navon, Aviv Shamsian, Neta Glazer, Gill Hetz, Joseph Keshet

Research output: Contribution to journalConference articlepeer-review

Abstract

Open vocabulary keyword spotting is a crucial and challenging task in automatic speech recognition (ASR) that focuses on detecting user-defined keywords within a spoken utterance. Keyword spotting methods commonly map the audio utterance and keyword into a joint embedding space to obtain some affinity score. In this work, we propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters. These parameters are used to process the auditory input. We provide an extensive evaluation using challenging and diverse multi-lingual benchmarks and show significant improvements over recent keyword spotting and ASR baselines. Furthermore, we study the effectiveness of our approach on low-resource languages that were unseen during the training. The results demonstrate a substantial performance improvement compared to baseline methods.

Original languageEnglish
Pages (from-to)11656-11660
Number of pages5
JournalProceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
DOIs
StatePublished - 2024
Event2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of
Duration: 14 Apr 202419 Apr 2024

Keywords

  • adaptive instance normalization
  • open vocabulary
  • user-defined keyword spotting

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Open-Vocabulary Keyword-Spotting with Adaptive Instance Normalization'. Together they form a unique fingerprint.

Cite this