Multimodal Kernel Method for Activity Detection of Sound Sources

Research output: Contribution to journal › Article › peer-review

Abstract

We consider the problem of acoustic scene analysis of multiple sound sources. In our setting, the sound sources are measured by a single microphone, and a particular source of interest is also captured by a video camera during a short time interval. The goal of this paper is to detect the activity of the source of interest, even when the video data are missing, while ignoring the other sound sources. To address this problem, we propose a kernel-based algorithm that fuses the audio-visual data through a combination of affinity kernels constructed separately from the audio and the video data. We introduce a distance measure between data points that is associated with the source of interest and reduces the effect of the other (interfering) sources. Using this distance, we devise a measure for the presence of the source of interest, which extends naturally to time intervals in which only the audio signal is available. Experimental results demonstrate the improved performance of the proposed algorithm compared with competing approaches, implying the significance of the video signal in the analysis of complex acoustic scenes.
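The idea of combining affinity kernels built separately from each modality can be illustrated with a minimal sketch. The abstract does not specify the kernel type, the bandwidth, or the combination rule, so everything below is an assumption made for illustration: Gaussian affinities, a median-based bandwidth, row normalization, and a matrix product of the two normalized kernels. The names `gaussian_affinity` and `combined_kernel` and the random feature arrays are hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_affinity(features, eps=None):
    """Row-normalized Gaussian affinity kernel over the rows of `features`.

    Assumption: a Gaussian kernel with a median-based bandwidth; the
    abstract does not state which kernel or bandwidth the authors use.
    """
    # Squared Euclidean distance between every pair of frames.
    diffs = features[:, None, :] - features[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    if eps is None:
        # Hypothetical bandwidth: median of the nonzero squared distances.
        eps = np.median(sq_dists[sq_dists > 0])
    kernel = np.exp(-sq_dists / eps)
    # Normalize each row to sum to one (row-stochastic matrix).
    return kernel / kernel.sum(axis=1, keepdims=True)


def combined_kernel(audio_feats, video_feats):
    """Combine the two modalities by multiplying their normalized kernels.

    Assumption: a matrix product is one common way to fuse two
    row-stochastic kernels; it is not confirmed as the paper's rule.
    """
    return gaussian_affinity(audio_feats) @ gaussian_affinity(video_feats)


# Synthetic stand-ins for per-frame audio and video features.
rng = np.random.default_rng(0)
n_frames = 50
audio_feats = rng.normal(size=(n_frames, 8))
video_feats = rng.normal(size=(n_frames, 4))

K = combined_kernel(audio_feats, video_feats)
```

In this sketch, each row of `K` is still a probability distribution over frames, because the product of two row-stochastic matrices is row-stochastic. Frames end up with similar rows in `K` when they are close in the audio features and connect to similar frames in the video features, which is the intuition behind emphasizing the source that both sensors observe.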

Original language: English
Pages (from-to): 1322-1334
Number of pages: 13
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 25
Issue number: 6
State: Published - Jun 2017

Keywords

  • Acoustic scene
  • audio-visual
  • data fusion
  • kernel
  • multi-modal
  • transient noise

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering

