Example-based cross-modal denoising

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Many of today's widespread cameras are part of multisensory systems with an integrated computer (smartphones). Computer vision is thus starting to evolve toward cross-modal sensing, in which vision and other sensors cooperate. This mirrors nature: in humans and animals, visual events are often accompanied by sounds. Can vision assist in denoising another modality? As a case study, we demonstrate this principle by using video to denoise audio. Unimodal (audio-only) denoising is very difficult when the noise source is non-stationary, complex (e.g., another speaker or music in the background), strong, and not individually accessible in any modality (unseen). Cross-modal association can help: a clear video channel can direct the audio estimator. We show this using an example-based approach. A training movie with clear audio provides cross-modal examples. In testing, cross-modal input segments having noisy audio rely on these examples for denoising. The video channel drives the search for relevant training examples. We demonstrate this in speech and music experiments.
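To make the retrieval idea concrete, here is a minimal sketch in Python, not the paper's actual algorithm: it assumes per-segment video feature vectors paired with clean training audio segments are already available, and it uses simple nearest-neighbour matching on video features to pick a clean example for each noisy test segment. All names (denoise_by_example, train_video_feats, etc.), the shapes, and the Euclidean matching rule are illustrative assumptions.

    import numpy as np

    def denoise_by_example(test_video_feats, train_video_feats, train_clean_audio):
        """Replace each noisy test segment with the clean training audio whose
        video features best match the segment's video features.

        Hypothetical sketch: the actual features and matching criterion used
        in the paper are not specified here.
        """
        denoised = []
        for v in test_video_feats:
            # The video channel drives the search: distances are computed on
            # video features only, since the test audio is corrupted.
            dists = np.linalg.norm(train_video_feats - v, axis=1)
            denoised.append(train_clean_audio[np.argmin(dists)])
        return np.stack(denoised)

    # Toy usage with random placeholder data (shapes are assumptions):
    rng = np.random.default_rng(0)
    train_video_feats = rng.normal(size=(100, 16))   # 100 training segments, 16-D video features
    train_clean_audio = rng.normal(size=(100, 512))  # paired clean audio segments
    test_video_feats = rng.normal(size=(5, 16))      # 5 noisy test segments (video part)
    estimate = denoise_by_example(test_video_feats, train_video_feats, train_clean_audio)
    print(estimate.shape)                            # (5, 512)

In practice the matching would likely operate on temporal sequences of features rather than independent vectors, but the sketch captures the core idea: the clean video modality selects which clean-audio training examples stand in for the noisy test audio.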

Original language: English
Title of host publication: 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012
Pages: 486-493
Number of pages: 8
DOIs
State: Published - 2012
Event: 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012 - Providence, RI, United States
Duration: 16 Jun 2012 – 21 Jun 2012

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Conference

Conference: 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012
Country/Territory: United States
City: Providence, RI
Period: 16/06/12 – 21/06/12

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition
