Multi-microphone speaker separation based on deep DOA estimation

Shlomo E. Chazan, Hodaya Hammer, Gershon Hazan, Jacob Goldberger, Sharon Gannot

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we present a multi-microphone speech separation algorithm based on masking inferred from the speakers' directions of arrival (DOAs). According to the W-disjoint orthogonality property of speech signals, each time-frequency (TF) bin is dominated by a single speaker. Each TF bin can therefore be associated with a single DOA. In our procedure, we apply a deep neural network (DNN) with a U-net architecture to infer the DOA of each TF bin from a concatenated set of the spectra of the microphone signals. Separation is obtained by multiplying the reference microphone signal by the masks associated with the different DOAs. Our proposed deep direction estimation for speech separation (DDESS) method is inspired by recent advances in deep clustering methods. Unlike established methods that apply clustering in a latent embedded space, in our approach the embedding is closely associated with the spatial information, as manifested by the different speakers' directions of arrival.
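To illustrate the masking step the abstract describes, the sketch below builds one binary mask per DOA class and applies it to the reference-microphone STFT. It is a minimal illustration, not the paper's implementation: the per-bin DOA labels (`doa_map`) stand in for the U-net's output and are generated synthetically here, and the STFT is random placeholder data.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T = 257, 100  # frequency bins x time frames

# Placeholder STFT of the reference microphone (synthetic data).
ref_stft = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))

# Hypothetical per-bin DOA class (0 or 1 for two speakers). Under the
# W-disjoint orthogonality assumption, each TF bin is dominated by a
# single speaker and thus assigned to a single DOA class.
doa_map = rng.integers(0, 2, size=(F, T))

# Build a binary mask per DOA class and multiply it with the reference
# microphone's STFT to obtain each separated speaker's spectrogram.
separated = {}
for doa_class in (0, 1):
    mask = (doa_map == doa_class).astype(float)
    separated[doa_class] = mask * ref_stft

# The binary masks partition the TF plane, so the masked spectrograms
# sum back to the original mixture.
assert np.allclose(separated[0] + separated[1], ref_stft)
```

In the actual method, `doa_map` would come from the trained U-net applied to the concatenated microphone spectra, and the masked STFTs would be inverted back to time-domain signals.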

Original language: English
Title of host publication: EUSIPCO 2019 - 27th European Signal Processing Conference
ISBN (Electronic): 9789082797039
DOIs
State: Published - Sep 2019
Event: 27th European Signal Processing Conference, EUSIPCO 2019 - A Coruña, Spain
Duration: 2 Sep 2019 – 6 Sep 2019

Publication series

Name: European Signal Processing Conference
Volume: 2019-September

Conference

Conference: 27th European Signal Processing Conference, EUSIPCO 2019
Country/Territory: Spain
City: A Coruña
Period: 2/09/19 – 6/09/19

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Electrical and Electronic Engineering
