TY - GEN
T1 - Multi-microphone speaker separation based on deep DOA estimation
AU - Chazan, Shlomo E.
AU - Hammer, Hodaya
AU - Hazan, Gershon
AU - Goldberger, Jacob
AU - Gannot, Sharon
N1 - Publisher Copyright: © 2019,IEEE
PY - 2019/9
Y1 - 2019/9
N2 - In this paper, we present a multi-microphone speech separation algorithm based on masking inferred from the speakers direction of arrival (DOA). According to the W-disjoint orthogonality property of speech signals, each time-frequency (TF) bin is dominated by a single speaker. This TF bin can therefore be associated with a single DOA. In our procedure, we apply a deep neural network (DNN) with a U-net architecture to infer the DOA of each TF bin from a concatenated set of the spectra of the microphone signals. Separation is obtained by multiplying the reference microphone by the masks associated with the different DOAs. Our proposed deep direction estimation for speech separation (DDESS) method is inspired by the recent advances in deep clustering methods. Unlike already established methods that apply the clustering in a latent embedded space, in our approach the embedding is closely associated with the spatial information, as manifested by the different speakers' directions of arrival.
AB - In this paper, we present a multi-microphone speech separation algorithm based on masking inferred from the speakers direction of arrival (DOA). According to the W-disjoint orthogonality property of speech signals, each time-frequency (TF) bin is dominated by a single speaker. This TF bin can therefore be associated with a single DOA. In our procedure, we apply a deep neural network (DNN) with a U-net architecture to infer the DOA of each TF bin from a concatenated set of the spectra of the microphone signals. Separation is obtained by multiplying the reference microphone by the masks associated with the different DOAs. Our proposed deep direction estimation for speech separation (DDESS) method is inspired by the recent advances in deep clustering methods. Unlike already established methods that apply the clustering in a latent embedded space, in our approach the embedding is closely associated with the spatial information, as manifested by the different speakers' directions of arrival.
UR - http://www.scopus.com/inward/record.url?scp=85075607449&partnerID=8YFLogxK
U2 - https://doi.org/10.23919/EUSIPCO.2019.8903121
DO - https://doi.org/10.23919/EUSIPCO.2019.8903121
M3 - منشور من مؤتمر
T3 - European Signal Processing Conference
BT - EUSIPCO 2019 - 27th European Signal Processing Conference
T2 - 27th European Signal Processing Conference, EUSIPCO 2019
Y2 - 2 September 2019 through 6 September 2019
ER -