TY - GEN
T1 - Exploiting the intermittency of speech for joint separation and diarization
AU - Kounades-Bastian, Dionyssos
AU - Girin, Laurent
AU - Alameda-Pineda, Xavier
AU - Horaud, Radu
AU - Gannot, Sharon
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/12/7
Y1 - 2017/12/7
N2 - Natural conversations are spontaneous exchanges involving two or more people speaking in an intermittent manner. One therefore expects such conversations to have intervals in which some of the speakers are silent. Yet most multichannel audio source separation (MASS) methods assume that the sound sources emit continuously over the total duration of the processed mixture. In this paper, we propose a probabilistic model for MASS in which the sources may have pauses. The activity of the sources is modeled as a hidden state, the diarization state, enabling us to activate/deactivate the sound sources at time-frame resolution. We plug the diarization model into the spatial covariance matrix model proposed for MASS in [1], and obtain an improvement in performance over the state of the art when separating mixtures with intermittent speakers.
AB - Natural conversations are spontaneous exchanges involving two or more people speaking in an intermittent manner. One therefore expects such conversations to have intervals in which some of the speakers are silent. Yet most multichannel audio source separation (MASS) methods assume that the sound sources emit continuously over the total duration of the processed mixture. In this paper, we propose a probabilistic model for MASS in which the sources may have pauses. The activity of the sources is modeled as a hidden state, the diarization state, enabling us to activate/deactivate the sound sources at time-frame resolution. We plug the diarization model into the spatial covariance matrix model proposed for MASS in [1], and obtain an improvement in performance over the state of the art when separating mixtures with intermittent speakers.
KW - Audio source separation
KW - EM
KW - spatial covariance matrix
KW - speaker diarization
UR - http://www.scopus.com/inward/record.url?scp=85042388426&partnerID=8YFLogxK
U2 - https://doi.org/10.1109/WASPAA.2017.8169991
DO - https://doi.org/10.1109/WASPAA.2017.8169991
M3 - Conference contribution
T3 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
SP - 41
EP - 45
BT - 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2017
Y2 - 15 October 2017 through 18 October 2017
ER -