TY - GEN
T1 - Scene-agnostic multi-microphone speech dereverberation
AU - Yemini, Yochai
AU - Fetaya, Ethan
AU - Maron, Haggai
AU - Gannot, Sharon
N1 - Publisher Copyright: Copyright © 2021 ISCA.
PY - 2021
Y1 - 2021
N2 - Neural networks (NNs) have been widely applied in speech processing tasks, and, in particular, those employing microphone arrays. Nevertheless, most existing NN architectures can only deal with fixed and position-specific microphone arrays. In this paper, we present an NN architecture that can cope with microphone arrays whose number and positions of the microphones are unknown, and demonstrate its applicability in the speech dereverberation task. To this end, our approach harnesses recent advances in deep learning on set-structured data to design an architecture that enhances the reverberant log-spectrum. We use noisy and noiseless versions of a simulated reverberant dataset to test the proposed architecture. Our experiments on the noisy data show that the proposed scene-agnostic setup outperforms a powerful scene-aware framework, sometimes even with fewer microphones. With the noiseless dataset we show that, in most cases, our method outperforms the position-aware network as well as the state-of-the-art weighted linear prediction error (WPE) algorithm.
KW - Deep neural network
KW - Deep sets
KW - Microphone array
KW - Speech dereverberation
UR - http://www.scopus.com/inward/record.url?scp=85119212098&partnerID=8YFLogxK
U2 - 10.21437/interspeech.2021-889
DO - 10.21437/interspeech.2021-889
M3 - Conference publication
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 2453
EP - 2457
BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Y2 - 30 August 2021 through 3 September 2021
ER -