Abstract
We present a fully Bayesian hierarchical approach for multichannel speech enhancement with time-varying audio channel. Our probabilistic approach relies on a Gaussian prior for the speech signal and a Gamma hyperprior for the speech precision, combined with a multichannel linear-Gaussian state-space model for the acoustic channel. Furthermore, we assume a Wishart prior for the noise precision matrix. We derive a variational expectation-maximization (VEM) algorithm that uses a variant of a multichannel Wiener filter (MCWF) to infer the sound source and a Kalman smoother to infer the acoustic channel. It is further shown that the VEM speech estimator can be recasted as a multichannel minimum variance distortionless response (MVDR) beamformer followed by a single-channel variational postfilter. The proposed algorithm was evaluated using both simulated and real room environments with several noise types and reverberation levels. Both static and dynamic scenarios are considered. In terms of speech quality, it is shown that a significant improvement is obtained with respect to the noisy signal, and that the proposed method outperforms a baseline algorithm. In terms of channel alignment and tracking ability, a superior channel estimate is demonstrated.
Original language | English |
---|---|
Article number | 8492427 |
Pages (from-to) | 225-239 |
Number of pages | 15 |
Journal | IEEE/ACM Transactions on Audio Speech and Language Processing |
Volume | 27 |
Issue number | 1 |
DOIs | |
State | Published - Jan 2019 |
Keywords
- Adaptive beamforming
- Kalman smoother
- variational EM
All Science Journal Classification (ASJC) codes
- Computer Science (miscellaneous)
- Acoustics and Ultrasonics
- Computational Mathematics
- Electrical and Electronic Engineering