Abstract
Neural-network-based speech enhancement and dereverberation approaches learn a transformation from noisy to clean speech via supervised learning. However, networks trained in this way may fail to handle languages, noise types, or acoustic environments that were absent from the training data. To address this issue, the present study focuses on unsupervised domain adaptation, specifically in scenarios with substantial domain gaps: noisy speech from the new domain is available, but the corresponding clean speech is not. We propose an adaptation method based on domain-adversarial training followed by iterative self-training, in which the estimated speech serves as pseudo labels and target-domain samples are gradually introduced to the network according to their similarity to the source domain. The self-training also utilizes labeled source-domain samples that are similar to the target domain. Experimental results show that our method effectively mitigates the domain mismatch between the training and test sets, outperforming the current baselines.
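The curriculum described in the abstract — gradually admitting target-domain samples in order of their similarity to the source domain across self-training rounds — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the cosine-to-centroid similarity measure, the linear admission schedule, and all function names are assumptions made for the example.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between a sample embedding and the source-domain centroid.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def select_curriculum(target_embs, source_centroid, round_idx, n_rounds):
    """Return indices of target samples admitted at a given self-training round.

    Samples most similar to the source domain enter first; the admitted
    fraction grows linearly with the round index (a hypothetical schedule).
    """
    sims = np.array([cosine_similarity(e, source_centroid) for e in target_embs])
    order = np.argsort(-sims)  # most source-like samples first
    frac = (round_idx + 1) / n_rounds
    k = max(1, int(round(frac * len(order))))
    return order[:k]

# Toy example: random 3-D embeddings for 6 target samples and a source centroid.
rng = np.random.default_rng(0)
target_embs = rng.normal(size=(6, 3))
source_centroid = rng.normal(size=3)

for r in range(3):
    idx = select_curriculum(target_embs, source_centroid, r, n_rounds=3)
    # In the full method, pseudo labels (the network's enhanced speech) would be
    # generated for these admitted samples and mixed with labeled source-domain
    # samples that are similar to the target domain before retraining.
    print(f"round {r}: admit target samples {sorted(idx.tolist())}")
```

By the final round the schedule admits all target samples, mirroring the paper's idea of easing the network into the target domain rather than exposing it to the full domain gap at once.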
| Original language | English |
| --- | --- |
| Pages (from-to) | 1226-1236 |
| Number of pages | 11 |
| Journal | IEEE/ACM Transactions on Audio Speech and Language Processing |
| Volume | 32 |
| DOIs | |
| State | Published - 2024 |
Keywords
- Unsupervised domain adaptation
- dereverberation
- pseudo labels
- self-training
- speech enhancement
All Science Journal Classification (ASJC) codes
- Computer Science (miscellaneous)
- Computational Mathematics
- Electrical and Electronic Engineering
- Acoustics and Ultrasonics