Abstract
Distortionless speech extraction in a reverberant environment can be achieved by applying a beamforming algorithm, provided that the relative transfer functions (RTFs) of the sources and the covariance matrix of the noise are known. In this paper, the challenge of RTF identification in a multi-speaker scenario is addressed. We propose a successive RTF identification (SRI) technique, based on the sole assumption that sources do not become simultaneously active. That is, we address the challenge of estimating the RTF of a specific speech source while assuming that the RTFs of all other active sources in the environment were estimated in an earlier stage. The RTF of interest is identified by applying the blind oblique projection (BOP)-SRI technique. When a new speech source is identified, the BOP algorithm is applied. BOP steers a null toward the RTF of interest by applying an oblique projection to the microphone measurements. We prove that by artificially increasing the rank of the range of the projection matrix, the RTF of interest can be identified. An experimental study is carried out to evaluate the performance of the BOP-SRI algorithm in various signal-to-noise ratio (SNR) and signal-to-interference ratio (SIR) conditions and to demonstrate its effectiveness in speech extraction tasks.
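
The null-steering property of an oblique projection mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the BOP-SRI algorithm: the paper's method is blind, whereas here the RTF of the new source is assumed known purely to show what the projector does. The function name `oblique_projector`, the microphone count `M`, and the randomly drawn RTF vectors are all hypothetical, and the closed-form expression used is the standard textbook oblique projector, not necessarily the form used in the paper.

```python
import numpy as np

def oblique_projector(A, B):
    """Standard oblique projector with range span(A) and null space containing span(B).

    E = A (A^H P A)^{-1} A^H P, where P projects orthogonally onto the
    complement of span(B). Then E @ A = A and E @ B = 0.
    """
    # Orthogonal projector onto the complement of span(B)
    P_B_perp = np.eye(B.shape[0]) - B @ np.linalg.pinv(B)
    AH = A.conj().T
    return A @ np.linalg.solve(AH @ P_B_perp @ A, AH @ P_B_perp)

rng = np.random.default_rng(0)
M = 4  # hypothetical number of microphones

def random_rtf():
    # Hypothetical RTF at one frequency bin, normalized to the first microphone
    h = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    return (h / h[0]).reshape(M, 1)

h_known = random_rtf()  # RTF of a previously identified source (assumed)
h_new = random_rtf()    # RTF of the new source (assumed known here, for illustration only)

E = oblique_projector(h_known, h_new)

# The new source is nulled, while the known source passes undistorted.
null_residual = np.linalg.norm(E @ h_new)
pass_error = np.linalg.norm(E @ h_known - h_known)
```

Unlike an orthogonal projection, the range and the null space here need not be perpendicular, which is what allows one source to be cancelled exactly while another is preserved even when their RTFs are correlated.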
| Original language | English |
|---|---|
| Article number | 8926399 |
| Pages (from-to) | 474-486 |
| Number of pages | 13 |
| Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
| Volume | 28 |
| State | Published - 2020 |
Keywords
- Oblique projection
- Relative transfer function
- System identification
All Science Journal Classification (ASJC) codes
- Computer Science (miscellaneous)
- Acoustics and Ultrasonics
- Computational Mathematics
- Electrical and Electronic Engineering