Abstract
We introduce a user-centric residual-echo suppression (URES) framework for double-talk scenarios. The framework receives a user operating point (UOP) consisting of two metric values that the user expects from the RES outcome: the residual echo suppression level (RESL) and the desired speech-maintained level (DSML). The URES pipeline then proceeds in three stages. First, we consider a deep RES model with a tunable design parameter that balances RESL against DSML, and we utilize 101 pre-trained instances of this model, each with a different design parameter value. An identical input is thus expected to yield a different pair of RESL and DSML values in the prediction of every instance. Second, every prediction is separately fed to a subsequent pre-trained deep model instance that estimates the RESL and DSML of the prediction, since these metrics depend on information that is unavailable in practice. Last, each pair of RESL and DSML estimates is compared with the UOP. Among the pairs that match the UOP up to a given tolerance threshold, the prediction with the maximal acoustic-echo cancellation mean-opinion score (AECMOS) is selected as the output of the URES system. The framework offers three prominent advantages introduced in this study: it generates an RES output whose RESL and DSML match a UOP, supports near-real-time tracking of UOP changes, and applies AECMOS maximization. Experiments consider 60 h of varied real and synthetic data. On average, the results achieve an AECMOS subjectively considered excellent, with RESL and DSML deviations of roughly 2 dB from the UOP. Any UOP adjustment can be tracked in less than 40 ms with a real-time factor of 1.92, but due to the high computational resources demanded by the framework, this is enabled on-edge only with high-end dedicated hardware, which limits general availability.
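The third stage described above (matching against the UOP, then AECMOS maximization) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_prediction`, the use of a single absolute tolerance in dB applied to both metrics, and the `None` result when no candidate matches are all assumptions made for illustration, and the per-candidate RESL, DSML, and AECMOS values are assumed to have been produced already by the earlier stages.

```python
from typing import Optional, Sequence, Tuple


def select_prediction(
    resl_est: Sequence[float],   # estimated RESL per candidate prediction (dB)
    dsml_est: Sequence[float],   # estimated DSML per candidate prediction (dB)
    aecmos: Sequence[float],     # AECMOS score per candidate prediction
    uop: Tuple[float, float],    # user operating point: (target RESL, target DSML)
    tol_db: float,               # assumed: one absolute tolerance for both metrics
) -> Optional[int]:
    """Return the index of the matching candidate with the highest AECMOS.

    Returns None when no candidate lies within the tolerance of the UOP
    (a hypothetical fallback; the abstract does not specify this case).
    """
    target_resl, target_dsml = uop
    best_idx: Optional[int] = None
    for i, (r, d, m) in enumerate(zip(resl_est, dsml_est, aecmos)):
        # A candidate matches if both estimates are within tolerance of the UOP.
        matches = abs(r - target_resl) <= tol_db and abs(d - target_dsml) <= tol_db
        if matches and (best_idx is None or m > aecmos[best_idx]):
            best_idx = i
    return best_idx


# Toy example with four candidates instead of 101 (values are made up).
chosen = select_prediction(
    resl_est=[20.0, 25.0, 26.0, 40.0],
    dsml_est=[10.0, 8.0, 7.5, 3.0],
    aecmos=[3.9, 4.2, 4.4, 4.6],
    uop=(25.0, 8.0),
    tol_db=2.0,
)
print(chosen)
```

In this toy case only the second and third candidates fall within 2 dB of the UOP on both metrics, and the third is chosen because its AECMOS is higher. The fourth candidate has the highest AECMOS overall but is rejected for missing the UOP, which reflects the ordering described in the abstract: match first, then maximize.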
Original language | English |
---|---|
Pages (from-to) | 1901-1914 |
Number of pages | 14 |
Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Volume | 32 |
State | Published - 2024 |
Keywords
- AECMOS
- RESL and DSML
- Residual-echo suppression
- deep learning
- double-talk
- user-centric
All Science Journal Classification (ASJC) codes
- Computer Science (miscellaneous)
- Acoustics and Ultrasonics
- Computational Mathematics
- Electrical and Electronic Engineering