A User-Centric Approach for Deep Residual-Echo Suppression in Double-Talk

Amir Ivry, Israel Cohen, Baruch Berdugo

Research output: Contribution to journal, Article, peer-review

Abstract

We introduce a user-centric residual-echo suppression (URES) framework for double-talk. The framework receives a user operating point (UOP) that consists of two metric values: the residual-echo suppression level (RESL) and the desired-speech maintained level (DSML) that the user expects from the RES outcome. The URES pipeline then proceeds through three stages. First, we consider a deep RES model with a tunable design parameter that balances between the RESL and DSML, and utilize 101 pre-trained instances of this model, each with a different design parameter value. Thus, an identical input is expected to generate a different pair of RESL and DSML values in the prediction of every instance. Second, every prediction is separately fed to a subsequent pre-trained deep model instance that estimates the RESL and DSML of the prediction, since these metrics depend on information that is unavailable in practice. Third, each pair of RESL and DSML estimates is compared with the UOP. The pairs that match the UOP up to a given tolerance threshold are narrowed down to the prediction with the maximal acoustic-echo cancellation mean-opinion score (AECMOS), which is the output of the URES system. The suggested framework holds three prominent advantages introduced in this study: it generates an RES output with RESL and DSML that match a UOP, supports near-real-time tracking of UOP changes, and applies AECMOS maximization. Experimental results consider 60 h of varied real and synthetic data. On average, the results can achieve an AECMOS that is subjectively considered excellent, with RESL and DSML deviations of roughly 2 dB from the UOP. Any UOP adjustment can be tracked in less than 40 ms with a real-time factor of 1.92. However, due to the high computational resources demanded by the framework, this is enabled on-edge only with high-end dedicated hardware, which limits general availability.
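The third stage described in the abstract (matching estimates against the UOP, then maximizing AECMOS) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` class, the `select_output` function, and all field names are hypothetical, and the matching rule is an assumption, since the abstract only states that pairs must match the UOP "up to a given tolerance threshold". Here a pair is assumed to match when both its RESL and DSML estimates lie within the same absolute tolerance (in dB) of the UOP values.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One RES model instance's prediction with its estimated metrics (hypothetical)."""
    design_param: float  # value of the tunable design parameter for this instance
    resl_db: float       # estimated RESL of the prediction, in dB
    dsml_db: float       # estimated DSML of the prediction, in dB
    aecmos: float        # AECMOS of the prediction


def select_output(
    candidates: Sequence[Candidate],
    uop_resl_db: float,
    uop_dsml_db: float,
    tol_db: float,
) -> Optional[Candidate]:
    """Return the matching candidate with maximal AECMOS, or None if none match.

    Assumed matching rule: both estimates must be within tol_db of the UOP.
    """
    matching = [
        c for c in candidates
        if abs(c.resl_db - uop_resl_db) <= tol_db
        and abs(c.dsml_db - uop_dsml_db) <= tol_db
    ]
    if not matching:
        return None  # no prediction satisfies the UOP at this tolerance
    return max(matching, key=lambda c: c.aecmos)


# Illustrative values only; these are not results from the paper.
candidates = [
    Candidate(design_param=0.0, resl_db=20.0, dsml_db=10.0, aecmos=3.5),
    Candidate(design_param=0.5, resl_db=24.0, dsml_db=8.0, aecmos=4.1),
    Candidate(design_param=1.0, resl_db=25.5, dsml_db=7.0, aecmos=4.3),
    Candidate(design_param=0.7, resl_db=30.0, dsml_db=5.0, aecmos=4.8),
]
best = select_output(candidates, uop_resl_db=25.0, uop_dsml_db=8.0, tol_db=2.0)
```

In this toy example the candidate with the highest AECMOS overall is rejected because its RESL estimate lies 5 dB from the UOP, so the selection falls to the best-scoring candidate among those that do match. How the actual system behaves when no pair matches is not stated in the abstract; returning `None` is a placeholder choice.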

Original language: English
Pages (from-to): 1901-1914
Number of pages: 14
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 32
State: Published - 2024

Keywords

  • AECMOS
  • RESL and DSML
  • Residual-echo suppression
  • deep learning
  • double-talk
  • user-centric

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering
