TY - GEN
T1 - Low-Latency Single-Microphone Speaker Separation with Temporal Convolutional Networks Using Speaker Representations
AU - Rubenchik, Boris
AU - Hadad, Elior
AU - Tzirkel, Eli
AU - Fetaya, Ethan
AU - Gannot, Sharon
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - This article introduces a novel method for single-channel speaker separation, focusing on causal and low-latency inference. While significant advancements have been recently made in speaker separation, especially in non-causal scenarios, there is a notable gap concerning low-latency speaker separation - a critical requirement for real-time conversational applications like phone calls, video calls, and human-machine interaction. We propose a two-stage solution that combines an initial separation with speaker representation-driven refinement. To assess the effectiveness of our method, we conducted extensive experiments and examined its performance in two settings: a very low algorithmic latency of 16 milliseconds and a fully causal model. Additionally, we present a detailed analysis of the composition of datasets derived from WSJ0, shedding light on their impact on model evaluation. Our two-stage solution demonstrates significant performance enhancements compared to the baseline model and offers an easily deployable solution suitable for edge devices.
UR - http://www.scopus.com/inward/record.url?scp=85207236400&partnerID=8YFLogxK
U2 - 10.1109/iwaenc61483.2024.10694508
DO - 10.1109/iwaenc61483.2024.10694508
M3 - Conference contribution
T3 - 2024 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024 - Proceedings
SP - 155
EP - 159
BT - 2024 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024
Y2 - 9 September 2024 through 12 September 2024
ER -