Many-speakers single channel speech separation with optimal permutation training

Shaked Dovrat, Eliya Nachmani, Lior Wolf

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Single channel speech separation has experienced great progress in the last few years. However, training neural speech separation for a large number of speakers (e.g., more than 10 speakers) is out of reach for current methods, which rely on Permutation Invariant Training (PIT). In this work, we present a permutation invariant training scheme that employs the Hungarian algorithm in order to train with an O(C³) time complexity, where C is the number of speakers, compared to the O(C!) complexity of PIT-based methods. Furthermore, we present a modified architecture that can handle the increased number of speakers. Our approach separates up to 20 speakers and improves upon previous results for large C by a wide margin.

Original language: American English
Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Pages: 2408-2412
Number of pages: 5
ISBN (Electronic): 9781713836902
DOIs
State: Published - 1 Jan 2021
Event: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: 30 Aug 2021 - 3 Sep 2021

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 4

Conference

Conference: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Country/Territory: Czech Republic
City: Brno
Period: 30/08/21 - 3/09/21

Keywords

  • Deep learning
  • Single channel
  • Speech separation

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
