Abstract
Speaker Change Detection (SCD) is the task of segmenting an input audio recording according to speaker interchanges. Many applications, such as Speaker Diarization (SD) and automatic vocal transcription, depend on this segmentation task. In this paper, we focus on the essential task of the SD problem, the audio segmentation process. We suggest a solution for the SCD problem, as well as for the assignment of clustered speaker labels to the extracted segments, and apply the solution to two datasets: a commercial dataset in Hebrew and the ICSI Meeting Corpus. Specifically, we propose a hybrid framework for the SCD problem that is learned from textual information, speech signals, and the metadata features that can be extracted from them. Moreover, we demonstrate a negative correlation between the number of speakers in the training dataset and the overall diarization system's performance, and show that this performance is improved by our efficient SCD component. Finally, we show that our proposed hybrid framework remains robust when evaluated on the ICSI Meeting Corpus, given that training and testing in the experimental evaluation are based on two languages.
Original language | American English |
---|---|
Article number | 9468954 |
Pages (from-to) | 2324-2338 |
Number of pages | 15 |
Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Volume | 29 |
State | Published - 1 Jan 2021 |
Keywords
- D-vectors
- Speaker change detection
- Speaker diarization
- Speaker verification
- Speech analysis
All Science Journal Classification (ASJC) codes
- Computer Science (miscellaneous)
- Acoustics and Ultrasonics
- Computational Mathematics
- Electrical and Electronic Engineering