Similarity analysis of self-supervised speech representations

Yu An Chung, Yonatan Belinkov, James Glass

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Self-supervised speech representation learning has recently been a prosperous research topic. Many algorithms have been proposed for learning useful representations from large-scale unlabeled data, and their applications to a wide range of speech tasks have also been investigated. However, there has been little research focusing on understanding the properties of existing approaches. In this work, we aim to provide a comparative study of some of the most representative self-supervised algorithms. Specifically, we quantify the similarities between different self-supervised representations using existing similarity measures. We also design probing tasks to study the correlation between the models’ pretraining loss and the amount of specific speech information contained in their learned representations. In addition to showing how various self-supervised models behave differently given the same input, our study also finds that the training objective has a higher impact on representation similarity than architectural choices such as building blocks (RNN/Transformer/CNN) and directionality (uni/bidirectional). Our results also suggest that there exists a strong correlation between pre-training loss and downstream performance for some self-supervised algorithms.

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Pages: 3040-3044
Number of pages: 5
Volume: 2021-June
State: Published - 2021
Externally published: Yes
Event: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada
Duration: 6 Jun 2021 to 11 Jun 2021

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.

Conference

Conference: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021
Country/Territory: Canada
City: Virtual, Toronto
Period: 6/06/21 to 11/06/21

Keywords

  • Comparative analysis
  • Self-supervised learning
  • Speech representation learning
  • Unsupervised pre-training

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
