Data Augmenting Contrastive Learning of Speech Representations in the Time Domain

Eugene Kharitonov, Morgane Riviere, Gabriel Synnaeve, Lior Wolf, Pierre Emmanuel Mazare, Matthijs Douze, Emmanuel Dupoux

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Contrastive Predictive Coding (CPC), which is based on predicting future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signals. However, it still underperforms other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library which we adapt and optimize for the specificities of CPC (raw waveform input, contrastive loss, past versus future structure). We find that applying augmentation only to the segments from which the CPC prediction is performed yields better results than also applying it to the future segments from which the samples (both positive and negative) of the contrastive loss are drawn. After selecting the best combination of pitch modification, additive noise and reverberation on unsupervised metrics on LibriSpeech (a relative gain of 18-22% on the ABX score), we apply this combination without any change to three new datasets in the Zero Resource Speech Benchmark 2017 and beat the state of the art using out-of-domain training data. Finally, we show that the data-augmented pretrained features improve a downstream phone recognition task in the Libri-light semi-supervised setting (10 min, 1 h or 10 h of labelled data), reducing the PER by 15% relative.
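To make the "augment the past only" finding concrete, the following is a minimal sketch, not WavAugment's API and not the authors' training code. It assumes a single split index separating the past (context) segment from the future segment, and it shows only one of the three augmentations named above (additive noise mixed at a target signal-to-noise ratio). The helper names `power`, `add_noise_at_snr` and `augment_past_only` are hypothetical, and a sine wave plus Gaussian noise stand in for real speech and noise recordings.

```python
import math
import random
from typing import Callable, List, Tuple

Signal = List[float]


def power(x: Signal) -> float:
    """Mean squared amplitude of a waveform (hypothetical helper)."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(signal: Signal, noise: Signal, snr_db: float) -> Signal:
    """Mix `noise` into `signal` at the requested signal-to-noise ratio.

    The noise is scaled by a gain g chosen so that
    10 * log10(power(signal) / power(g * noise)) == snr_db.
    """
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[: len(signal)]
    gain = math.sqrt(power(signal) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def augment_past_only(
    waveform: Signal, split: int, augment: Callable[[Signal], Signal]
) -> Tuple[Signal, Signal]:
    """Return (augmented past, clean future).

    Assumption: one split index separates the context the model predicts from
    (augmented) and the future segment from which the contrastive positives
    and negatives would be drawn (left untouched).
    """
    past, future = waveform[:split], waveform[split:]
    return augment(past), future


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 16000
    # Toy 1 s "speech" signal: a 220 Hz sine wave standing in for real audio.
    wave = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
    split = sample_rate // 2

    past, future = augment_past_only(
        wave, split, augment=lambda seg: add_noise_at_snr(seg, noise, snr_db=10.0)
    )

    # Measure the SNR actually achieved on the past segment.
    added = [p - s for p, s in zip(past, wave[:split])]
    achieved = 10 * math.log10(power(wave[:split]) / power(added))
    print(f"achieved SNR: {achieved:.2f} dB")  # prints: achieved SNR: 10.00 dB
    print("future untouched:", future == wave[split:])  # prints: future untouched: True
```

Keeping the future segment clean means the positive and negative samples used by the contrastive loss come from unaltered audio, which is the configuration the abstract reports as working better; pitch modification and reverberation would slot in as further `augment` callables.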

Original language: English
Title of host publication: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 215-222
Number of pages: 8
ISBN (Electronic): 9781728170664
State: Published - 19 Jan 2021
Externally published: Yes
Event: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Virtual, Shenzhen, China
Duration: 19 Jan 2021 to 22 Jan 2021

Publication series

Name: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings

Conference

Conference: 2021 IEEE Spoken Language Technology Workshop, SLT 2021
Country/Territory: China
City: Virtual, Shenzhen
Period: 19/01/21 to 22/01/21

Keywords

  • contrastive predictive coding
  • data augmentation
  • unsupervised representation learning

All Science Journal Classification (ASJC) codes

  • Linguistics and Language
  • Language and Linguistics
  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
