Pitch Estimation by Multiple Octave Decoders

Yael Segal, May Arama-Chayoth, Joseph Keshet

Research output: Contribution to journal › Article › peer-review

Abstract

Pitch estimation is an essential task in audio processing due to its key role in many speech and music applications. Still, accurately predicting a continuous value over a wide range of pitch frequencies is challenging. Inspired by the success of signal processing filterbank methods, we propose a novel deep architecture for accurate pitch estimation. The proposed method is composed of an encoder and multiple decoders. The encoder is implemented by a convolutional neural network that provides a good representation of the raw audio signal, and its output is fed into a set of decoders. Each decoder predicts the pitch value within a specific frequency band and is implemented by a fully-connected neural network. This construction allows each decoder to specialize in a particular frequency regime, which leads to more accurate pitch estimation for both music and speech signals.
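The encoder/multi-decoder idea described in the abstract can be illustrated with a toy NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the octave-wide bands starting at a hypothetical `F_MIN_HZ`, the single-layer encoder with global average pooling, the hypothetical helpers `octave_bands`, `encoder` and `decoder`, and the sigmoid-to-log-frequency mapping that keeps each decoder's output inside its band. The abstract does not say how the decoder outputs are combined into one estimate, so the sketch stops at the per-band predictions.

```python
import numpy as np

# Hypothetical configuration: these values are assumptions, not from the paper.
F_MIN_HZ = 32.7      # assumed lowest pitch (about C1)
N_OCTAVES = 6        # assumed number of octave decoders
FRAME_LEN = 1024     # assumed raw-audio frame length in samples


def octave_bands(f_min, n_octaves):
    """Split [f_min, f_min * 2**n_octaves) into contiguous octave-wide bands."""
    return [(f_min * 2.0 ** k, f_min * 2.0 ** (k + 1)) for k in range(n_octaves)]


def conv1d_relu(x, kernels, stride):
    """Valid 1-D convolution of a single-channel signal with several kernels, then ReLU."""
    n_k, k_len = kernels.shape
    n_out = (len(x) - k_len) // stride + 1
    # Gather strided windows of the signal: shape (n_out, k_len).
    idx = np.arange(k_len)[None, :] + stride * np.arange(n_out)[:, None]
    windows = x[idx]
    return np.maximum(windows @ kernels.T, 0.0)   # shape (n_out, n_k)


def encoder(x, kernels, stride=16):
    """Toy CNN encoder (assumed): one conv layer + global average pooling."""
    feats = conv1d_relu(x, kernels, stride)
    return feats.mean(axis=0)                      # embedding, shape (n_k,)


def decoder(z, w, b, band):
    """Toy fully-connected decoder whose output is confined to one band.

    A sigmoid maps the linear output to [0, 1]; that fraction is then
    interpolated on a log-frequency scale between the band edges
    (an assumed output scheme, not necessarily the paper's).
    """
    lo, hi = band
    frac = 1.0 / (1.0 + np.exp(-(z @ w + b)))
    return lo * (hi / lo) ** frac


# Usage with random (untrained) weights: one estimate per octave decoder.
rng = np.random.default_rng(0)
bands = octave_bands(F_MIN_HZ, N_OCTAVES)
kernels = rng.normal(scale=0.1, size=(16, 64))   # 16 filters of length 64
frame = np.sin(2 * np.pi * 220.0 * np.arange(FRAME_LEN) / 16000)  # 220 Hz tone, 16 kHz
z = encoder(frame, kernels)
estimates = [decoder(z, rng.normal(size=16), 0.0, band) for band in bands]
```

Because each decoder can only emit frequencies inside its own band, it only has to resolve pitch within one octave rather than across the whole range, which is the specialization the abstract credits for the accuracy gain. With untrained weights the estimates are meaningless; the sketch only shows the data flow.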

Original language: English
Article number: 9501499
Pages (from-to): 1610-1614
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 28
State: Published - 2021

Keywords

  • Convolutional neural networks
  • deep neural networks
  • fundamental frequency
  • pitch estimation
  • speech processing

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Applied Mathematics
  • Electrical and Electronic Engineering
