Abstract
Pitch estimation is an essential task in audio processing because of its key role in many speech and music applications. Still, accurately predicting a continuous value over a wide range of pitch frequencies is challenging. Inspired by the success of filterbank methods in signal processing, we propose a novel deep architecture for accurate pitch estimation. The proposed method is composed of an encoder and multiple decoders. The encoder is implemented as a convolutional neural network that provides a good representation of the raw audio signal, and its output is fed into a set of decoders. Each decoder is implemented as a fully-connected neural network and predicts the pitch value within a specific frequency band. This construction allows each decoder to specialize in a particular frequency regime, which results in a more accurate estimation of pitch values for music and speech signals.
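The idea of giving each decoder its own frequency band can be illustrated with a minimal sketch of the band bookkeeping alone. Everything below is an assumption made for illustration, since the abstract does not specify it: the geometric (log-spaced) band edges, the 50-1600 Hz range, the number of bands, and the helper names `band_edges`, `band_index`, `to_band_target`, and `from_band_output`. The neural encoder and decoders themselves are not modelled here.

```python
import math

def band_edges(f_min, f_max, n_bands):
    """Hypothetical helper: geometrically spaced band edges in Hz (an assumption)."""
    ratio = (f_max / f_min) ** (1.0 / n_bands)
    return [f_min * ratio ** i for i in range(n_bands + 1)]

def band_index(f0, edges):
    """Index of the band (and hence of the decoder) responsible for pitch f0."""
    if not edges[0] <= f0 <= edges[-1]:
        raise ValueError("pitch outside the covered range")
    for i in range(len(edges) - 2):
        if f0 < edges[i + 1]:
            return i
    return len(edges) - 2  # last band also takes the upper edge

def to_band_target(f0, edges):
    """Map a pitch in Hz to (band index, log-scale offset in [0, 1] within the band)."""
    i = band_index(f0, edges)
    lo, hi = edges[i], edges[i + 1]
    return i, math.log(f0 / lo) / math.log(hi / lo)

def from_band_output(i, offset, edges):
    """Inverse mapping: turn a decoder's in-band offset back into Hz."""
    lo, hi = edges[i], edges[i + 1]
    return lo * (hi / lo) ** offset

# Illustrative values only: five one-octave bands between 50 Hz and 1600 Hz.
edges = band_edges(50.0, 1600.0, 5)
band, offset = to_band_target(440.0, edges)
recovered = from_band_output(band, offset, edges)
```

Under these assumptions, each decoder only has to regress a bounded offset inside its own band instead of a value over the full pitch range, which is one way to read the specialization described above.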
Original language | English |
---|---|
Article number | 9501499 |
Pages (from-to) | 1610-1614 |
Number of pages | 5 |
Journal | IEEE Signal Processing Letters |
Volume | 28 |
DOIs | |
State | Published - 2021 |
Keywords
- Convolutional neural networks
- deep neural networks
- fundamental frequency
- pitch estimation
- speech processing
All Science Journal Classification (ASJC) codes
- Signal Processing
- Applied Mathematics
- Electrical and Electronic Engineering