Speech Emotion Recognition Using Auditory Spectrogram and Cepstral Features

Shujie Zhao, Yan Yang, Israel Cohen, Lijun Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


A systematic comparison of the impact of environmental noise on key acoustic features is critical for transferring speech emotion recognition (SER) systems to real-world applications. In this study, we investigate the noise tolerance of different acoustic features in distinguishing emotions by comparing SER classification performance on clean and noisy speech signals. We extract spectral and cepstral parameters based on human auditory characteristics and develop machine learning algorithms to classify four types of emotions using these features. Experimental results across the clean and noisy data show that, compared to cepstral features, the auditory spectrogram-based features achieve higher recognition accuracy at low signal-to-noise ratios (SNRs) but lower accuracy at high SNRs. Gammatone filter cepstral coefficients (GFCCs) outperformed all the other extracted features on the Berlin database of emotional speech (EmoDB) under all four tested noise conditions. These results indicate a complementary relationship between auditory spectrogram-based features and cepstral features, which could support more noise-robust SER in real-world applications.
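The clean-versus-noisy comparison described in the abstract depends on producing noisy speech at a controlled SNR. The following is a minimal sketch of that one step, not the authors' procedure: the function names `mix_at_snr` and `measured_snr_db`, the synthetic sine "speech" signal, and the Gaussian noise are all assumptions made for illustration. It scales a noise signal so that the mixture has a requested SNR, using SNR (dB) = 10 log10(P_signal / P_noise).

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so the mixture has the requested SNR (dB).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_clean == 0.0 or p_noise == 0.0:
        raise ValueError("signals must have non-zero power")
    # Target noise power is P_clean / 10^(SNR/10); solve for the amplitude gain.
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """Measure the SNR of a mixture, given the clean reference signal."""
    p_clean = sum(x * x for x in clean)
    p_resid = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(p_clean / p_resid)


# Synthetic stand-ins (assumptions): a 220 Hz tone and white Gaussian noise.
fs = 16000
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 220.0 * t / fs) for t in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]

for snr_db in (0, 10, 20):
    noisy = mix_at_snr(clean, noise, snr_db)
    print(snr_db, round(measured_snr_db(clean, noisy), 3))
```

In an actual evaluation, the tone would be replaced by recorded utterances and the Gaussian noise by recorded environmental noise, with features extracted from each mixture before classification.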

Original language: English
Title of host publication: 29th European Signal Processing Conference, EUSIPCO 2021 - Proceedings
Number of pages: 5
ISBN (Electronic): 9789082797060
State: Published - 2021
Event: 29th European Signal Processing Conference, EUSIPCO 2021 - Dublin, Ireland
Duration: 23 Aug 2021 → 27 Aug 2021

Publication series

Name: European Signal Processing Conference


Conference: 29th European Signal Processing Conference, EUSIPCO 2021


  • Emotion recognition
  • Feature extraction
  • Machine learning
  • Noise
  • Pattern recognition
  • Speech signals

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Electrical and Electronic Engineering

