Comparing hand-crafted features to spectrograms for autism severity estimation

Abstract
In this work, we compared two input representations for estimating autism severity from speech signals. We analyzed 127 audio recordings of young children obtained during administration of the Autism Diagnostic Observation Schedule, 2nd edition (ADOS-2). Two sets of features were extracted from each recording: 1) hand-crafted features, comprising acoustic and prosodic descriptors, and 2) log-mel spectrograms, which provide a time-frequency representation of the signal. We examined two Convolutional Neural Network (CNN) architectures for each input and compared their autism severity estimation performance. The hand-crafted features yielded a lower prediction error (normalized RMSE) than the log-mel spectrograms in most examined configurations. Moreover, fusing the estimated autism severity scores of the two feature extraction methods yielded the best results, with both architectures exhibiting similar performance (Pearson R = 0.66, normalized RMSE = 0.24).
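The abstract reports results as Pearson R and normalized RMSE, and fuses the severity scores predicted from the two feature sets. A minimal plain-Python sketch of these evaluation metrics and a score-level fusion step; note that averaging the two predictions and normalizing RMSE by the range of the true scores are assumptions for illustration, as the abstract does not state the exact fusion rule or normalization:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between true and predicted severity scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def normalized_rmse(true, pred):
    """RMSE divided by the range of the true scores (assumed normalization)."""
    n = len(true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / n)
    return rmse / (max(true) - min(true))

def fuse(pred_handcrafted, pred_spectrogram):
    """Score-level fusion of the two models' outputs; simple averaging
    is an assumed fusion rule, not stated in the abstract."""
    return [(a + b) / 2 for a, b in zip(pred_handcrafted, pred_spectrogram)]
```

For example, `fuse([2.0, 4.0], [4.0, 2.0])` returns `[3.0, 3.0]`, and both metrics can then be computed on the fused scores against the ADOS-2 ground truth.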
| Original language | American English |
|---|---|
| Pages (from-to) | 4154-4158 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2023-August |
| State | Published - 1 Jan 2023 |
| Event | Interspeech 2023, 24th Annual Conference of the International Speech Communication Association, Dublin, Ireland, 20-24 Aug 2023 |
Keywords
- ADOS
- CNN
- audio
- autism
- features
- severity estimation
- spectrogram
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation