TY - GEN
T1 - Speaker diarization during noisy clinical diagnoses of autism
AU - Gorodetski, Alex
AU - Dinstein, Ilan
AU - Zigel, Yaniv
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/7/1
Y1 - 2019/7/1
AB - Autism Spectrum Disorder (ASD) is characterized by difficulties in social communication and social interaction and by repetitive behaviors. Some of these difficulties are apparent in the speech characteristics of ASD children who are verbal. Developing algorithms that can extract and quantify speech features that are unique to ASD children is, therefore, extremely valuable for assessing the initial state of each child and their development over time. An important component of such algorithms is speaker diarization in the noisy clinical environments where ASD children are diagnosed. Here we present a Gaussian Mixture Model (GMM) approach for speaker diarization that was applied to 34 recordings from clinical assessments using the Autism Diagnostic Observation Schedule (ADOS). We used mel-frequency cepstral coefficients (MFCC) and pitch-based features to classify segments containing speech of the child, therapist, or parent, movement noises (chair, toys, etc.), and simultaneous speech. We achieved an accuracy of 89% in identifying segments with children's speech and an accuracy of 74.5% in identifying children's and therapists' speech segments. These accuracy rates are similar to those reported by comparable previous studies, thereby demonstrating a promising route for the automated assessment of speech in children with ASD.
KW - ADOS
KW - ASD detection
KW - Autism
KW - Child speech processing
KW - Speaker diarization
KW - Speech processing
KW - Viterbi algorithm
UR - http://www.scopus.com/inward/record.url?scp=85077841005&partnerID=8YFLogxK
U2 - 10.1109/EMBC.2019.8857247
DO - 10.1109/EMBC.2019.8857247
M3 - Conference contribution
C2 - 31946427
T3 - Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
SP - 2593
EP - 2596
BT - 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2019
T2 - 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2019
Y2 - 23 July 2019 through 27 July 2019
ER -