TY - GEN
T1 - Voice activity detection in presence of transient noise using spectral clustering and diffusion kernels
AU - Rosen, Oren
AU - Mousazadeh, Saman
AU - Cohen, Israel
N1 - Publisher Copyright: © 2015 IEEE. All rights reserved.
PY - 2014
Y1 - 2014
N2 - In this paper, we introduce a voice activity detection (VAD) algorithm based on spectral clustering and diffusion kernels. The proposed algorithm is a supervised learning algorithm comprising learning and testing stages. A sample cloud is produced for every signal frame using a moving window. Mel-frequency cepstral coefficients (MFCCs) are then calculated for every sample in the cloud to produce an MFCC matrix and, subsequently, a covariance matrix for every frame. Using the covariance matrix, we calculate a similarity matrix with spectral clustering and diffusion kernel methods. Using the similarity matrix, we cluster the data and transform it to a new space where each point is labeled as speech or nonspeech. We then use a Gaussian mixture model (GMM) to build a statistical model for labeling data as speech or nonspeech. Simulation results demonstrate the advantages of the proposed algorithm over a recent VAD algorithm.
UR - http://www.scopus.com/inward/record.url?scp=84941236701&partnerID=8YFLogxK
U2 - https://doi.org/10.1109/EEEI.2014.7005743
DO - https://doi.org/10.1109/EEEI.2014.7005743
M3 - Conference contribution
T3 - 2014 IEEE 28th Convention of Electrical and Electronics Engineers in Israel, IEEEI 2014
BT - 2014 IEEE 28th Convention of Electrical and Electronics Engineers in Israel, IEEEI 2014
T2 - 2014 28th IEEE Convention of Electrical and Electronics Engineers in Israel, IEEEI 2014
Y2 - 3 December 2014 through 5 December 2014
ER -