Abstract
This paper presents a model for sound classification in construction that leverages a unique combination of Mel spectrograms and Mel-Frequency Cepstral Coefficient (MFCC) values. The model combines deep neural networks, namely Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, into CNN-LSTM and MFCCs-LSTM architectures, enabling the extraction of both spectral and temporal features from audio data. Audio data generated from construction activities in a real-time closed environment is used to evaluate the proposed model, yielding an overall Precision, Recall, and F1-score of 91%, 89%, and 91%, respectively. This performance surpasses other established models, including Deep Neural Networks (DNNs), CNNs, and Recurrent Neural Networks (RNNs), as well as combinations of these models such as CNN-DNN, CNN-RNN, and CNN-LSTM. These results underscore the potential of combining Mel spectrograms and MFCC values to provide a more informative representation of sound data, thereby enhancing sound classification in noisy environments.
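As a rough illustration of the fusion idea described in the abstract, the sketch below pairs a CNN-LSTM branch over log-Mel spectrograms with an LSTM branch over MFCC sequences, then concatenates the two feature views for classification. It assumes librosa for feature extraction and Keras for the model; all input shapes, layer sizes, and hyperparameters are illustrative assumptions, not the paper's reported configuration.

```python
# Minimal sketch of a two-branch Mel-spectrogram + MFCC classifier.
# Shapes and layer sizes below are assumed for illustration only.
import numpy as np
import librosa
from tensorflow.keras import layers, Model

N_MELS, N_MFCC, N_FRAMES, N_CLASSES = 128, 40, 216, 10  # assumed values

def extract_features(path, sr=22050, duration=5.0):
    """Load a clip and return (log-Mel spectrogram, MFCC sequence)."""
    y, sr = librosa.load(path, sr=sr, duration=duration)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS)
    log_mel = librosa.power_to_db(mel, ref=np.max)          # (n_mels, frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)  # (n_mfcc, frames)
    return log_mel[..., np.newaxis], mfcc.T                 # channel dim / time-major

# CNN-LSTM branch: convolutions capture spectral patterns, the LSTM models time.
mel_in = layers.Input(shape=(N_MELS, N_FRAMES, 1))
x = layers.Conv2D(32, 3, activation="relu", padding="same")(mel_in)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D(2)(x)
# Fold frequency and channels together so the LSTM steps over time frames.
x = layers.Permute((2, 1, 3))(x)
x = layers.Reshape((N_FRAMES // 4, (N_MELS // 4) * 64))(x)
x = layers.LSTM(64)(x)

# MFCCs-LSTM branch: the LSTM reads the cepstral sequence frame by frame.
mfcc_in = layers.Input(shape=(N_FRAMES, N_MFCC))
m = layers.LSTM(64)(mfcc_in)

# Fuse both feature views and classify.
merged = layers.concatenate([x, m])
out = layers.Dense(N_CLASSES, activation="softmax")(merged)
model = Model([mel_in, mfcc_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The design choice worth noting is that each branch sees a different representation of the same audio: the spectrogram branch preserves the full spectral envelope for the convolutions, while the MFCC branch feeds a compact cepstral summary directly to the recurrent layer, and the concatenation lets the classifier weigh both.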
| Original language | English |
|---|---|
| Article number | 105485 |
| Journal | Automation in Construction |
| Volume | 165 |
| DOIs | |
| State | Published - Sep 2024 |
Keywords
- Activity tracking
- Audio
- CNN
- LSTM
- Mel spectrograms
- MFCC
- Sound
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Civil and Structural Engineering
- Building and Construction