TY - GEN
T1 - ATTENUATION OF ACOUSTIC EARLY REFLECTIONS IN TELEVISION STUDIOS USING PRETRAINED SPEECH SYNTHESIS NEURAL NETWORK
AU - Rosenbaum, Tomer
AU - Cohen, Israel
AU - Winebrand, Emil
N1 - Publisher Copyright: © 2022 IEEE
PY - 2022
Y1 - 2022
N2 - Machine learning and digital signal processing have been extensively used to enhance speech. However, methods to reduce early reflections in studio settings are usually related to the physical characteristics of the room. In this paper, we address the problem of early acoustic reflections in television studios and control rooms, and propose a two-stage method that exploits the knowledge of a pretrained speech synthesis generator. First, given a degraded speech signal that includes the direct sound and early reflections, a U-Net convolutional neural network is used to attenuate the early reflections in the spectral domain. Then, a pretrained speech synthesis generator reconstructs the phase to predict an enhanced speech signal in the time domain. Qualitative and quantitative experimental results demonstrate excellent studio quality of speech enhancement.
AB - Machine learning and digital signal processing have been extensively used to enhance speech. However, methods to reduce early reflections in studio settings are usually related to the physical characteristics of the room. In this paper, we address the problem of early acoustic reflections in television studios and control rooms, and propose a two-stage method that exploits the knowledge of a pretrained speech synthesis generator. First, given a degraded speech signal that includes the direct sound and early reflections, a U-Net convolutional neural network is used to attenuate the early reflections in the spectral domain. Then, a pretrained speech synthesis generator reconstructs the phase to predict an enhanced speech signal in the time domain. Qualitative and quantitative experimental results demonstrate excellent studio quality of speech enhancement.
KW - Acoustic early reflections
KW - generative adversarial networks
KW - speech dereverberation
KW - speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=85131247997&partnerID=8YFLogxK
U2 - https://doi.org/10.1109/ICASSP43922.2022.9746735
DO - https://doi.org/10.1109/ICASSP43922.2022.9746735
M3 - منشور من مؤتمر
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 7422
EP - 7426
BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
T2 - 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Y2 - 23 May 2022 through 27 May 2022
ER -