TY - GEN
T1 - Hide and Speak
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
AU - Kreuk, Felix
AU - Adi, Yossi
AU - Raj, Bhiksha
AU - Singh, Rita
AU - Keshet, Joseph
N1 - Publisher Copyright: © 2020 ISCA
PY - 2020
Y1 - 2020
N2 - Steganography is the science of hiding a secret message within an ordinary public message, which is referred to as Carrier. Traditionally, digital signal processing techniques, such as least significant bit encoding, were used for hiding messages. In this paper, we explore the use of deep neural networks as steganographic functions for speech data. We showed that steganography models proposed for vision are less suitable for speech, and propose a new model that includes the short-time Fourier transform and inverse-short-time Fourier transform as differentiable layers within the network, thus imposing a vital constraint on the network outputs. We empirically demonstrated the effectiveness of the proposed method comparing to deep learning based on several speech datasets and analyzed the results quantitatively and qualitatively. Moreover, we showed that the proposed approach could be applied to conceal multiple messages in a single carrier using multiple decoders or a single conditional decoder. Lastly, we evaluated our model under different channel distortions. Qualitative experiments suggest that modifications to the carrier are unnoticeable by human listeners and that the decoded messages are highly intelligible.
AB - Steganography is the science of hiding a secret message within an ordinary public message, which is referred to as Carrier. Traditionally, digital signal processing techniques, such as least significant bit encoding, were used for hiding messages. In this paper, we explore the use of deep neural networks as steganographic functions for speech data. We showed that steganography models proposed for vision are less suitable for speech, and propose a new model that includes the short-time Fourier transform and inverse-short-time Fourier transform as differentiable layers within the network, thus imposing a vital constraint on the network outputs. We empirically demonstrated the effectiveness of the proposed method comparing to deep learning based on several speech datasets and analyzed the results quantitatively and qualitatively. Moreover, we showed that the proposed approach could be applied to conceal multiple messages in a single carrier using multiple decoders or a single conditional decoder. Lastly, we evaluated our model under different channel distortions. Qualitative experiments suggest that modifications to the carrier are unnoticeable by human listeners and that the decoded messages are highly intelligible.
UR - http://www.scopus.com/inward/record.url?scp=85098190335&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2020-2380
DO - 10.21437/Interspeech.2020-2380
M3 - منشور من مؤتمر
SN - 9781713820697
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 4656
EP - 4660
BT - Interspeech 2020
Y2 - 25 October 2020 through 29 October 2020
ER -