Abstract
Deep unfolding models are designed by unrolling an optimization algorithm into a deep learning network. By incorporating domain knowledge from the optimization algorithm, they achieve faster convergence and higher performance than the original algorithm. We design an optimization problem for sequential signal recovery that incorporates the prior knowledge that the signals have a sparse representation in a dictionary and are correlated over time. A corresponding optimization algorithm is derived and unfolded into a deep unfolding Transformer encoder architecture, coined DUST. To demonstrate its improved reconstruction quality and its flexibility in handling sequences of different lengths, we perform extensive experiments on video frame reconstruction from low-dimensional and/or noisy measurements, using several video datasets. We evaluate extensions of the base DUST model that incorporate token normalization and multi-head attention, and we compare our proposed networks with several deep unfolding recurrent neural networks (RNNs), generic unfolded and vanilla Transformers, and several video denoising models. The results show that our proposed Transformer architecture improves the reconstruction quality over state-of-the-art deep unfolding RNNs, existing Transformer networks, and state-of-the-art video denoising models, while significantly reducing the model size and the computational cost of training and inference.
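As a concrete illustration of the deep unfolding idea described in the abstract, the following is a minimal PyTorch sketch of an unrolled ISTA (LISTA-style) network for sparse recovery. The class name, layer structure, and hyperparameters are illustrative assumptions; this is a generic unfolding example, not the DUST architecture, which additionally unfolds the temporal-correlation prior into a Transformer encoder.

```python
import torch
import torch.nn as nn

class UnfoldedISTA(nn.Module):
    """Generic deep unfolding sketch (assumed example, not DUST):
    each 'layer' is one unrolled ISTA iteration
        x <- soft(We @ y + S @ x, theta),
    with the matrices and thresholds learned per layer (LISTA-style)."""

    def __init__(self, meas_dim: int, code_dim: int, num_layers: int = 8):
        super().__init__()
        self.num_layers = num_layers
        # Learned surrogates for A^T and (I - A^T A) of the measurement model.
        self.We = nn.ModuleList(nn.Linear(meas_dim, code_dim, bias=False)
                                for _ in range(num_layers))
        self.S = nn.ModuleList(nn.Linear(code_dim, code_dim, bias=False)
                               for _ in range(num_layers))
        # One learned soft-threshold per unrolled iteration.
        self.theta = nn.Parameter(torch.full((num_layers,), 0.1))

    @staticmethod
    def soft(x: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
        # Soft-thresholding: the proximal operator of the l1 norm.
        return torch.sign(x) * torch.relu(x.abs() - theta)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, meas_dim) low-dimensional and/or noisy measurements.
        x = self.soft(self.We[0](y), self.theta[0])
        for t in range(1, self.num_layers):
            x = self.soft(self.We[t](y) + self.S[t](x), self.theta[t])
        return x  # sparse code; the signal is recovered as D @ x
```

Each unrolled iteration becomes a layer with its own learned parameters, which is what allows an unfolded network to reach a good reconstruction in far fewer steps than the original iterative algorithm while retaining its domain structure.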
| Original language | English |
| --- | --- |
| Pages (from-to) | 1782-1796 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Signal Processing |
| Volume | 72 |
| DOIs | |
| State | Published - 25 Mar 2024 |
Keywords
- Computer architecture
- Correlation
- Deep unfolding
- Image reconstruction
- Noise reduction
- Optimization
- Signal processing algorithms
- Transformer networks
- Transformers
- sparse recovery
- video compressed sensing
- video denoising
All Science Journal Classification (ASJC) codes
- Signal Processing
- Electrical and Electronic Engineering