Deep Unfolding Transformers for Sparse Recovery of Video

Brent De Weerdt, Yonina C. Eldar, Nikos Deligiannis

Research output: Contribution to journal › Article › peer-review

Abstract

Deep unfolding models are designed by unrolling the iterations of an optimization algorithm into the layers of a deep learning network. By incorporating domain knowledge from the optimization algorithm, they have shown faster convergence and higher performance than the original algorithm. We design an optimization problem for sequential signal recovery that encodes two assumptions: the signals have a sparse representation in a dictionary, and they are correlated over time. A corresponding optimization algorithm is derived and unfolded into a deep unfolding Transformer encoder architecture, coined DUST. To show its improved reconstruction quality and flexibility in handling sequences of different lengths, we perform extensive experiments on video frame reconstruction from low-dimensional and/or noisy measurements, using several video datasets. We evaluate extensions to the base DUST model that incorporate token normalization and multi-head attention, and compare our proposed networks with several deep unfolding recurrent neural networks (RNNs), generic unfolded and vanilla Transformers, and several video denoising models. The results show that our proposed Transformer architecture improves the reconstruction quality over state-of-the-art deep unfolding RNNs, existing Transformer networks, and state-of-the-art video denoising models, while significantly reducing the model size and the computational cost of training and inference.
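
As a rough illustration of what unrolling an optimization algorithm means, the sketch below runs a fixed number of iterations of ISTA (iterative soft-thresholding for the l1-regularized least-squares problem) as "layers", each with its own step size and threshold. This is not the DUST architecture or the optimization problem from the paper: it ignores temporal correlation and contains no attention mechanism. The function names and the per-layer parameter lists `steps` and `thresholds` are hypothetical; in a deep unfolding network such parameters would be learned from data rather than set by hand.

```python
def soft_threshold(v, theta):
    """Elementwise soft-thresholding: the proximal operator of theta * ||.||_1."""
    return [max(abs(vi) - theta, 0.0) * (1.0 if vi >= 0 else -1.0) for vi in v]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def unfolded_ista(y, A, steps, thresholds):
    """Run len(steps) ISTA iterations as layers with per-layer parameters.

    y          -- measurement vector
    A          -- measurement matrix (list of rows), assumed known
    steps      -- per-layer gradient step sizes (hand-set here, learned in practice)
    thresholds -- per-layer soft-thresholding levels (likewise)
    """
    At = transpose(A)
    x = [0.0] * len(A[0])  # start from the all-zero estimate
    for step, theta in zip(steps, thresholds):
        # Gradient of 0.5 * ||A x - y||^2 with respect to x.
        residual = [ri - yi for ri, yi in zip(matvec(A, x), y)]
        grad = matvec(At, residual)
        # Gradient step followed by the sparsity-promoting proximal step.
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, grad)], theta)
    return x


if __name__ == "__main__":
    A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    y = [2.0, 0.05, -1.0]
    print(unfolded_ista(y, A, steps=[1.0, 1.0], thresholds=[0.1, 0.1]))
```

With an identity measurement matrix, the small middle entry falls below the threshold and is set to zero, while the two large entries are shrunk toward zero by the threshold amount.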

Original language: English
Pages (from-to): 1782-1796
Number of pages: 15
Journal: IEEE Transactions on Signal Processing
Volume: 72
State: Published - 25 Mar 2024

Keywords

  • Computer architecture
  • Correlation
  • Deep unfolding
  • Image reconstruction
  • Noise reduction
  • Optimization
  • Signal processing algorithms
  • Transformer networks
  • Transformers
  • sparse recovery
  • video compressed sensing
  • video denoising

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Electrical and Electronic Engineering
