
Fully Cross-Attention Transformer for Guided Depth Super-Resolution

Research output: Contribution to journal › Article › peer-review

Abstract

Modern depth sensors often have low spatial resolution, which limits their use in real-world applications. In many scenarios, however, the depth map is accompanied by a corresponding high-resolution color image. Learning-based methods have therefore been used extensively for guided super-resolution of depth maps, in which the high-resolution color image is used to infer a high-resolution depth map from a low-resolution one. Unfortunately, these methods still suffer from texture copying caused by improper guidance from the color image. Specifically, most existing methods obtain guidance by naively concatenating color and depth features. In this paper, we propose a fully transformer-based network for depth map super-resolution. A cascaded transformer module extracts deep features from the low-resolution depth map. It incorporates a novel cross-attention mechanism that seamlessly and continuously injects guidance from the color image into the depth upsampling process. A window partitioning scheme makes the complexity linear in image resolution, so the method can be applied to high-resolution images. Extensive experiments show that the proposed method outperforms other state-of-the-art guided depth super-resolution methods.
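The window-partitioned cross-attention idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the single-head formulation, the choice of depth features as queries and color features as keys and values, and the random projection matrices `Wq`, `Wk`, `Wv` are all assumptions made for illustration only.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows.

    Returns an array of shape (num_windows, ws * ws, C).
    H and W are assumed to be multiples of ws.
    """
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def window_reverse(windows, ws, H, W):
    """Inverse of window_partition: rebuild the (H, W, C) feature map."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def windowed_cross_attention(depth_feat, color_feat, Wq, Wk, Wv, ws):
    """Single-head cross-attention computed independently inside each window.

    Assumption (hypothetical design): queries come from the depth features,
    keys and values come from the color (guidance) features.
    """
    H, W, _ = depth_feat.shape
    q = window_partition(depth_feat, ws) @ Wq  # (nW, ws*ws, C)
    k = window_partition(color_feat, ws) @ Wk  # (nW, ws*ws, C)
    v = window_partition(color_feat, ws) @ Wv  # (nW, ws*ws, C)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)            # (nW, ws*ws, ws*ws)
    out = attn @ v                             # (nW, ws*ws, C)
    return window_reverse(out, ws, H, W), attn

# Toy usage with random features (illustrative values only).
rng = np.random.default_rng(0)
H, W, C, ws = 8, 8, 4, 4
depth = rng.standard_normal((H, W, C))
color = rng.standard_normal((H, W, C))
Wq, Wk, Wv = (rng.standard_normal((C, C)) for _ in range(3))
out, attn = windowed_cross_attention(depth, color, Wq, Wk, Wv, ws)
print(out.shape, attn.shape)  # (8, 8, 4) (4, 16, 16)
```

In this sketch each window holds `ws * ws` tokens, so one window costs on the order of `(ws * ws)**2 * C` operations and there are `H * W / (ws * ws)` windows. The total is therefore proportional to `H * W * ws * ws * C`, which grows linearly with the number of pixels when `ws` is fixed.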

Original language: English
Article number: 2723
Journal: Sensors
Volume: 23
Issue number: 5
State: Published - Mar 2023

Keywords

  • attention
  • deep learning
  • depth maps
  • multimodal
  • super-resolution
  • transformers

ASJC Scopus subject areas

  • Analytical Chemistry
  • Information Systems
  • Biochemistry
  • Atomic and Molecular Physics, and Optics
  • Instrumentation
  • Electrical and Electronic Engineering
