Localization-Guided Supervision for Robust Medical Image Classification by Vision Transformers

Sagi Ben Itzhak, Nahum Kiryati, Orith Portnoy, Arnaldo Mayer

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

A major challenge in developing data-driven algorithms for medical imaging is the limited size of available datasets. Furthermore, these datasets often suffer from inter-site heterogeneity caused by the use of different scanners and scanning protocols. These factors may contribute to overfitting, which undermines the generalization ability and robustness of deep learning classification models in the medical domain, leading to inadequate performance in real-world applications. To address these challenges and mitigate overfitting, we propose a framework which incorporates explanation supervision during training of Vision Transformer (ViT) models for image classification. Our approach leverages foreground masks of the class object during training to regularize attribution maps extracted from ViT, encouraging the model to focus on relevant image regions and make predictions based on pertinent features. We introduce a new method for generating explanatory attribution maps from ViT-based models and construct a dual-loss function that combines a conventional classification loss with a term that regularizes attribution maps. Our approach demonstrates superior performance over existing methods on two challenging medical imaging datasets, highlighting its effectiveness in the medical domain and its potential for application in other fields. Source code is available at: https://github.com/sagibe/LGMViT.

Original languageEnglish
Title of host publicationComputer Vision – ECCV 2024 Workshops, Proceedings
EditorsAlessio Del Bue, Cristian Canton, Jordi Pont-Tuset, Tatiana Tommasi
PublisherSpringer Science and Business Media Deutschland GmbH
Pages118-133
Number of pages16
ISBN (Print)9783031926471
DOIs
StatePublished - 2025
EventWorkshops that were held in conjunction with the 18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy
Duration: 29 Sep 20244 Oct 2024

Publication series

NameLecture Notes in Computer Science
Volume15643 LNCS

Conference

ConferenceWorkshops that were held in conjunction with the 18th European Conference on Computer Vision, ECCV 2024
Country/TerritoryItaly
CityMilan
Period29/09/244/10/24

Keywords

  • Attention
  • Explainability
  • Explainable AI
  • Explanation supervision
  • Image classification
  • Medical imaging
  • Vision Transformer

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

Fingerprint

Dive into the research topics of 'Localization-Guided Supervision for Robust Medical Image Classification by Vision Transformers'. Together they form a unique fingerprint.

Cite this