Anchored Diffusion for Video Face Reenactment

Idan Kligvasser, Regev Cohen, George Leifman, Ehud Rivlin, Michael Elad

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Video generation has drawn significant interest recently, pushing the development of large-scale models capable of producing realistic videos with coherent motion. Due to memory constraints, these models typically generate short video segments that are then combined into long videos. The merging process poses a significant challenge, as it requires ensuring smooth transitions and overall consistency. In this paper, we introduce Anchored Diffusion, a novel method for synthesizing relatively long and seamless videos. We extend Diffusion Transformers (DiTs) to incorporate temporal information, creating our sequence-DiT (sDiT) model for generating short video segments. Unlike previous works, we train our model on video sequences with random non-uniform temporal spacing and incorporate the temporal information via external guidance, increasing flexibility and allowing the model to capture both short- and long-term relationships. Furthermore, during inference we leverage the transformer architecture to modify the diffusion process, generating a batch of non-uniform sequences anchored to a common frame, which ensures consistency regardless of temporal distance. To demonstrate our method, we focus on face reenactment, the task of transferring the motion of a driving video onto a source face. Through comprehensive experiments, we show that our approach outperforms current techniques in producing longer, consistent, high-quality videos, while also offering editing capabilities.
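
As a rough illustration of the anchoring idea described in the abstract, the sketch below runs a toy DDPM-style reverse process over a batch of frame sequences while re-imposing a shared anchor frame at every step. This replacement-style anchoring (in the spirit of inpainting samplers) is only a hypothetical stand-in for the paper's transformer-based mechanism; the denoiser, noise schedule, tensor shapes, and the choice of frame 0 as the anchor position are assumptions made for the example, not the authors' sDiT model.

```python
import torch

# Toy illustration only: anchored sampling with a shared frame.
# All shapes, schedules and the dummy denoiser below are assumptions.

T = 50                             # number of diffusion steps (assumed)
B, F, C, H, W = 4, 8, 3, 64, 64    # batch of sequences, frames per sequence

betas = torch.linspace(1e-4, 2e-2, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def denoiser(x, t, time_offsets):
    """Placeholder for a sequence-DiT-like noise predictor.

    x            : (B, F, C, H, W) noisy frame sequences
    t            : current diffusion step (int)
    time_offsets : (B, F) non-uniform temporal positions used as guidance
    """
    return torch.zeros_like(x)     # dummy noise prediction

# Shared anchor frame (e.g. the source face) and random non-uniform spacing.
anchor = torch.randn(C, H, W)
time_offsets = torch.sort(torch.rand(B, F), dim=1).values

x = torch.randn(B, F, C, H, W)
for t in reversed(range(T)):
    eps = denoiser(x, t, time_offsets)
    a_t, ab_t = alphas[t], alpha_bars[t]
    # Standard DDPM mean update from step t to t-1.
    x = (x - (1.0 - a_t) / (1.0 - ab_t).sqrt() * eps) / a_t.sqrt()
    if t > 0:
        x = x + betas[t].sqrt() * torch.randn_like(x)
        # Noise the clean anchor to the level of step t-1 so it matches x ...
        ab_prev = alpha_bars[t - 1]
        noisy_anchor = ab_prev.sqrt() * anchor + (1.0 - ab_prev).sqrt() * torch.randn_like(anchor)
    else:
        noisy_anchor = anchor
    # ... and pin the first frame of every sequence to the common anchor.
    x[:, 0] = noisy_anchor

# x now holds B sequences that all share the same (clean) first frame, which is
# what keeps independently generated segments mutually consistent over time.
```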

Original language: English
Title of host publication: Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
Pages: 4087-4097
Number of pages: 11
ISBN (Electronic): 9798331510831
DOIs
State: Published - 2025
Externally published: Yes
Event: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025 - Tucson, United States
Duration: 28 Feb 2025 – 4 Mar 2025

Publication series

Name: Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025

Conference

Conference: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
Country/Territory: United States
City: Tucson
Period: 28/02/25 – 4/03/25

Keywords

  • anchored diffusion
  • diffusion process
  • diffusion transformers
  • face reenactment
  • generative AI
  • high-quality videos
  • sequence-DiT
  • temporal information
  • transformer architecture
  • video consistency
  • video editing
  • video generation
  • video synthesis

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Modelling and Simulation
  • Radiology, Nuclear Medicine and Imaging
