Sketch-Guided Text-to-Image Diffusion Models

Andrey Voynov, Kfir Aberman, Daniel Cohen-Or

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Text-to-image models have introduced a remarkable leap in the evolution of machine learning, demonstrating high-quality synthesis of images from a given text prompt. However, these powerful pretrained models still lack control handles that can guide spatial properties of the synthesized images. In this work, we introduce a universal approach to guide a pretrained text-to-image diffusion model with a spatial map from another domain (e.g., a sketch) at inference time. Unlike previous works, our method does not require training a dedicated model or a specialized encoder for the task. Our key idea is to train a Latent Guidance Predictor (LGP), a small, per-pixel Multi-Layer Perceptron (MLP) that maps latent features of noisy images to spatial maps, where the deep features are extracted from the core Denoising Diffusion Probabilistic Model (DDPM) network. The LGP is trained on only a few thousand images and constitutes a differentiable guiding-map predictor, over which the loss is computed and propagated back to push the intermediate images to agree with the spatial map. The per-pixel training offers flexibility and locality, which allow the technique to perform well on out-of-domain sketches, including free-hand style drawings. We place particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images that follow the guidance of a sketch of arbitrary style or domain.
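The guidance mechanism described above (a per-pixel MLP predicts a spatial map from deep features, and the loss against the target map is propagated back to the intermediate latent) can be illustrated with a minimal NumPy sketch. It rests on heavy simplifying assumptions and is not the authors' implementation: the "deep features" come from a hypothetical fixed linear map `A` rather than a DDPM network, the MLP weights are random rather than trained, and there is no denoising schedule, so only the per-pixel prediction and the gradient-guidance step are shown. All names (`extract_features`, `lgp`, `guidance_loss_and_grad`, `guide`, the sizes and `scale`) are hypothetical.

```python
import numpy as np

# Hypothetical sizes for illustration only (not taken from the paper).
H, W = 8, 8   # spatial resolution of the noisy latent
D = 4         # latent channels
C = 16        # channels of the stand-in "deep features"
K = 32        # hidden width of the per-pixel MLP

rng = np.random.default_rng(0)

# Hypothetical stand-in for the denoiser's feature extractor: one fixed
# linear map applied at every pixel. A real DDPM network is far richer.
A = rng.normal(0.0, 1.0 / np.sqrt(D), size=(D, C))

# Per-pixel MLP playing the LGP role. Weights are random here; in the
# described method they would be trained on a few thousand images.
W1 = rng.normal(0.0, 1.0 / np.sqrt(C), size=(C, K))
b1 = np.zeros(K)
w2 = rng.normal(0.0, 1.0 / np.sqrt(K), size=K)
b2 = 0.0


def extract_features(z):
    """Map an (H, W, D) latent to (H, W, C) features, pixel by pixel."""
    return z @ A


def lgp(feats):
    """Per-pixel MLP: (H, W, C) features -> (H, W) predicted map.

    Also returns the hidden pre-activation, needed for backprop."""
    pre = feats @ W1 + b1              # (H, W, K), same weights at every pixel
    hidden = np.maximum(pre, 0.0)      # ReLU
    return hidden @ w2 + b2, pre       # (H, W)


def guidance_loss_and_grad(z, target):
    """Mean squared error between predicted and target maps, plus its
    gradient with respect to the latent z (manual backprop)."""
    pred, pre = lgp(extract_features(z))
    loss = np.mean((pred - target) ** 2)
    d_pred = 2.0 * (pred - target) / pred.size     # (H, W)
    d_pre = d_pred[..., None] * w2 * (pre > 0)     # (H, W, K), ReLU mask
    d_feats = d_pre @ W1.T                          # (H, W, C)
    d_z = d_feats @ A.T                             # (H, W, D)
    return loss, d_z


def guide(z, target, steps=50, scale=10.0):
    """Repeatedly nudge z so the predicted map agrees with the target."""
    for _ in range(steps):
        _, grad = guidance_loss_and_grad(z, target)
        z = z - scale * grad
    return z


# Usage: a random latent and a toy binary "sketch" as the target map.
z0 = rng.normal(size=(H, W, D))
sketch = (rng.random((H, W)) > 0.7).astype(float)
z1 = guide(z0, sketch)
print(guidance_loss_and_grad(z0, sketch)[0], guidance_loss_and_grad(z1, sketch)[0])
```

Because the same MLP weights are applied independently at each pixel, each output value depends only on that pixel's features; this is the locality the abstract credits for robustness to out-of-domain sketches. In an actual sampler, a guidance step like this would be interleaved with the diffusion model's denoising steps.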

Original language: English
Title of host publication: Proceedings - SIGGRAPH 2023 Conference Papers
Editors: Stephen N. Spencer
ISBN (Electronic): 9798400701597
State: Published - 23 Jul 2023
Event: 2023 Special Interest Group on Computer Graphics and Interactive Techniques Conference, SIGGRAPH 2023 - Los Angeles, United States
Duration: 6 Aug 2023 to 10 Aug 2023

Publication series

Name: Proceedings - SIGGRAPH 2023 Conference Papers

Conference

Conference: 2023 Special Interest Group on Computer Graphics and Interactive Techniques Conference, SIGGRAPH 2023
Country/Territory: United States
City: Los Angeles
Period: 6/08/23 to 10/08/23

Keywords

  • diffusion models
  • image translation

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition
  • Computer Graphics and Computer-Aided Design
