TY - GEN
T1 - Image-Based CLIP-Guided Essence Transfer
AU - Chefer, Hila
AU - Benaim, Sagie
AU - Paiss, Roni
AU - Wolf, Lior
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
AB - We make the distinction between (i) style transfer, in which a source image is manipulated to match the textures and colors of a target image, and (ii) essence transfer, in which one edits the source image to include high-level semantic attributes from the target. Crucially, the semantic attributes that constitute the essence of an image may differ from image to image. Our blending operator combines the powerful StyleGAN generator and the semantic encoder of CLIP in a novel way that is simultaneously additive in both latent spaces, resulting in a mechanism that guarantees both identity preservation and high-level feature transfer without relying on a facial recognition network. We present two variants of our method. The first is based on optimization, while the second fine-tunes an existing inversion encoder to perform essence extraction. Through extensive experiments, we demonstrate the superiority of our methods for essence transfer over existing methods for style transfer, domain adaptation, and text-based semantic editing. Our code is available at: https://github.com/hila-chefer/TargetCLIP.
UR - http://www.scopus.com/inward/record.url?scp=85142727502&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-19778-9_40
DO - 10.1007/978-3-031-19778-9_40
M3 - Conference contribution
SN - 9783031197772
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 695
EP - 711
BT - Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
A2 - Avidan, Shai
A2 - Brostow, Gabriel
A2 - Cissé, Moustapha
A2 - Farinella, Giovanni Maria
A2 - Hassner, Tal
PB - Springer Science and Business Media Deutschland GmbH
T2 - 17th European Conference on Computer Vision, ECCV 2022
Y2 - 23 October 2022 through 27 October 2022
ER -