TY - GEN
T1 - Spatially-adaptive pixelwise networks for fast image translation
AU - Rott Shaham, Tamar
AU - Gharbi, Michaël
AU - Zhang, Richard
AU - Shechtman, Eli
AU - Michaeli, Tomer
N1 - Publisher Copyright: © 2021 IEEE
PY - 2021
Y1 - 2021
N2 - We introduce a new generator architecture, aimed at fast and efficient high-resolution image-to-image translation. We design the generator to be an extremely lightweight function of the full-resolution image. In fact, we use pixel-wise networks; that is, each pixel is processed independently of others, through a composition of simple affine transformations and nonlinearities. We take three important steps to equip such a seemingly simple function with adequate expressivity. First, the parameters of the pixel-wise networks are spatially varying, so they can represent a broader function class than simple 1 × 1 convolutions. Second, these parameters are predicted by a fast convolutional network that processes an aggressively low-resolution representation of the input. Third, we augment the input image by concatenating a sinusoidal encoding of spatial coordinates, which provides an effective inductive bias for generating realistic novel high-frequency image content. As a result, our model is up to 18× faster than state-of-the-art baselines. We achieve this speedup while generating comparable visual quality across different image resolutions and translation domains.
UR - http://www.scopus.com/inward/record.url?scp=85118628605&partnerID=8YFLogxK
U2 - 10.1109/CVPR46437.2021.01464
DO - 10.1109/CVPR46437.2021.01464
M3 - Conference contribution
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 14877
EP - 14886
BT - Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
T2 - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Y2 - 19 June 2021 through 25 June 2021
ER -