TY - GEN
T1 - Differentiable scene graphs
AU - Raboh, Moshiko
AU - Herzig, Roei
AU - Berant, Jonathan
AU - Chechik, Gal
AU - Globerson, Amir
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/3
Y1 - 2020/3
AB - Reasoning about complex visual scenes involves perception of entities and their relations. Scene Graphs (SGs) provide a natural representation for reasoning tasks by assigning labels to both entities (nodes) and relations (edges). Reasoning systems based on SGs are typically trained in a two-step procedure: first, a model is trained to predict SGs from images, and then a separate model is trained to reason based on the predicted SGs. However, it would seem preferable to train such systems in an end-to-end manner. The challenge, which we address here, is that scene-graph representations are non-differentiable, so it is not clear how to use them as intermediate components. Here we propose Differentiable Scene Graphs (DSGs), an image representation that is amenable to differentiable end-to-end optimization and requires supervision only from the downstream tasks. DSGs provide a dense representation for all regions and pairs of regions, and do not spend modelling capacity on regions of the image that do not contain objects or relations of interest. We evaluate our model on the challenging task of identifying referring relationships (RR) in three benchmark datasets: Visual Genome, VRD, and CLEVR. Using DSGs as an intermediate representation leads to new state-of-the-art performance. The full code is available at https://github.com/shikorab/DSG.
UR - http://www.scopus.com/inward/record.url?scp=85085494905&partnerID=8YFLogxK
DO - 10.1109/WACV45572.2020.9093297
M3 - Conference contribution
T3 - Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
SP - 1477
EP - 1486
BT - Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2020
Y2 - 1 March 2020 through 5 March 2020
ER -