TY - GEN
T1 - Breaking Common Sense
T2 - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
AU - Bitton-Guetta, Nitzan
AU - Bitton, Yonatan
AU - Hessel, Jack
AU - Schmidt, Ludwig
AU - Elovici, Yuval
AU - Stanovsky, Gabriel
AU - Schwartz, Roy
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023/1/1
Y1 - 2023/1/1
N2 - Weird, unusual, and uncanny images pique the curiosity of observers because they challenge commonsense. For example, an image released during the 2022 World Cup depicts the famous soccer stars Lionel Messi and Cristiano Ronaldo playing chess, which playfully violates our expectation that their competition should occur on the football field. Humans can easily recognize and interpret these unconventional images, but can AI models do the same? We introduce WHOOPS!, a new dataset and benchmark for visual commonsense. The dataset is composed of purposefully commonsense-defying images created by designers using publicly available image generation tools like Midjourney. We consider several tasks posed over the dataset. In addition to image captioning, cross-modal matching, and visual question answering, we introduce a difficult explanation generation task, where models must identify and explain why a given image is unusual. Our results show that state-of-the-art models such as GPT3 and BLIP2 still lag behind human performance on WHOOPS!. We hope our dataset will inspire the development of AI models with stronger visual commonsense reasoning abilities.
AB - Weird, unusual, and uncanny images pique the curiosity of observers because they challenge commonsense. For example, an image released during the 2022 World Cup depicts the famous soccer stars Lionel Messi and Cristiano Ronaldo playing chess, which playfully violates our expectation that their competition should occur on the football field. Humans can easily recognize and interpret these unconventional images, but can AI models do the same? We introduce WHOOPS!, a new dataset and benchmark for visual commonsense. The dataset is composed of purposefully commonsense-defying images created by designers using publicly available image generation tools like Midjourney. We consider several tasks posed over the dataset. In addition to image captioning, cross-modal matching, and visual question answering, we introduce a difficult explanation generation task, where models must identify and explain why a given image is unusual. Our results show that state-of-the-art models such as GPT3 and BLIP2 still lag behind human performance on WHOOPS!. We hope our dataset will inspire the development of AI models with stronger visual commonsense reasoning abilities.
UR - http://www.scopus.com/inward/record.url?scp=85179078176&partnerID=8YFLogxK
U2 - https://doi.org/10.1109/ICCV51070.2023.00247
DO - https://doi.org/10.1109/ICCV51070.2023.00247
M3 - Conference contribution
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 2616
EP - 2627
BT - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Y2 - 2 October 2023 through 6 October 2023
ER -