Publication
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Antonia Wüst; Tim Woydt; Lukas Helff; Devendra Singh Dhami; Constantin A. Rothkopf; Kristian Kersting
In: Computing Research Repository (CoRR), Vol. abs/2410.19546, Pages 1-25, arXiv, 2024.
Abstract
Recently, newly developed Vision-Language Models (VLMs), such as OpenAI's o1, have emerged, seemingly demonstrating advanced reasoning capabilities across text and image modalities. However, the depth of these advances in language-guided perception and abstract reasoning remains underexplored, and it is unclear whether these models can truly live up to their ambitious promises. To assess the progress and identify shortcomings, we enter the wonderland of Bongard problems, a set of classic visual reasoning puzzles that require human-like abilities of pattern recognition and abstract reasoning. With our extensive evaluation setup, we show that while VLMs occasionally succeed in identifying discriminative concepts and solving some of the problems, they frequently falter. Surprisingly, even elementary concepts that may seem trivial to humans, such as simple spirals, pose significant challenges. Moreover, when explicitly asked to recognize ground truth concepts, they continue to falter, suggesting not only a lack of understanding of these elementary visual concepts but also an inability to generalize to unseen concepts. We compare the results of VLMs to human performance and observe that a significant gap remains between human visual reasoning capabilities and machine cognition.
