Publication

Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?

Antonia Wüst; Tim Woydt; Lukas Helff; Devendra Singh Dhami; Constantin A. Rothkopf; Kristian Kersting
In: Computing Research Repository (CoRR), Vol. abs/2410.19546, Pages 1-25, arXiv, 2024.

Abstract

Recently, newly developed Vision-Language Models (VLMs), such as OpenAI's o1, have emerged, seemingly demonstrating advanced reasoning capabilities across text and image modalities. However, the depth of these advances in language-guided perception and abstract reasoning remains underexplored, and it is unclear whether these models can truly live up to their ambitious promises. To assess the progress and identify shortcomings, we enter the wonderland of Bongard problems, a set of classic visual reasoning puzzles that require human-like abilities of pattern recognition and abstract reasoning. With our extensive evaluation setup, we show that while VLMs occasionally succeed in identifying discriminative concepts and solving some of the problems, they frequently falter. Surprisingly, even elementary concepts that may seem trivial to humans, such as simple spirals, pose significant challenges. Moreover, when explicitly asked to recognize ground truth concepts, they continue to falter, suggesting not only a lack of understanding of these elementary visual concepts but also an inability to generalize to unseen concepts. We compare the results of VLMs to human performance and observe that a significant gap remains between human visual reasoning capabilities and machine cognition.