Publication
Pix2Code: Learning to Compose Neural Visual Concepts as Programs
Antonia Wüst; Wolfgang Stammer; Quentin Delfosse; Devendra Singh Dhami; Kristian Kersting
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2402.08280, Pages 1-24, arXiv, 2024.
Abstract
The challenge in learning abstract concepts from images in an unsupervised fashion lies in the required integration of visual perception and generalizable relational reasoning. Moreover, the unsupervised nature of this task makes it necessary for human users to be able to understand a model’s learned concepts and potentially revise false behaviors. To tackle both the generalizability and interpretability constraints of visual concept learning, we propose Pix2Code, a framework that extends program synthesis to visual relational reasoning by utilizing the abilities of both explicit, compositional symbolic and implicit neural representations. This is achieved by retrieving object representations from images and synthesizing relational concepts as λ-calculus programs. We evaluate the diverse properties of Pix2Code on the challenging reasoning domains, Kandinsky Patterns and CURI, testing its ability to identify compositional visual concepts that generalize to novel data and concept configurations. Particularly, in stark contrast to neural approaches, we show that Pix2Code’s representations remain human interpretable and can easily be revised for improved performance.
