Publication
Sum-Product-Attention Networks: Leveraging Self-Attention in Probabilistic Circuits
Zhongjie Yu; Devendra Singh Dhami; Kristian Kersting
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2109.06587, Pages 0-10, arXiv, 2021.
Abstract
Probabilistic circuits (PCs) have become the de facto standard for learning and inference in probabilistic modeling. We introduce Sum-Product-Attention Networks (SPAN), a new generative model that integrates probabilistic circuits with Transformers. SPAN uses self-attention to select the most relevant parts of a probabilistic circuit, here a sum-product network, to improve the modeling capability of the underlying sum-product network. We show that while modeling, SPAN focuses on a specific set of independence assumptions in every product layer of the sum-product network. Our empirical evaluations show that SPAN outperforms state-of-the-art probabilistic generative models on various benchmark data sets and is also an efficient generative image model.
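
As background for the abstract, the following is a minimal sketch of a plain sum-product network over two binary variables. It is not the SPAN model and does not reproduce its self-attention component. It only illustrates the two node types the abstract refers to: product nodes, which encode an independence assumption over their scope, and sum nodes, which mix their children. All names and parameter values (`WEIGHTS`, `CHILDREN`, `joint`) are hypothetical and chosen for illustration.

```python
from itertools import product


def bernoulli(p, x):
    """Leaf distribution: P(X = x) for a binary variable with P(X = 1) = p."""
    return p if x == 1 else 1.0 - p


def product_node(leaf_params, xs):
    """Product node: treats the variables in its scope as independent."""
    result = 1.0
    for p, x in zip(leaf_params, xs):
        result *= bernoulli(p, x)
    return result


def sum_node(weights, children_params, xs):
    """Sum node: a mixture of its children, with weights that sum to 1."""
    return sum(
        w * product_node(params, xs)
        for w, params in zip(weights, children_params)
    )


# Hypothetical parameters, for illustration only.
WEIGHTS = [0.3, 0.7]                 # mixture weights of the root sum node
CHILDREN = [[0.9, 0.2], [0.1, 0.6]]  # Bernoulli parameters per product node


def joint(x1, x2):
    """Joint probability P(X1 = x1, X2 = x2) under the toy network."""
    return sum_node(WEIGHTS, CHILDREN, (x1, x2))


# A valid network defines a normalized distribution over all assignments.
total = sum(joint(x1, x2) for x1, x2 in product([0, 1], repeat=2))
```

In this toy network each product node commits to one factorization of the variables, which is the kind of independence assumption the abstract says SPAN attends over in every product layer.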