Publication
Learning by Self-Explaining
Wolfgang Stammer; Felix Friedrich; David Steinmann; Manuel Brack; Hikaru Shindo; Kristian Kersting
In: Transactions on Machine Learning Research (TMLR), 2024, pp. 1-35.
Abstract
Much of explainable AI research treats explanations as a means for model inspection. Yet,
this neglects findings from human psychology that describe the benefit of self-explanations
in an agent’s learning process. Motivated by this, we introduce a novel workflow in the
context of image classification, termed Learning by Self-Explaining (LSX). LSX utilizes
aspects of self-refining AI and human-guided explanatory machine learning. The underlying
idea is that a learner model, in addition to optimizing for the original predictive task, is
further optimized based on explanatory feedback from an internal critic model. Intuitively,
a learner’s explanations are considered “useful” if the internal critic can perform the same
task given these explanations. We provide an overview of important components of LSX
and, based on this, perform extensive experimental evaluations via three different example
instantiations. Our results indicate improvements via Learning by Self-Explaining on several
levels: in terms of model generalization, reducing the influence of confounding factors, and
providing more task-relevant and faithful model explanations. Overall, our work provides
evidence for the potential of self-explaining within the learning phase of an AI model.
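
The sketch below illustrates the learner-critic loop described in the abstract; it is not the authors' implementation. It assumes PyTorch, uses input-gradient saliency maps as a stand-in for the learner's explanations, a critic with the same architecture as the learner, and a hypothetical weighting factor lam for the explanatory feedback term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    """Tiny image classifier used for both learner and critic in this sketch."""
    def __init__(self, in_ch=1, n_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 8, 3, padding=1)
        self.fc = nn.Linear(8 * 28 * 28, n_classes)

    def forward(self, x):
        h = F.relu(self.conv(x))
        return self.fc(h.flatten(1))

learner, critic = SmallNet(), SmallNet()
opt_l = torch.optim.Adam(learner.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def explain(model, x, y):
    """Input-gradient saliency map as a simple stand-in for the learner's explanation."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x, create_graph=True)
    return grad.abs()

def lsx_step(x, y, lam=0.5):
    # 1) The learner produces explanations for its predictions.
    expl = explain(learner, x, y)

    # 2) The internal critic tries to solve the same task from the explanations alone.
    critic_loss = F.cross_entropy(critic(expl.detach()), y)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # 3) The learner is optimized on the original predictive task plus the
    #    critic's feedback: explanations should be useful enough for the critic.
    task_loss = F.cross_entropy(learner(x), y)
    feedback_loss = F.cross_entropy(critic(expl), y)
    loss = task_loss + lam * feedback_loss
    opt_l.zero_grad(); loss.backward(); opt_l.step()
    return task_loss.item(), feedback_loss.item()

# Toy usage with random tensors in place of a real image dataset.
x = torch.randn(16, 1, 28, 28)
y = torch.randint(0, 10, (16,))
print(lsx_step(x, y))
```

The key design point, under these assumptions, is that the explanation is kept differentiable (create_graph=True), so the critic's loss on the explanations can be backpropagated into the learner as explanatory feedback.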
