The Focus--Aspect--Value model for predicting subjective visual attributes

Tushar Karayil, Philipp Blandfort, Jörn Hees, Andreas Dengel

In: International Journal of Multimedia Information Retrieval (IJMR) 9 1 Seiten 47-60 Springer 2020.


Predicting subjective visual interpretation is important for several prominent tasks in computer vision, including multimedia retrieval. Many approaches reduce this problem to the prediction of adjective- or attribute-labels from images while neglecting attribute semantics and only processing the image in a holistic manner. Furthermore, there is a lack of relevant datasets with fine-grained subjective labels and sufficient scale for machine learning. In this paper, we explain the Focus-Aspect-Value (FAV) model to break down the process of subjective image interpretation into three steps, and describe a dataset following this way of modeling. We train and evaluate several deep learning methods on this dataset, while we extend the experiments of the paper originally introducing FAV by adding a new evaluation metric, improving the concatenation approach and adding Multiplicative Fusion as another method. In our experiments Tensor Fusion is among the best performing methods across all measures and outperforms the default way of information fusion (concatenation). In addition, we find that the way of combining information in neural networks not only affects prediction performance but can drastically change other properties of the model as well.


Weitere Links

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence