Publication
Detecting when Users Disagree with Generated Captions
Omair Shahzad Bhatti; Harshinee Sriram; Abdulrahman Mohamed Selim; Cristina Conati; Michael Barz; Daniel Sonntag
In: Companion Proceedings of the 26th International Conference on Multimodal Interaction. ACM International Conference on Multimodal Interaction (ICMI-2024), November 4, San José, Costa Rica, Pages 195-203, ICMI Companion '24, ISBN 9798400704635, Association for Computing Machinery, New York, NY, USA, 2024.
Abstract
The pervasive integration of artificial intelligence (AI) into daily life has led to a growing interest in AI agents that can learn continuously. Interactive Machine Learning (IML) has emerged as a promising approach to meet this need by involving human experts in the model training process, often through iterative user feedback. However, repeated feedback requests can lead to frustration and reduced trust in the system. Hence, there is increasing interest in refining how these systems interact with users to ensure efficiency without compromising user experience. Our research investigates the potential of eye tracking data as an implicit feedback mechanism for detecting user disagreement with AI-generated captions in image captioning systems. We conducted a study with 30 participants using a simulated captioning interface and gathered their eye movement data as they assessed caption accuracy. The goal of the study was to determine whether eye tracking data can effectively predict user agreement or disagreement, thereby strengthening IML frameworks. Our findings reveal that, while eye tracking shows promise as a valuable feedback source, ensuring consistent and reliable model performance across diverse users remains a challenge.
Projects
MASTER - Mixed reality ecosystem for teaching robotics in manufacturing
No-IDLE - Interactive Deep Learning Enterprise