

To Clarify or not to Clarify: A Comparative Analysis of Clarification Classification with Fine-Tuning, Prompt Tuning, and Prompt Engineering

Alina Leippert; Tatiana Anikina; Bernd Kiefer; Josef van Genabith
In: Yang Cao; Isabel Papadimitriou; Anaelia Ovalle; Marcos Zampieri; Frank Ferraro; Swabha Swayamdipta (Eds.). Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics (Student Research Workshop). NAACL-HLT Student Research Workshop (NAACL-SRW-2024), located at NAACL, June 18, Mexico City, Mexico, Association for Computational Linguistics, 2024.


Misunderstandings occur all the time in human conversation, but deciding when to ask for clarification is a challenging task for conversational systems: it requires a balance between asking too many unnecessary questions and running the risk of providing incorrect information. This work investigates clarification identification based on the task and data from Xu et al. (2019), reproducing their Transformer baseline and extending it by comparing pre-trained language model fine-tuning, prompt tuning, and manual prompt engineering on the task. Our experiments show strong performance with a joint LM and prompt tuning approach with BERT and RoBERTa, outperforming LM fine-tuning, while manual prompt engineering with GPT-3.5 proved to be less effective, although informative prompt instructions have the potential to steer the model towards generating more accurate explanations for why clarification is needed.