
Publication

Investigating the Configurability of LLMs for the Generation of Knowledge Work Datasets

Desiree Heim; Christian Jilek; Adrian Ulges; Andreas Dengel
In: ICAART 2025. International Conference on Agents and Artificial Intelligence (ICAART-2025), February 23-25, Porto, Portugal, 2025.

Abstract

The evaluation of support tools designed for knowledge workers is challenging due to the lack of publicly available, extensive, and complete data collections. Existing data collections have inherent problems, such as incompleteness caused by privacy-preserving methods and a lack of contextual information. Generating datasets can therefore be a good alternative; in particular, Large Language Models (LLMs) make it simple to generate textual artifacts. A knowledge work dataset generator of this kind, called KnoWoGen, has recently been proposed. However, the adherence of the generated knowledge work documents to parameters such as document type, involved persons, or topics has not yet been examined. This aspect is crucial because generated documents should properly reflect the given parameters, which could serve as important ground truth information for training or evaluation. In this paper, we address this missing evaluation aspect by conducting user studies. These studies assess parameter adherence in general and, as an important representative parameter, adherence to a given domain. We base our experiments on documents generated with a KnoWoGen version provided to us by its authors, using the Mistral-7B-Instruct model as the LLM. We observe that, in the given setting, the generated documents showed high quality regarding adherence to parameters in general and to the parameter specifying the domain. Specifically, 75% of the ratings given in the parameter-related experiments were the highest or second-highest quality score, which is a promising outcome for the feasibility of generating high-quality knowledge work documents based on given configurations.

Projects