Publication

LLMs for Data Engineering on Enterprise Data

Jan-Micha Bodensohn; Ulf Brackmann; Liane Vogel; Matthias Urban; Anupam Sanghi; Carsten Binnig

In: Proceedings of Workshops at the 50th International Conference on Very Large Data Bases, VLDB 2024. International Workshop on Tabular Data Analysis (TaDA), VLDB.org, 2024.

Abstract

A recent line of work applies Large Language Models (LLMs) to data engineering tasks on tabular data, suggesting they can solve a broad spectrum of tasks with high accuracy. However, existing research primarily uses datasets based on tables from web sources such as Wikipedia, which calls into question the applicability of LLMs to real-world enterprise data. In this paper, we perform a first analysis of LLMs for solving data engineering tasks on a real-world enterprise dataset. As an exemplary task, we apply recent LLMs to column type annotation to study how the data characteristics affect the LLMs' accuracy, and we find that LLMs have severe limitations when dealing with enterprise data. Based on these findings, we point towards promising directions for adapting LLMs to the enterprise context.