Publication
Automating Enterprise Data Engineering with LLMs
Jan-Micha Bodensohn; Ulf Brackmann; Liane Vogel; Anupam Sanghi; Carsten Binnig
In: NeurIPS 2024 Third Table Representation Learning Workshop (TRL), 2024.
Abstract
The automation of data engineering tasks is invaluable for enterprises to increase efficiency and reduce the manual effort associated with handling large amounts of data. Large Language Models (LLMs) have recently shown promising results in enabling this automation. However, data engineering tasks in real-world enterprise scenarios are often more complex than their typical formulations in the scientific community. In this paper, we study the challenges that arise when automating real-world enterprise data engineering tasks with LLMs. As part of the paper, we perform a case study on the task of matching incoming payments to open invoices, an instance of the entity matching problem. We also release a hand-crafted dataset based on the actual enterprise scenario to enable the research community to study the complexity of such enterprise tasks.