
VLDB 2025 London: DFKI research improves the efficiency and intelligence of data management

| Knowledge & Business Intelligence | Data Management & Analysis | Systems AI for Decision Support | Darmstadt

Databases form the basis of many IT applications and make it possible to organise large volumes of data. To increase efficiency and generate new, profitable insights from existing datasets, researchers at DFKI in Darmstadt have developed a set of methods that they will present at VLDB 2025 in London. The internationally renowned database conference will take place from 1 to 5 September.

© Pratyush Agnihotri/Matthias Urban

Databases ensure that digital processes are transparent: every item sold in a supermarket is recorded in a database. In online banking, transactions are stored in databases so that they can be traced and verified. It is also only thanks to databases that online retailers can reliably show us our order history and what we have purchased in the past. However, this technology also presents a number of challenges: managing large databases involves high computing costs and enormous storage requirements.

Increased efficiency through artificial intelligence

Databases return a unique result for each query, but that result can be computed in different ways. Various decisions need to be made, such as how the tables should be analysed, in what order, and how they can be joined together efficiently. Special cost models compare execution strategies and help select the fastest one, saving computing power and time. One new approach to selecting the best execution plan is Learned Cost Models (LCMs): here, databases use machine learning based on training data to predict the most efficient execution plan. Researchers at DFKI and TU Darmstadt have now addressed the explainability of LCMs to make their decisions more transparent. At VLDB, Roman Heinrich (DFKI, TU Darmstadt) will present the paper 'Opening the Black Box: Explaining Learned Cost Models for Databases'. Heinrich and his co-authors Oleksandr Havrylov (TU Darmstadt), Manisha Luthra Agnihotri (DFKI, TU Darmstadt), Johannes Wehrstein (TU Darmstadt) and Carsten Binnig (DFKI, TU Darmstadt) have adapted general-purpose techniques for explaining AI models so that they can be applied to LCMs, helping researchers understand and further optimise these models.
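To illustrate the general idea (not the authors' actual models), the following Python sketch shows a toy cost model choosing between two candidate execution plans, followed by a crude perturbation-based attribution of the kind that explainability methods refine. All feature names, weights, and numbers are invented for this example.

```python
# Illustrative sketch only: a stand-in for a learned cost model and a
# simple perturbation-based explanation. The linear "model", the plan
# features and all weights below are fabricated for demonstration.

def predicted_cost(features):
    # Stand-in for a trained regressor: predicted cost rises with
    # estimated cardinality, join count and scanned pages.
    return (0.002 * features["est_rows"]
            + 5.0 * features["num_joins"]
            + 0.01 * features["pages_scanned"])

candidate_plans = {
    "hash_join":   {"est_rows": 10_000, "num_joins": 2, "pages_scanned": 1_200},
    "nested_loop": {"est_rows": 10_000, "num_joins": 2, "pages_scanned": 45_000},
}

# Plan selection: pick the plan with the lowest predicted cost.
best = min(candidate_plans, key=lambda p: predicted_cost(candidate_plans[p]))

# Perturbation-based attribution: zero out one feature at a time and
# measure how much the prediction changes (a crude cousin of methods
# like SHAP that the paper builds on).
feats = candidate_plans[best]
base = predicted_cost(feats)
attribution = {name: base - predicted_cost({**feats, name: 0})
               for name in feats}
```

Here the model prefers the hash join because of its far lower page count, and the attribution reveals which plan features dominate the cost prediction.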

Flexible databases for better data exploration

Relational databases are widely used for analysing and exploring data. These databases organise data in tables and require a previously developed schema as a blueprint. Designing the schema, as well as making subsequent adjustments, is very time-consuming. Schema-less databases offer a more flexible alternative. However, during data analysis, it is necessary to determine the meaning of certain data and whether all the necessary information is available. Researchers at DFKI are now looking to develop a new type of data system that can structure data autonomously without the need for a schema. At VLDB, Benjamin Hättasch, Leon Krüger (TU Darmstadt) and Prof. Carsten Binnig will demonstrate the technology 'JUSTINE (JUST-INsert Engine): Demonstrating Self-organising Data Schemas'. When new data is entered into a relational database, JUSTINE attempts to assign it to an existing table. If no suitable table exists, it creates a new one and adds the data. Even if data entries are incomplete or table and column names are missing, the technology attempts to automatically find the appropriate storage location based on the data. Additionally, queries may contain columns that do not yet exist in the schema. If necessary, JUSTINE automatically adds such columns. This way, the technology combines the advantages of relational and schema-less databases for more flexible data analysis.
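As an illustration of the general idea (not JUSTINE's actual implementation), the following Python sketch routes incoming records to the most similar existing table, creates a new table when none fits, and grows a table's schema when a record brings new fields. The overlap measure, threshold, and naming scheme are invented for this example.

```python
# Minimal "just insert" sketch: records are routed to the table whose
# columns overlap most with the record's fields; unmatched records get
# a new table, and unknown fields become new columns. All names and
# the 0.5 threshold are assumptions for illustration.

tables = {}  # table name -> {"columns": set of column names, "rows": list}

def just_insert(record, min_overlap=0.5):
    fields = set(record)
    # Find the existing table with the highest Jaccard overlap.
    best_name, best_score = None, 0.0
    for name, tab in tables.items():
        score = len(fields & tab["columns"]) / len(fields | tab["columns"])
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < min_overlap:
        # No suitable table: create one derived from the record's fields.
        best_name = "table_" + "_".join(sorted(fields))
        tables[best_name] = {"columns": set(fields), "rows": []}
    # Schema evolution: add any columns the record introduces.
    tables[best_name]["columns"] |= fields
    tables[best_name]["rows"].append(record)
    return best_name

t1 = just_insert({"title": "Dune", "author": "Herbert"})
t2 = just_insert({"title": "Solaris", "author": "Lem", "year": 1961})
# The second record shares enough fields to land in the same table,
# and "year" is added to that table's schema on the fly.
```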

Innovations for databases: the 'Applied AI for Database Systems and Applications' workshop

At the sixth annual 'Applied AI for Database Systems and Applications' workshop, held as part of VLDB, the paper 'JOB-Complex: A Challenging Benchmark for Traditional and Learned Query Optimization' received the Best Paper Award. The authors are Johannes Wehrstein (TU Darmstadt), Timo Eckmann (TU Darmstadt), Roman Heinrich (DFKI, TU Darmstadt) and Carsten Binnig (DFKI, TU Darmstadt). They presented JOB-Complex, a new benchmark that challenges traditional and learned query optimisers by reflecting real-world complexity. In the same workshop, researchers from the two institutions presented the paper 'Learning What Matters: Automated Feature Selection for Learned Performance Models in Parallel Stream Processing'. In it, Pratyush Agnihotri, Carsten Binnig and Manisha Luthra Agnihotri present a new automated pipeline for selecting the features used in performance models of data stream processing (DSP) systems in order to optimise parallelisation.
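The core idea behind such a feature-selection step can be sketched as follows. The toy profiling data, the feature names, and the simple correlation-based ranking are assumptions for illustration only; the pipeline described in the paper is considerably more sophisticated.

```python
# Hedged sketch of automated feature selection for a performance model:
# rank candidate features of a stream-processing operator by absolute
# Pearson correlation with the measured target (here: throughput) and
# keep the strongest ones. All data below is fabricated.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy profiling runs: candidate features vs. observed throughput.
features = {
    "parallelism": [1, 2, 4, 8],
    "tuple_width": [64, 64, 64, 64],   # constant -> uninformative
    "window_size": [10, 50, 20, 40],
}
throughput = [100, 190, 380, 760]

def select_features(features, target, k=2):
    scores = {}
    for name, values in features.items():
        if len(set(values)) == 1:      # drop constant features outright
            continue
        scores[name] = abs(pearson(values, target))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

On this toy data, operator parallelism correlates almost perfectly with throughput and is ranked first, while the constant tuple width is discarded as uninformative.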

VLDB: A diverse range of database research in one place

At VLDB 2025, international researchers will discuss a wide range of topics related to all aspects of data management, with a particular focus on systems issues. In addition to presenting their latest research findings, Manisha Luthra Agnihotri (DFKI & TU Darmstadt) and Roman Heinrich will be hosting a tutorial entitled 'Learned Cost Models for Query Optimisation: From Batch to Streaming Systems', alongside Xiao Li and Zoi Kaoudi, both from the IT University of Copenhagen. Carsten Binnig will also participate in a panel discussion on the topic of 'Neural Relational Data: Tabular Foundation Models, LLMs...or Both?' alongside Paolo Papotti (Eurecom), Floris Geerts (University of Antwerp), Johannes Hoffart (SAP), Madelon Hulsebos (CWI), Fatma Özcan (Google), and Gael Varoquaux (INRIA). Through their contributions, the DFKI Darmstadt researchers are helping to shape the future of data management.

More Information:
Opening The Black-Box: Explaining Learned Cost Models For Databases
Learning What Matters: Automated Feature Selection for Learned Performance Models in Parallel Stream Processing
JOB-Complex: A Challenging Benchmark for Traditional & Learned Query Optimization
JUSTINE (JUST-INsert Engine): Demonstrating Self-organizing Data Schemas

© Benjamin Hättasch