DFKI presents 14 papers at CVPR 2026

With 14 papers at CVPR 2026, DFKI demonstrated the breadth of its research in visual AI. The topics ranged from 3D scene understanding and relational reasoning, through multimodal perception, to simulation and generation, and spanned the main conference, Findings posters and workshops.

DFKI was represented at CVPR 2026 not only by numerous papers but also by the researchers behind them. For just under a week, they discussed their topics in workshops and presented them on the main stage.

The IEEE/CVF Conference on Computer Vision and Pattern Recognition, or CVPR for short, is one of the most important conferences in computer vision research and took place this year from 3 to 7 June in Denver. DFKI was represented there with several accepted papers from various areas of research. The spotlight was on a paper from the Augmented Vision research group that addresses a key weakness in current 3D scene analysis: systems recognise objects but often fail to understand how they relate to one another.

Main conference papers from DFKI

DFKI's contributions to the main conference cover a broad spectrum of visual AI. The Augmented Vision research area contributed “ReLaGS: Relational Language Gaussian Splatting”, “DriverGaze360: OmniDirectional Driver Attention with Object-Level Guidance”, “LiREC-Net: A Target-Free and Learning-Based Network for LiDAR, RGB, and Event Calibration”, and “SIMSPINE: A Biomechanics-Aware Simulation Framework for 3D Spine Motion Annotation and Benchmarking”.

These are joined by “OpenMarcie: Dataset for Multimodal Action Recognition in Industrial Environments” and “When Pretty Isn’t Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators”, as well as “YieldSAT: A Multimodal Benchmark Dataset for High-Resolution Crop Yield Prediction” from Kaiserslautern, “Synthesizing Visual Concepts as Vision-Language Programs” from Darmstadt, and “SceMoS: Scene-Aware 3D Human Motion Synthesis by Planning with Geometry-Grounded Tokens” from Saarbrücken. Collectively, the topics range from open 3D scene understanding and multimodal perception to sensor calibration, medical simulation, synthetic training data and generative motion modelling.

ReLaGS

Within this spectrum, ReLaGS stands out. The paper by Yaxu Xie, Abdalla Arafa, Alireza Javanmardi, Christen Millerdurai, Jia Cheng Hu, Shaoxiang Wang, Alain Pagani and Didier Stricker combines a hierarchical 3D scene representation with an explicit scene graph that models relationships between objects. This makes it possible not only to identify objects within a scene, but also to process relational queries such as ‘the cup next to the laptop’ or more nuanced part-whole relationships within complex 3D environments.

The method is based on Gaussian splatting, a state-of-the-art technique for high-resolution 3D reconstruction. ReLaGS supplements this with linguistic semantics and relational reasoning, organises scenes hierarchically – from parts through objects to the entire space – and does not require scene-specific training.
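To make the idea of a hierarchical scene graph with relational queries more concrete, here is a minimal Python sketch. It is a toy illustration only and not the ReLaGS implementation: the class `SceneGraph`, its methods and the example scene are hypothetical, and queries are matched by exact string rather than by the language features the paper builds on.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical, not the ReLaGS data structure)."""
    # node name -> hierarchy level ("part", "object" or "scene")
    levels: dict = field(default_factory=dict)
    # child node -> parent node (part-whole hierarchy)
    parent: dict = field(default_factory=dict)
    # (subject, relation, object) triples between nodes
    relations: set = field(default_factory=set)

    def add_node(self, name, level, parent=None):
        self.levels[name] = level
        if parent is not None:
            self.parent[name] = parent

    def relate(self, subject, relation, obj):
        self.relations.add((subject, relation, obj))

    def query(self, relation, obj):
        """Return all nodes that stand in `relation` to `obj`, sorted by name."""
        return sorted(s for s, r, o in self.relations
                      if r == relation and o == obj)

    def parts_of(self, whole):
        """Return the direct children of `whole`, sorted by name."""
        return sorted(c for c, p in self.parent.items() if p == whole)


# Assumed example scene: parts -> objects -> entire space.
graph = SceneGraph()
graph.add_node("office", "scene")
graph.add_node("desk", "object", parent="office")
graph.add_node("laptop", "object", parent="office")
graph.add_node("cup", "object", parent="office")
graph.add_node("cup handle", "part", parent="cup")
graph.relate("cup", "next to", "laptop")
graph.relate("laptop", "on", "desk")

print(graph.query("next to", "laptop"))  # ['cup']
print(graph.parts_of("cup"))             # ['cup handle']
```

The first query corresponds to ‘the cup next to the laptop’, the second to a part-whole lookup. A real system would resolve free-form language against the reconstructed 3D scene rather than compare fixed labels.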

Alain Pagani, Deputy Head of the Augmented Vision Research Division at DFKI

“With ReLaGS, we have shown that 3D scene understanding need not stop at the recognition of individual objects. The key lies in modelling relationships, hierarchies and semantic contexts together – only then does reconstruction truly become machine understanding.”

Results and relevance

In the paper, the researchers report that ReLaGS generates a complete scene graph in under 15 minutes and renders at over 200 frames per second. Compared to RelationField, the approach is 4.7 times faster and 7.6 times more memory-efficient. ReLaGS also achieves state-of-the-art results on benchmarks for open 3D segmentation, scene graph prediction and relation-guided instance segmentation.

This is relevant to research because 3D scene understanding is increasingly needed in areas where machines are required to operate safely and contextually in complex environments: in robotics, XR, industrial digital twins, or semantically rich human-machine interfaces. ReLaGS demonstrates how geometric reconstruction, linguistic semantics and relational structure can be integrated into a single framework.

Further conference papers

In addition to the main conference, DFKI was also represented in other formats at CVPR 2026. From the Augmented Vision research group, “GHOST: Fast Category-Agnostic Hand-Object Interaction Reconstruction from RGB Videos Using Gaussian Splatting” and “ReConText3D: Replay-based Continual Text-to-3D Generation” were accepted as Findings posters. “TAUE: Training-free Noise Transplant and Cultivation Diffusion Model” was also featured among the Findings posters.

In addition, there were two workshop presentations: “Probing the Reliability of Driving VLMs: From Inconsistent Responses to Grounded Temporal Reasoning” at the AUTOPILOT workshop and “Inpaint360GS: Efficient Object-Aware 3D Inpainting via Gaussian Splatting for 360° Scenes” at the SPAR-3D workshop. DFKI was thus present at CVPR 2026 not only at the main conference but also in formats where current methodological issues and new fields of application are discussed.

An overview of all papers

ReLaGS: Relational Language Gaussian Splatting - Yaxu Xie, Abdalla Arafa, Alireza Javanmardi, Christen Millerdurai, Jia Cheng Hu, Shaoxiang Wang, Alain Pagani, Didier Stricker
DriverGaze360: OmniDirectional Driver Attention with Object-Level Guidance - Shreedhar Govil, Didier Stricker, Jason Rambach
LiREC-Net: A Target-Free and Learning-Based Network for LiDAR, RGB, and Event Calibration - Aditya Ranjan Dash, Ramy Battrawy, René Schuster, Didier Stricker
SIMSPINE: A Biomechanics-Aware Simulation Framework for 3D Spine Motion Annotation and Benchmarking - Muhammad Saif Ullah Khan, Didier Stricker
OpenMarcie: Dataset for Multimodal Action Recognition in Industrial Environments - Hymalai Bello, Lala Ray, Joanna Sorysz, Sungho Suh, Paul Lukowicz
When Pretty Isn’t Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators - Krzysztof Adamkiewicz, Brian Moser, Stanislav Frolov, Tobias Christian Nauen, Federico Raue, Andreas Dengel
SceMoS: Scene-Aware 3D Human Motion Synthesis by Planning with Geometry-Grounded Tokens - Anindita Ghosh, Vladislav Golyanik, Taku Komura, Philipp Slusallek, Christian Theobalt, Rishabh Dabral
GHOST: Fast Category-Agnostic Hand-Object Interaction Reconstruction from RGB Videos Using Gaussian Splatting - Ahmed Tawfik Aboukhadra, Marcel Rogge, Nadia Robertini, Abdalla Arafa, Jameel Malik, Ahmed Elhayek, Didier Stricker
Probing the Reliability of Driving VLMs: From Inconsistent Responses to Grounded Temporal Reasoning - Chun-Peng Chang, Chen-Yu Wang, Holger Caesar, Alain Pagani
Inpaint360GS: Efficient Object-Aware 3D Inpainting via Gaussian Splatting for 360° Scenes - Shaoxiang Wang, Shihong Zhang, Christen Millerdurai, Rüdiger Westermann, Didier Stricker, Alain Pagani
ReConText3D: Replay-based Continual Text-to-3D Generation - Muhammad Ahmed Ullah Khan, Muhammad Haris Bin Amir, Didier Stricker, Muhammad Zeshan Afzal
TAUE: Training-free Noise Transplant and Cultivation Diffusion Model - Daichi Nagai, Ryugo Morita, Shunsuke Kitada, Hitoshi Iyatomi
YieldSAT: A Multimodal Benchmark Dataset for High-Resolution Crop Yield Prediction - Miro Miranda, Deepak Pathak, Patrick Helber, Benjamin Bischke, Hiba Najjar, Francisco Mena, Cristhian Sanchez, Akshay Pai, Diego Arenas, Matias Valdenegro-Toro, Marcela Charfuelan, Marlon Nuske, Andreas Dengel
Synthesizing Visual Concepts as Vision-Language Programs - Antonia Wüst, Wolfgang Stammer, Hikaru Shindo, Lukas Helff, Devendra Singh Dhami, Kristian Kersting

Press contact:

Jeremy Gob

Editor & PR-Officer, DFKI

Jeremy.Gob@dfki.de
Phone: +49 631 20575 1730
