

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) is one of the most important conferences in computer vision research. This year it took place from 3 to 7 June in Denver. DFKI was represented there with several accepted papers from various areas of research. The focus was on a paper from the Augmented Vision research group that addresses a key weakness in current 3D scene analysis: systems recognise objects but often fail to understand how they relate to one another.
The DFKI contributions to the main conference cover a broad spectrum of visual AI. From the Augmented Vision research area come “ReLaGS: Relational Language Gaussian Splatting”, “DriverGaze360: OmniDirectional Driver Attention with Object-Level Guidance”, “LiREC-Net: A Target-Free and Learning-Based Network for LiDAR, RGB, and Event Calibration”, and “SIMSPINE: A Biomechanics-Aware Simulation Framework for 3D Spine Motion Annotation and Benchmarking”.
These are joined by “OpenMarcie: A dataset for multimodal action recognition in industrial environments” and “When Pretty Isn’t Useful: An investigation into why modern text-to-image models fail as reliable generators of training data”, as well as “YieldSAT: A multimodal benchmark dataset for high-resolution yield prediction” from Kaiserslautern, “Synthesising Visual Concepts as Vision-Language Programs” from Darmstadt, and “SceMoS: Scene-Aware 3D Human Motion Synthesis by Planning with Geometry-Grounded Tokens” from Saarbrücken. Collectively, the topics range from open 3D scene understanding and multimodal perception to sensor calibration, medical simulation, synthetic training data and generative motion modelling.
Within this spectrum, ReLaGS stands out. The paper by Yaxu Xie, Abdalla Arafa, Alireza Javanmardi, Christen Millerdurai, Jia Cheng Hu, Shaoxiang Wang, Alain Pagani and Didier Stricker combines a hierarchical 3D scene representation with an explicit scene graph that models relationships between objects. This makes it possible not only to identify objects within a scene, but also to process relational queries such as ‘the cup next to the laptop’ or more nuanced part-whole relationships within complex 3D environments.
The method is based on Gaussian splatting, a state-of-the-art technique for high-resolution 3D reconstruction. ReLaGS supplements this with linguistic semantics and relational reasoning, organises scenes hierarchically – from parts through objects to the entire space – and does not require scene-specific training.
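To make the idea of a hierarchical scene graph with relational queries more concrete, the following is a minimal toy sketch in Python. It is not the ReLaGS implementation: the class `SceneGraph`, its methods and the demo scene are hypothetical names chosen for illustration, labels are matched by exact string rather than by language embeddings, and no Gaussian splatting is involved. It only shows how a part-object-room hierarchy and explicit relation triples can answer a query such as ‘the cup next to the laptop’.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical, for illustration only)."""
    labels: dict = field(default_factory=dict)    # node id -> label, e.g. "cup"
    parent: dict = field(default_factory=dict)    # child id -> parent id (hierarchy)
    relations: list = field(default_factory=list)  # (subject id, relation, object id)

    def add_node(self, node_id, label, parent=None):
        """Register a node and, optionally, its parent in the hierarchy."""
        self.labels[node_id] = label
        if parent is not None:
            self.parent[node_id] = parent

    def relate(self, subj, relation, obj):
        """Store an explicit relation edge between two nodes."""
        self.relations.append((subj, relation, obj))

    def query(self, subject_label, relation, object_label):
        """Return ids of nodes with subject_label that stand in
        `relation` to some node with object_label."""
        return [
            s for s, r, o in self.relations
            if r == relation
            and self.labels[s] == subject_label
            and self.labels[o] == object_label
        ]

    def parts_of(self, node_id):
        """Return the direct children of a node (its parts or contents)."""
        return [c for c, p in self.parent.items() if p == node_id]


def build_demo_scene():
    """Build a small assumed scene: an office with a laptop and two cups."""
    g = SceneGraph()
    g.add_node(0, "office")
    g.add_node(1, "laptop", parent=0)
    g.add_node(2, "cup", parent=0)
    g.add_node(3, "cup", parent=0)
    g.add_node(4, "handle", parent=2)  # part-whole: the handle belongs to cup 2
    g.relate(2, "next to", 1)          # only cup 2 is next to the laptop
    return g


scene = build_demo_scene()
print(scene.query("cup", "next to", "laptop"))  # ids of matching cups
print(scene.parts_of(2))                        # parts of cup 2
```

In this sketch, the relational query distinguishes between the two cups even though both carry the same label, which is the kind of disambiguation that recognising objects alone cannot provide.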

“With ReLaGS, we have shown that 3D scene understanding need not stop at the recognition of individual objects. The key lies in modelling relationships, hierarchies and semantic contexts together – only then does reconstruction truly become machine understanding.”
In the paper, the researchers report that ReLaGS generates a complete scene graph in under 15 minutes and renders at over 200 frames per second. Compared to RelationField, they report that the approach is 4.7 times faster and 7.6 times more memory-efficient. ReLaGS also achieves state-of-the-art results on benchmarks for open 3D segmentation, scene graph prediction and relation-guided instance segmentation.
This is relevant to research because 3D scene understanding is increasingly needed in areas where machines are required to operate safely and contextually in complex environments: in robotics, XR, industrial digital twins, or semantically rich human-machine interfaces. ReLaGS demonstrates how geometric reconstruction, linguistic semantics and relational structure can be integrated into a single framework.
In addition to the main conference, DFKI was also represented in other formats at CVPR 2026. From the Augmented Vision research group, “GHOST: Fast Category-Agnostic Hand-Object Interaction Reconstruction from RGB Videos Using Gaussian Splatting” and “ReConText3D: Replay-based Continual Text-to-3D Generation” were accepted as Findings posters. “TAUE: Training-free Noise Transplant and Cultivation Diffusion Model” was also featured among the Findings posters.
In addition, there were workshop presentations entitled “Probing the Reliability of Driving VLMs: From Inconsistent Responses to Grounded Temporal Reasoning” at the AUTOPILOT workshop, and “Inpaint360GS: Efficient Object-Aware 3D Inpainting via Gaussian Splatting for 360° Scenes” at the SPAR-3D workshop. This demonstrated DFKI’s presence at CVPR 2026 not only at the main conference but also in formats where current methodological issues and new fields of application are discussed.