Publication
SF3D-RGB: Scene Flow Estimation from Monocular Camera and Sparse LiDAR
Rajai Alhimdiat; Ramy Battrawy; René Schuster; Didier Stricker; Wesam Ashour
In: Proceedings of the 2026 Computer Vision Conference (CVC). Computer Vision Conference (CVC-2026), May 21-22, Amsterdam, Netherlands, Springer, 5/2026.
Abstract
Scene flow estimation is a key task in computer vision for perceiving dynamic changes in a scene. For robust scene flow, learning-based approaches have recently achieved impressive results using either image-based or LiDAR-based modalities, but these methods typically rely on a single modality. To overcome this limitation, we present SF3D-RGB, a deep learning architecture that estimates sparse scene flow from 2D monocular images and 3D point clouds (e.g., acquired by LiDAR). Our end-to-end model first encodes each modality into features and fuses them. The fused features then enhance a graph matching module, which computes a more robust mapping matrix to generate an initial scene flow. Finally, a residual scene flow module refines this initial estimate. Our model is designed to balance accuracy and efficiency. Experiments show that it outperforms single-modality methods and achieves higher scene flow accuracy on real-world datasets while using fewer parameters than other state-of-the-art fusion methods.
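The abstract does not give implementation details, but the matching step it describes — a mapping matrix between two point sets that yields an initial scene flow — can be illustrated with a minimal sketch. All names here are hypothetical, and the cosine-similarity/softmax formulation is an assumption for illustration, not the paper's actual graph matching module:

```python
import numpy as np

def soft_matching_flow(src_pts, tgt_pts, src_feat, tgt_feat, temperature=0.1):
    """Hypothetical sketch: initial scene flow from a soft mapping matrix.

    src_pts, tgt_pts : (N, 3) / (M, 3) point coordinates at times t and t+1
    src_feat, tgt_feat : (N, D) / (M, D) per-point (fused) features
    """
    # Cosine-similarity affinity between source and target features
    a = src_feat / np.linalg.norm(src_feat, axis=1, keepdims=True)
    b = tgt_feat / np.linalg.norm(tgt_feat, axis=1, keepdims=True)
    sim = a @ b.T                                # (N, M) affinity matrix

    # Row-wise softmax -> soft mapping matrix: each source point
    # distributes correspondence weight over candidate target points
    w = np.exp(sim / temperature)
    w /= w.sum(axis=1, keepdims=True)

    # Initial scene flow: soft-matched target position minus source position
    return w @ tgt_pts - src_pts
```

In the described architecture this initial flow would then be passed to a residual module for refinement; with distinctive fused features, the soft mapping matrix concentrates on the correct correspondences and the flow approaches the true per-point motion.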
