Publication
HandMvNet: Real-Time 3D Hand Pose Estimation Using Multi-View Cross-Attention Fusion
Muhammad Asad Ali; Nadia Robertini; Didier Stricker
In: Thomas Bashford-Rogers; Daniel Meneveaux; Mehdi Ammi; Mounia Ziat; Stefan Jänicke; Helen C. Purchase; Petia Radeva; Antonino Furnari; Kadi Bouatouch; A. Augusto Sousa (Eds.). Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. International Conference on Computer Vision Theory and Applications (VISAPP-2025), located at VISIGRAPP-2025, February 26-28, Porto, Portugal, Vol. 2, ISBN 978-989-758-728-3, SCITEPRESS, Portugal, 2/2025.
Abstract
In this work, we present HandMvNet, one of the first real-time methods designed to estimate 3D hand motion and shape from multi-view camera images. Unlike previous monocular approaches, which suffer from scale-depth ambiguities, our method ensures consistent and accurate absolute hand poses and shapes. This is achieved through a multi-view attention-fusion mechanism that effectively integrates features from multiple viewpoints. In contrast to previous multi-view methods, our approach eliminates the need for camera parameters as input to learn 3D geometry. HandMvNet also achieves a substantial reduction in inference time while delivering competitive results compared to state-of-the-art methods, making it suitable for real-time applications. Evaluated on publicly available datasets, HandMvNet qualitatively and quantitatively outperforms previous methods under identical settings. Code is available at github.com/pyxploiter/handmvnet.
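The abstract describes fusing per-view features with cross-attention rather than explicit camera geometry. The following is a minimal, generic sketch of that idea in NumPy, not the actual HandMvNet architecture: tokens from each view attend to the tokens of all views, so the fused features aggregate multi-view evidence without camera parameters. All names (`cross_view_fusion`, the projection matrices `Wq`, `Wk`, `Wv`) and the shapes chosen are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_fusion(feats, Wq, Wk, Wv):
    """Illustrative cross-attention fusion over camera views (not HandMvNet itself).

    feats: (V, N, D) array -- V views, N tokens per view (e.g. 21 hand joints),
    D feature channels. Each view's tokens query the key/value tokens of ALL
    views, so no camera intrinsics/extrinsics are needed to mix viewpoints.
    """
    V, N, D = feats.shape
    q = feats @ Wq                      # (V, N, D) per-view queries
    kv = feats.reshape(V * N, D)        # pool tokens from every view
    k = kv @ Wk                         # (V*N, D) keys
    v = kv @ Wv                         # (V*N, D) values
    attn = softmax(q @ k.T / np.sqrt(D), axis=-1)  # (V, N, V*N) attention weights
    return attn @ v                     # (V, N, D) fused multi-view features

# Toy usage: 4 camera views, 21 joint tokens, 32-dim features.
rng = np.random.default_rng(0)
D = 32
feats = rng.standard_normal((4, 21, D))
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
fused = cross_view_fusion(feats, Wq, Wk, Wv)
print(fused.shape)  # (4, 21, 32)
```

Because every view attends to the same pooled token set, the fused output stays consistent across viewpoints, which is the property the abstract credits for resolving monocular scale-depth ambiguity.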
Projects
- FLUENTLY - Fluently - the essence of human-robot interaction
- SHARESPACE - Embodied Social Experiences in Hybrid Shared Spaces