Publication
WHSP-Net: A Weakly-Supervised Approach for 3D Hand Shape and Pose Recovery from a Single Depth Image
Jameel Malik; Ahmed Elhayek; Didier Stricker
In: Sensors - Open Access Journal (Sensors), Vol. 19, Pages 2-15, MDPI, 8/2019.
Abstract
Hand shape and pose recovery is essential for many computer vision applications such as
animation of a personalized hand mesh in a virtual environment. Although there are many hand pose
estimation methods, only a few deep-learning-based algorithms target 3D hand shape and pose from
a single RGB or depth image. Jointly estimating hand shape and pose is very challenging because
none of the existing real benchmarks provides ground truth hand shape. For this reason, we propose
a novel weakly-supervised approach for 3D hand shape and pose recovery (named WHSP-Net)
from a single depth image by learning shapes from unlabeled real data and labeled synthetic data.
To this end, we propose a framework that consists of three novel components. The first is
a Convolutional Neural Network (CNN) based deep network that produces 3D joint positions
from learned 3D bone vectors using a new layer. The second is a novel shape decoder that recovers
a dense 3D hand mesh from sparse joints. The third is a novel depth synthesizer that reconstructs
a 2D depth image from the 3D hand mesh. The whole pipeline is fine-tuned in an end-to-end manner.
We demonstrate that our approach recovers reasonable hand shapes from real-world datasets as
well as from a live depth-camera stream in real time. Our algorithm outperforms state-of-the-art
methods that output more than the joint positions and shows competitive performance on the 3D pose
estimation task.
The results of NYU hand shape reconstruction and pose estimation can be downloaded from https://cloud.dfki.de/owncloud/index.php/s/5QeDcAECCMGj3gB
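
The abstract states that the first component derives 3D joint positions from learned 3D bone vectors, but it does not describe the layer itself. The following is only a minimal sketch of one plausible reading, in which each joint is its parent's position plus the bone vector that connects them. The six-joint `PARENTS` table, the function name `bones_to_joints`, and the convention that every parent is listed before its children are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical toy skeleton (not the paper's hand model):
# PARENTS[j] is the index of joint j's parent; -1 marks the root (wrist).
# Parents must appear before their children.
PARENTS = [-1, 0, 1, 2, 0, 4]


def bones_to_joints(
    root: Vec3,
    bones: Sequence[Vec3],
    parents: Sequence[int] = PARENTS,
) -> List[Vec3]:
    """Accumulate bone vectors along the kinematic chain.

    bones[j] is the vector from joint parents[j] to joint j.
    The entry for the root joint is ignored.
    """
    joints: List[Vec3] = []
    for j, p in enumerate(parents):
        if p < 0:
            # The root joint is placed directly at the given position.
            joints.append((root[0], root[1], root[2]))
        else:
            px, py, pz = joints[p]
            bx, by, bz = bones[j]
            joints.append((px + bx, py + by, pz + bz))
    return joints


# Usage: two short chains attached to the wrist.
example_root = (1.0, 2.0, 3.0)
example_bones = [
    (0.0, 0.0, 0.0),  # root entry, ignored
    (1.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 1.0, 0.0),
]
example_joints = bones_to_joints(example_root, example_bones)
```

Because the mapping uses only additions, it is differentiable with respect to the bone vectors, which is the property a network layer of this kind would need for end-to-end training.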