Publication
Towards Visual-Inertial SLAM for Mobile Augmented Reality
Gabriele Bleser
ISBN 978-3-86853-048-3, Dr. Hut, München, 3/2009.
Abstract
The basic idea of augmented reality is to augment the view of a user or camera with virtual
objects. Real-time camera tracking is an enabling technology for augmented reality. Besides
the high estimation precision that is needed to allow for pixel-accurate augmentations, mobile
augmented reality applications impose further requirements on the tracking method. One important
aspect is robustness in the presence of quick and erratic camera motions, which are
typical for a handheld or head-mounted camera.
This thesis investigates robustness and accuracy of real-time markerless camera tracking
for mobile augmented reality applications considering both known and unknown environments
of different complexity. The major solution strategies are visual-inertial sensor fusion, decoupling
of pose and structure estimation, and unification of computer vision and recursive filtering
techniques.
First, a model-based camera tracking system that fuses visual and inertial measurements
in an extended Kalman filter is developed. It uses an image processing method, invariant to
affine illumination changes, that exploits the pose prediction obtained from the sensor fusion algorithm
and a textured CAD model of the environment to predict the appearances of corner features
in the camera images. In several experiments, the system is demonstrated to work robustly in
realistic environments of different complexity, under varying lighting conditions, during fast and
erratic camera motions, and even over short periods without visible features. Compared to vision-only
tracking, the system shows less jitter, and its computational cost is lower, owing both to
an accurate prediction of the feature appearances and locations in the images and to a reduced
need for features overall.
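
As a rough illustration of the predict/correct pattern behind visual-inertial fusion, the following minimal sketch runs a one-dimensional filter in which an accelerometer reading drives the prediction and a camera-derived position corrects it. It is not the system described in the thesis: the state (position and velocity only), the noise values, and the function names `predict`, `update` and `run_demo` are all assumptions made for this example, and because the toy model is linear the extended Kalman filter reduces to an ordinary Kalman filter here.

```python
def predict(p, v, P, accel, dt, accel_var):
    """Inertial step: integrate a measured acceleration over dt."""
    p_new = p + v * dt + 0.5 * accel * dt * dt
    v_new = v + accel * dt
    (a, b), (c, d) = P
    g0, g1 = 0.5 * dt * dt, dt  # how acceleration noise enters p and v
    # P_new = F P F^T + accel_var * g g^T with F = [[1, dt], [0, 1]]
    P_new = [
        [a + dt * (b + c) + dt * dt * d + accel_var * g0 * g0,
         b + dt * d + accel_var * g0 * g1],
        [c + dt * d + accel_var * g1 * g0,
         d + accel_var * g1 * g1],
    ]
    return p_new, v_new, P_new


def update(p, v, P, z, meas_var):
    """Visual step: correct the state with a position measurement z."""
    (a, b), (c, d) = P
    s = a + meas_var          # innovation variance (H = [1, 0])
    k0, k1 = a / s, c / s     # Kalman gain
    y = z - p                 # innovation
    P_new = [[(1 - k0) * a, (1 - k0) * b],
             [c - k1 * a, d - k1 * b]]
    return p + k0 * y, v + k1 * y, P_new


def run_demo(steps=200, dt=0.01, camera_every=10):
    """Hypothetical setup: fast inertial rate, slower camera rate."""
    true_p, true_v, accel = 0.0, 0.0, 1.0
    # Start with a deliberately wrong position estimate.
    p, v, P = 1.0, 0.0, [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, steps + 1):
        true_p += true_v * dt + 0.5 * accel * dt * dt
        true_v += accel * dt
        p, v, P = predict(p, v, P, accel, dt, accel_var=0.01)
        if k % camera_every == 0:
            p, v, P = update(p, v, P, z=true_p, meas_var=0.01)
    return abs(p - true_p), P
```

Between camera frames the estimate is carried by the inertial prediction alone, which is the mechanism that lets a fused tracker bridge short periods without visible features; the position variance grows during those gaps and shrinks again at each visual update.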
Tracking in partially known environments is addressed by developing a vision-only system,
which requires minimal prior knowledge about the structure of the target scene and derives
3D information online. The system combines robust and efficient sequential structure-from-motion
methods with a simplified stochastic model and recursive localisation of 3D point features.
It scales with the size of the map, since pose and structure estimation are decoupled.
Compared to an ordinary sequential structure-from-motion system, the developed method provides
higher accuracy in both camera and feature localisation and significantly less drift. This
is demonstrated in different experiments based on simulated data and in realistic mid-scale
environments.
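
One way to see why decoupling pose and structure estimation helps with map size is to count covariance entries. The sketch below compares a single joint filter over the camera pose and all point features with a decoupled design that keeps one pose covariance plus one small covariance per feature. The dimensions (a 6-parameter pose, 3-parameter points) and the function names are assumptions for illustration; the abstract does not state the actual state layout.

```python
def joint_covariance_entries(num_features, pose_dim=6, point_dim=3):
    """Entries in one full covariance over pose and all features."""
    n = pose_dim + point_dim * num_features
    return n * n


def decoupled_covariance_entries(num_features, pose_dim=6, point_dim=3):
    """Entries when each feature keeps its own small covariance."""
    return pose_dim * pose_dim + num_features * point_dim * point_dim
```

Under these assumptions, 100 features give 93636 entries for the joint filter and 936 for the decoupled one: the first grows quadratically with the number of features, the second linearly.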
Based on these results, a conceptual solution for visual-inertial simultaneous localisation
and mapping in large-scale environments is developed. The idea is to combine the marginalised
particle filter for visual-inertial pose estimation with undelayed feature initialisation and localisation
on a per-particle basis. The essential algorithms for pose and structure estimation are
developed and a proof-of-concept implementation is evaluated in a simple test environment,
demonstrating that the novel tracking strategy unifies and enhances the previous developments.