Factoring image sequences into shape and motion
Recovering scene geometry and camera motion from a sequence of images is an important problem in computer vision. If scene geometry is represented by depth measurements, that is, by the distances between the camera and feature points in the scene, sensitivity to noise worsens rapidly as depth increases. In this paper, we show that this difficulty can be overcome by computing scene geometry directly in terms of shape, that is, by computing the coordinates of feature points in the scene with respect to a world-centered system, without recovering camera-centered depth as an intermediate quantity. More specifically, we show that a matrix of image measurements can be factored by Singular Value Decomposition into the product of two matrices that represent shape and motion, respectively. The results in this paper extend to three dimensions the solution we described in a previous paper for planar camera motion.
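The core observation can be illustrated numerically. Under orthographic projection, a measurement matrix of registered image coordinates (translation removed by subtracting each row's centroid) has rank at most three, so SVD splits it into a motion-like factor and a shape-like factor. The following is a minimal sketch in NumPy under those assumptions; the variable names, the synthetic-data setup, and the use of random rotations for the cameras are illustrative choices, not the paper's construction, and the recovered factors are determined only up to an invertible 3x3 transform.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scene: P feature points in world coordinates (3 x P).
P = 20
S = rng.standard_normal((3, P))

# F frames: each orthographic camera contributes two projection rows
# (the first two rows of a random rotation matrix).
F = 6
rows = []
for _ in range(F):
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    rows.append(Q[:2])            # image-plane axes of this camera
M = np.vstack(rows)               # 2F x 3 motion matrix

# Measurement matrix of image coordinates; registration step removes
# translation by subtracting each row's centroid.
W = M @ S
W = W - W.mean(axis=1, keepdims=True)

# A noise-free registered measurement matrix has rank at most 3,
# so all singular values past the third are (numerically) zero.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
print("leading singular values:", np.round(s[:5], 6))

# Rank-3 factorization into motion-like and shape-like factors.
M_hat = U[:, :3] * np.sqrt(s[:3])
S_hat = np.sqrt(s[:3])[:, None] * Vt[:3]
print("reconstruction error:", np.linalg.norm(W - M_hat @ S_hat))
```

With noisy measurements the matrix is only approximately rank three, and truncating the SVD to its three largest singular values gives the best rank-3 approximation in the least-squares sense, which is what makes the factorization robust in practice.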