Inferring the depth and shape of remote objects and the camera motion from a sequence of images is possible in principle, but the problem is ill-conditioned when the objects are distant relative to their size. This difficulty is overcome by inferring shape and motion directly, without computing depth as an intermediate step. On a single epipolar plane, an image sequence can be represented by the F × P matrix of the image coordinates of P points tracked through F frames. It is shown that under orthographic projection this matrix is of rank three. Using this result, the authors develop a shape-and-motion algorithm based on singular value decomposition. The algorithm gives accurate results without relying on any smoothness assumption for either shape or motion.
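
The rank-three property can be illustrated with a minimal numerical sketch. It is not the authors' implementation, and it rests on assumptions not stated in the abstract: points lie in a plane with coordinates (x, y); each frame has a camera orientation angle and a scalar offset; and the one-dimensional image coordinate is u = x cos(angle) + y sin(angle) + offset. The function names `measurement_matrix` and `rank3_factorization` and all test data are hypothetical. The sketch stops at a factorization that is unique only up to an invertible 3 × 3 matrix; resolving that ambiguity into metric shape and motion is not shown.

```python
import numpy as np

def measurement_matrix(points, angles, offsets):
    """Build the F x P matrix of 1-D orthographic image coordinates.

    points:  (P, 2) planar point positions (hypothetical test data)
    angles:  (F,) camera orientation per frame, in radians (assumed model)
    offsets: (F,) per-frame translation term along the image axis
    """
    axes = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (F, 2)
    return axes @ points.T + offsets[:, None]                  # (F, P)

def rank3_factorization(W):
    """Split W into motion (F x 3) and shape (3 x P) factors via SVD.

    Only the three largest singular values are kept. The factors are
    determined up to an invertible 3 x 3 matrix, not metrically.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:3])
    motion = U[:, :3] * root         # scale the first three columns of U
    shape = root[:, None] * Vt[:3]   # scale the first three rows of Vt
    return motion, shape, s

# Synthetic, noise-free data: P = 20 points tracked through F = 10 frames.
rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))
angles = np.linspace(0.0, 0.8, 10)
offsets = rng.normal(size=10)

W = measurement_matrix(points, angles, offsets)
motion, shape, singular_values = rank3_factorization(W)
```

With noise-free data, the fourth and later singular values vanish to within rounding error and `motion @ shape` reproduces `W`. With noisy tracks, truncating to three singular values gives the closest rank-three matrix in the least-squares sense.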