Deep convolutional segmentation of remote sensing imagery: A simple and efficient alternative to stitching output labels
In this work we consider the application of convolutional neural networks (CNNs) for the semantic segmentation of remote sensing imagery (e.g., aerial color or hyperspectral imagery). In segmentation, the goal is to produce a dense pixel-wise labeling of the input imagery. However, remote sensing imagery is usually stored in the form of very large images, called "tiles", which are too large to be segmented directly by most CNNs on their associated hardware. During label inference (i.e., obtaining labels for a new large tile), smaller sub-images, called "patches", are extracted uniformly over a tile, and the resulting label maps are "stitched" (or concatenated) to create a tile-sized label map. This approach is computationally inefficient and risks discontinuities at the boundaries between the outputs of individual patches. In this work we propose a simple alternative approach in which the input size of the CNN is dramatically increased only during label inference. We evaluate the proposed approach against a standard stitching approach using two popular segmentation CNN models on the INRIA building labeling dataset. The results suggest that the proposed approach substantially reduces label inference time while also yielding modest increases in overall label accuracy. This approach also contributed to our winning entry (overall performance) in the INRIA building labeling competition.
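To make the core idea concrete, the following is a minimal sketch (not the authors' code, and not either of the models evaluated in the paper): because convolutional layers are agnostic to spatial input size, a fully convolutional network trained on small patches can be applied to a much larger tile in a single forward pass at inference time, avoiding patch-wise stitching. The toy architecture, patch/tile sizes, and class count below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Toy fully convolutional segmentation network (illustrative only)."""
    def __init__(self, in_channels=3, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
        )
        # 1x1 convolution produces per-pixel class logits at the input resolution
        self.classifier = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):
        return self.classifier(self.features(x))

model = TinyFCN()

# Training-time input: a batch of small patches (e.g., 256 x 256) cropped from tiles.
patches = torch.randn(8, 3, 256, 256)
patch_logits = model(patches)              # shape: (8, num_classes, 256, 256)

# Inference-time input: one large tile (e.g., 1024 x 1024) processed in a single
# pass, so no stitching of patch-wise label maps is required.
model.eval()
with torch.no_grad():
    tile = torch.randn(1, 3, 1024, 1024)
    tile_logits = model(tile)              # shape: (1, num_classes, 1024, 1024)
```

In practice the inference-time input size is limited mainly by available GPU memory, which is why the enlarged input is used only at inference rather than during training.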