Compilation of a Nationwide River Image Dataset for Identifying River Channels and River Rapids via Deep Learning
Highlights:

What are the main findings?
- A new dataset of 281,024 river images from across the United States, with metadata and labeled subsets to support hydrologic research, is made publicly available.
- Segmentation and classification models showed strong performance in detecting rivers and rapids, which could enable expansion of existing inventories for these key geomorphic features.

What is the implication of the main finding?
- Establishes a hydrologic dataset that enables new machine learning approaches for characterizing rivers via remote sensing, including advanced river segmentation and detection of rapids.
- Provides a framework to support a range of hydrologic applications, including discharge estimation, habitat assessment, resource management, and recreation planning.

Remote sensing enables large-scale, image-based assessments of river dynamics, offering new opportunities for hydrological monitoring. We present a publicly available dataset consisting of 281,024 satellite and aerial images of U.S. rivers, constructed using an Application Programming Interface (API) and the U.S. Geological Survey’s National Hydrography Dataset. The dataset includes images, primary keys, and ancillary geospatial information. We use a manually labeled subset of the images to train models for detecting rapids, defined as areas where high velocity and turbulence lead to a wavy, rough, or even broken water surface visible in the imagery. To demonstrate the utility of this dataset, we developed an image segmentation model to identify rivers within images. This model achieved a mean test intersection-over-union (IoU) of 0.57, with performance rising to an actual IoU of 0.89 on the subset of predictions with high confidence (predicted IoU > 0.9).
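The intersection-over-union metric and the high-confidence subset described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `iou` and `mean_iou_high_confidence` are hypothetical, masks are assumed to be equal-sized 2D grids of 0/1 values, and the predicted IoU is assumed to be a per-image confidence estimate supplied by some separate step that the abstract does not describe.

```python
def iou(pred_mask, true_mask):
    """Intersection-over-union of two binary masks given as 2D lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return intersection / union


def mean_iou_high_confidence(actual_ious, predicted_ious, threshold=0.9):
    """Mean actual IoU over images whose predicted IoU exceeds the threshold.

    Assumption: predicted_ious holds one confidence estimate per image.
    Returns None when no image passes the threshold.
    """
    kept = [a for a, p in zip(actual_ious, predicted_ious) if p > threshold]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Toy example: 2 shared river pixels out of 4 pixels marked in either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
true = [[1, 0, 0],
        [0, 1, 1]]
print(iou(pred, true))  # 0.5
```

Filtering on predicted IoU trades coverage for reliability: fewer images are retained, but the retained masks are more likely to be accurate, which is the pattern the reported rise from 0.57 to 0.89 reflects.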
Following this initial segmentation of river channels within the images, we trained several convolutional neural network (CNN) architectures to classify the presence or absence of rapids. Our selected model reached an accuracy and F1 score of 0.93, indicating strong performance for the classification of rapids that could support consistent, efficient inventory and monitoring of rapids. These data provide new resources for recreation planning, habitat assessment, and discharge estimation. Overall, the dataset and tools offer a foundation for scalable, automated identification of geomorphic features to support riverine science and resource management.
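For readers unfamiliar with the reported classification metrics, the following Python sketch shows how accuracy and F1 score are computed for a binary rapids/no-rapids classifier. The function name `accuracy_and_f1` and the toy labels are illustrative assumptions, not data or code from the study; 1 is assumed to mean "rapids present" and 0 "rapids absent".

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 score for binary labels (1 = rapids, 0 = no rapids)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Toy labels, purely illustrative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(accuracy_and_f1(y_true, y_pred))  # (0.75, 0.75)
```

Reporting F1 alongside accuracy matters when one class is much more common than the other, because accuracy alone can look strong even if the rarer class is frequently missed.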
Duke Scholars
Published In
DOI
EISSN
Publication Date
Volume
Issue
Related Subject Headings
- 4013 Geomatic engineering
- 3709 Physical geography and environmental geoscience
- 3701 Atmospheric sciences
- 0909 Geomatic Engineering
- 0406 Physical Geography and Environmental Geoscience
- 0203 Classical Physics