
Big, noisy data: how scalable Gaussian processes can leverage personal weather stations to improve spatiotemporal coverage of urban climate networks

Publication, Other
Calhoun, Z; Bergin, M; Carlson, D
May 21, 2025

Urban temperature varies dramatically across space and time, yet capturing this variability requires a dense, reliable sensor network, something that is rarely available in practice. Spatiotemporal gaps in data coverage make it difficult to connect localized urban heat stress to health outcomes and energy demand. In this work, we demonstrate how personal weather stations (PWSs) and machine learning can bridge these gaps to improve urban climate monitoring.

To show this, we analyze PWS data collected in Durham County, North Carolina, from 2019 to 2024: a network of more than 200 sensors recording hourly temperature, totaling more than 15 million observations. This dataset presents two key sources of bias that must be addressed to obtain reliable urban heat estimates. First, it is preferentially sampled, with a higher density of weather stations in wealthier (and often cooler) neighborhoods. Second, faulty radiation shields on low-cost sensors may positively bias measurements on sunny days.

To address these challenges, we explore Gaussian process regression (GPR), a flexible machine learning technique that, when defined with a carefully designed covariance structure, can account for non-uniform sensor placement and measurement noise. However, the cost of exact GPR grows cubically in time and quadratically in memory with the number of observations, making it computationally intractable for large spatiotemporal datasets (i.e., more than 10,000 observations). To overcome this, we leverage the Variational Nearest Neighbor Gaussian Process (VNNGP), a scalable approximation that allows complex covariance structures to be applied to arbitrarily large datasets.

We demonstrate that the VNNGP model can learn complex spatiotemporal dependencies, making it well suited to urban temperature modeling. Additionally, we show that abundant but noisy PWS data, when integrated with these models, can further improve spatial coverage.
Together, these advancements highlight how combining large, imperfect datasets with sophisticated modeling techniques can enhance urban climate monitoring, leading to better heat exposure assessments and more informed environmental policies.
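To make the scaling argument concrete, here is a minimal pure-Python sketch. It is not the authors' VNNGP implementation. It compares the exact Gaussian log-likelihood (a Cholesky factorization, O(n^3)) with a nearest-neighbour (Vecchia-type) factorization, in which each observation conditions only on its `m` nearest earlier observations. This is the conditioning idea that nearest-neighbour GP approximations, VNNGP included, rely on (VNNGP applies it to variational inducing points). Everything specific here is an assumption for illustration: the separable squared-exponential space-time kernel, the fixed length-scales, the per-sensor noise variances (smaller for one hypothetical reference station than for the PWSs), the zero-mean prior, and the synthetic data.

```python
import math
import random

# Hypothetical hyperparameters (fixed, not fitted; for illustration only).
SIGMA2 = 1.0     # signal variance, (deg C)^2
LS_SPACE = 1.0   # spatial length-scale, km
LS_TIME = 3.0    # temporal length-scale, hours


def scaled_dist2(p, q):
    """Squared space-time distance between (x, y, t) points, per-axis scaled."""
    dx2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return dx2 / LS_SPACE**2 + (p[2] - q[2]) ** 2 / LS_TIME**2


def st_kernel(p, q):
    """Separable kernel: squared-exponential in space times squared-exponential in time."""
    return SIGMA2 * math.exp(-0.5 * scaled_dist2(p, q))


def cov(points, noise):
    """Kernel matrix plus each observation's own noise variance on the diagonal."""
    return [[st_kernel(p, q) + (noise[i] if i == j else 0.0)
             for j, q in enumerate(points)] for i, p in enumerate(points)]


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gauss_loglik(y, K):
    """log N(y | 0, K), computed via Cholesky in O(n^3) time and O(n^2) memory."""
    n = len(y)
    L = cholesky(K)
    z = [0.0] * n  # forward substitution: solve L z = y, so y^T K^-1 y = z^T z
    for i in range(n):
        z[i] = (y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (sum(v * v for v in z) + logdet + n * math.log(2 * math.pi))


def exact_loglik(X, y, noise):
    return gauss_loglik(y, cov(X, noise))


def nn_loglik(X, y, noise, m=5):
    """Vecchia-type approximation: sum_i log p(y_i | y of m nearest earlier points)."""
    total = 0.0
    for i in range(len(X)):
        # Naive O(n) neighbour search per point; real implementations use spatial indexes.
        prev = sorted(range(i), key=lambda j: scaled_dist2(X[i], X[j]))[:m]
        idx = prev + [i]
        K = cov([X[a] for a in idx], [noise[a] for a in idx])
        ys = [y[a] for a in idx]
        # log p(y_i | y_N) = log p(y_N, y_i) - log p(y_N); each term costs O(m^3).
        joint = gauss_loglik(ys, K)
        marg = gauss_loglik(ys[:-1], [row[:-1] for row in K[:-1]]) if prev else 0.0
        total += joint - marg
    return total


def make_demo(n_sensors=6, n_hours=8, seed=0):
    """Hypothetical toy data: temperature anomalies (deg C) at a few sensors."""
    rng = random.Random(seed)
    sites = [(rng.uniform(0, 5), rng.uniform(0, 5)) for _ in range(n_sensors)]
    # Hypothetical noise variances: sensor 0 is a reference station, the rest are PWSs.
    site_noise = [0.05] + [0.5] * (n_sensors - 1)
    X, y, noise = [], [], []
    for t in range(n_hours):  # time-major ordering, used by the NN factorization
        for s, (sx, sy) in enumerate(sites):
            X.append((sx, sy, float(t)))
            y.append(math.sin(t / 4.0) + 0.2 * sx
                     + rng.gauss(0.0, math.sqrt(site_noise[s])))
            noise.append(site_noise[s])
    return X, y, noise


if __name__ == "__main__":
    X, y, noise = make_demo()
    print("exact log-likelihood:        ", exact_loglik(X, y, noise))
    print("nearest-neighbour, m = 5:    ", nn_loglik(X, y, noise, m=5))
```

With `m = len(X)`, every observation conditions on all of its predecessors, so the factorization is just the chain rule and `nn_loglik` reproduces `exact_loglik` up to rounding. The savings come from keeping `m` small: each likelihood term then costs O(m^3), for roughly O(n m^3) overall instead of O(n^3). Ordering observations by time, as `make_demo` does, is one common choice for space-time data, and the ordering affects approximation quality.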

Duke Scholars

DOI

Publication Date

May 21, 2025