Investigating the accuracy of blood oxygen saturation measurements in common consumer smartwatches.
Blood oxygen saturation (SpO2) is an important measurement for monitoring patients with acute and chronic conditions associated with low blood oxygen levels. While smartwatches may provide a new method for continuous and unobtrusive SpO2 monitoring, their accuracy and limitations must be understood to ensure that they are used in a fit-for-purpose manner. To determine whether the accuracy of SpO2 measurements from consumer smartwatches, and the ability to obtain them, differ by device type and/or skin tone, our study recruited patients aged 18 to 85 years, with and without chronic pulmonary disease, who were able to provide informed consent. The mean absolute error (MAE), mean directional error (MDE) and root mean squared error (RMSE) were used to evaluate the accuracy of the smartwatches against a clinical-grade pulse oximeter. The percentage of data unobtainable because the smartwatch could not record SpO2 (missingness) was used to evaluate the measurability of SpO2 from the smartwatches. Skin tone was quantified using the Fitzpatrick (FP) scale and the Individual Typology Angle (ITA), a continuous measure of skin tone. A total of 49 individuals (18 female) were enrolled and completed the study. With the clinical-grade pulse oximeter as the reference standard, there were statistically significant differences in accuracy between devices: the Apple Watch Series 7 had measurements closest to the reference standard (MAE = 2.2%, MDE = -0.4%, RMSE = 2.9%) and the Garmin Venu 2s had measurements farthest from it (MAE = 5.8%, MDE = 5.5%, RMSE = 6.7%). There were also significant differences in measurability across devices, with the highest data presence from the Apple Watch Series 7 (88.9% of attempted measurements were successful) and the highest data missingness from the Withings ScanWatch (only 69.5% of attempted measurements were successful).
The MAE, RMSE and missingness did not vary significantly across FP skin tone groups; however, there may be a relationship between FP skin tone and MDE (intercept = 0.04, beta coefficient = 0.47, p = 0.04). No statistically significant relationship was found between skin tone as measured by ITA and MAE, MDE, RMSE or missingness.
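As an illustration of the agreement metrics named above, the following is a minimal sketch of how MAE, MDE, RMSE and missingness could be computed from paired readings. It is not the study's analysis code. The function name `spo2_agreement`, the use of `None` to mark a failed smartwatch reading, and the sign convention (smartwatch minus reference, so a positive MDE means the smartwatch reads higher) are assumptions made for this example, and the sample values are invented purely to show the arithmetic.

```python
import math


def spo2_agreement(watch, reference):
    """Summarize agreement between smartwatch and reference SpO2 readings.

    watch:     smartwatch readings in percent, with None marking an attempt
               for which no reading was obtained (assumed convention).
    reference: paired clinical-grade pulse oximeter readings in percent.
    """
    if len(watch) != len(reference) or not watch:
        raise ValueError("watch and reference must be non-empty and paired")

    # Signed error per successful attempt; sign convention is an assumption:
    # smartwatch minus reference.
    errors = [w - r for w, r in zip(watch, reference) if w is not None]
    if not errors:
        raise ValueError("no successful smartwatch readings to evaluate")

    n = len(errors)
    return {
        "mae": sum(abs(e) for e in errors) / n,            # mean absolute error
        "mde": sum(errors) / n,                            # mean directional error
        "rmse": math.sqrt(sum(e * e for e in errors) / n), # root mean squared error
        # Share of attempted measurements with no smartwatch reading.
        "missingness_pct": 100.0 * (len(watch) - n) / len(watch),
    }


# Hypothetical example values, not data from the study.
result = spo2_agreement(watch=[97, None, 93, 96], reference=[98, 97, 95, 95])
print(result)
```

Because MDE keeps the sign of each error while MAE does not, a device can show a small MDE alongside a larger MAE when its over-readings and under-readings partly cancel.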