Demographic reporting in biosignal datasets: a comprehensive analysis of the PhysioNet open access database.
The PhysioNet open access database (PND) is one of the world's largest and most comprehensive repositories of biosignal data and is widely used by researchers to develop, train, and validate algorithms. To contextualise the results of such algorithms, it is crucial to understand the underlying demographic distribution of the data, specifically the race, ethnicity, sex or gender, and age of study participants. We sought to understand the reporting patterns and characteristics of the demographic data in the datasets available on PND. Of the 181 unique datasets present in the PND as of July 6, 2023, 175 involved human participants, with fewer than 7% of studies reporting on all four of the key demographic variables. Furthermore, we found a higher rate of reporting for sex or gender and age than for race and ethnicity. In the studies that did include participant sex or gender, the samples were mostly male. Additionally, we found that most studies were done in North America, particularly in the USA. These imbalances and the poor reporting of representation raise concerns regarding potential embedded biases in the algorithms that rely on these datasets. They also underscore the need for universal and comprehensive reporting practices to ensure equitable development and deployment of artificial intelligence and machine learning tools in medicine.
Duke Scholars
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Issue
Start / End Page
Related Subject Headings
- Racial Groups
- Male
- Humans
- Female
- Ethnicity
- Demography
- Databases, Factual
- Algorithms
- Adult
- Access to Information