Data adequacy bias impact in a data-blinded semi-supervised GAN for privacy-Aware COVID-19 chest X-ray classification

Publication, Conference
Pastorino, J; Biswas, AK
Published in: Proceedings of the 13th ACM International Conference on Bioinformatics Computational Biology and Health Informatics Bcb 2022
August 7, 2022

Supervised machine learning models are, by definition, data-sighted: they require viewing all or most of the labeled training dataset. This paradigm presents two intertwined bottlenecks: the risk of exposing sensitive data samples to the third-party site where machine learning engineers work, and the time-consuming, laborious, bias-prone nature of data annotation by personnel at the data source site. In this paper, we studied the learning impact of data adequacy as a source of bias in a data-blinded semi-supervised learning model for COVID-19 chest X-ray classification. Data-blindedness was applied to a semi-supervised generative adversarial network (GAN) that generates synthetic data from only a few labeled data samples and concurrently learns to classify targets. We designed and developed a data-blind COVID-19 patient classifier that determines whether an individual is suffering from COVID-19 or another type of illness, with the ultimate goal of producing a system to assist in labeling large datasets. However, the availability of labels in the training data affected model performance, and when a new disease spreads, as COVID-19 did in 2019, access to labeled data may be limited. Here, we studied how bias in the per-class distribution of labeled samples affected classification performance for three models: a convolutional neural network based classifier (CNN), a semi-supervised GAN using the source data (SGAN), and our proposed data-blinded semi-supervised GAN (BSGAN). Data-blindedness prevents machine learning engineers from directly accessing the source data during training, thereby ensuring data confidentiality. This was achieved by training the proposed model on synthetic data samples generated by a separate generative model.
Our model achieved comparable performance, with a trade-off of 0.05 in AUC score between the privacy-aware model and a traditionally trained model, and it remained stable, following the same learning performance as the data distribution was changed.
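
As a rough illustration of what "bias in the labeled sample distribution per class" can mean in practice, the sketch below keeps a different fraction of labels for each class and treats the remaining samples as unlabeled. This is not the authors' code: the function name `select_labeled_indices`, the class names, the toy dataset, and the fractions are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def select_labeled_indices(labels, keep_fraction, seed=0):
    """Return sorted indices of samples whose labels stay visible.

    labels: list of class labels, one per sample.
    keep_fraction: dict mapping class -> fraction of that class kept labeled
                   (classes missing from the dict keep all their labels).
    Samples not selected would be treated as unlabeled by a
    semi-supervised model.
    """
    rng = random.Random(seed)

    # Group sample indices by class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Draw the requested share of labeled samples from each class.
    kept = []
    for label, indices in by_class.items():
        fraction = keep_fraction.get(label, 1.0)
        n_keep = round(len(indices) * fraction)
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


# Hypothetical toy dataset: 100 "covid" and 100 "other" samples.
labels = ["covid"] * 100 + ["other"] * 100

# Assumed skew: only 10% of "covid" labels available vs. 50% of "other".
labeled = select_labeled_indices(labels, {"covid": 0.1, "other": 0.5})
counts = {c: sum(1 for i in labeled if labels[i] == c) for c in ("covid", "other")}
print(counts)
```

Varying the per-class fractions and retraining each model on the resulting labeled subset is one plausible way to probe how label availability affects a metric such as AUC; the study's actual protocol may differ.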

Citation

APA: Pastorino, J., & Biswas, A. K. (2022). Data adequacy bias impact in a data-blinded semi-supervised GAN for privacy-Aware COVID-19 chest X-ray classification. In Proceedings of the 13th ACM International Conference on Bioinformatics Computational Biology and Health Informatics Bcb 2022. https://doi.org/10.1145/3535508.3545560
Chicago: Pastorino, J., and A. K. Biswas. “Data adequacy bias impact in a data-blinded semi-supervised GAN for privacy-Aware COVID-19 chest X-ray classification.” In Proceedings of the 13th ACM International Conference on Bioinformatics Computational Biology and Health Informatics Bcb 2022, 2022. https://doi.org/10.1145/3535508.3545560.
ICMJE: Pastorino J, Biswas AK. Data adequacy bias impact in a data-blinded semi-supervised GAN for privacy-Aware COVID-19 chest X-ray classification. In: Proceedings of the 13th ACM International Conference on Bioinformatics Computational Biology and Health Informatics Bcb 2022. 2022.
MLA: Pastorino, J., and A. K. Biswas. “Data adequacy bias impact in a data-blinded semi-supervised GAN for privacy-Aware COVID-19 chest X-ray classification.” Proceedings of the 13th ACM International Conference on Bioinformatics Computational Biology and Health Informatics Bcb 2022, 2022. Scopus, doi:10.1145/3535508.3545560.
NLM: Pastorino J, Biswas AK. Data adequacy bias impact in a data-blinded semi-supervised GAN for privacy-Aware COVID-19 chest X-ray classification. Proceedings of the 13th ACM International Conference on Bioinformatics Computational Biology and Health Informatics Bcb 2022. 2022.