Grid Binary LOgistic REgression (GLORE): building shared models without sharing data.
OBJECTIVE: Classifying complex or rare patterns in clinical and genomic data requires a large set of labeled patients. Although methods that operate on large, centralized data sources are used extensively, little attention has been paid to whether models such as binary logistic regression (LR) can be developed in a distributed manner, which would allow researchers to share models without necessarily sharing patient data.

MATERIAL AND METHODS: Instead of bringing data to a central repository for computation, we bring computation to the data. The Grid Binary LOgistic REgression (GLORE) model integrates decomposable partial elements or non-privacy-sensitive prediction values to obtain the model coefficients, the variance-covariance matrix, the goodness-of-fit test statistic, and the area under the receiver operating characteristic (ROC) curve.

RESULTS: We conducted experiments on both simulated and clinically relevant data and compared the computational costs of GLORE with those of a traditional LR model estimated on the combined data. We showed that GLORE's results are identical to those of LR up to a precision of 10^-15. In addition, GLORE is computationally efficient.

LIMITATION: In GLORE, the calculation of coefficient gradients must be synchronized across sites, which requires some effort to ensure the integrity of communication. The predictors must also have the same format and meaning across the data sets.

CONCLUSION: The results suggest that GLORE performs as well as LR while allowing data to remain protected at their original sites.
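The coefficient and variance-covariance estimation described above can be illustrated with a minimal sketch. It assumes that the "decomposable partial elements" are each site's gradient X^T(y - p) and information matrix X^T W X, which are used in a standard Newton-Raphson update for LR. This is an assumption for illustration, not a transcription of the authors' implementation. Each site computes these sums locally. A coordinator adds them, takes one Newton step, and sends the updated coefficients back. The inverse of the summed information matrix at convergence serves as the variance-covariance matrix. The function names (`site_statistics`, `aggregate`, `fit_distributed`, `simulate_site`) and the simulated data are hypothetical. The per-site loop runs in a single process as a stand-in for the network exchange, and the goodness-of-fit statistic and ROC area are not covered here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def site_statistics(X, y, beta):
    """Run locally at one site: return the gradient X^T(y - p) and the
    information matrix X^T W X, with W = diag(p(1 - p)). Only these
    aggregates leave the site, never the patient-level rows."""
    d = len(beta)
    grad = [0.0] * d
    info = [[0.0] * d for _ in range(d)]
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * x for b, x in zip(beta, xi)))
        w = p * (1.0 - p)
        for j in range(d):
            grad[j] += xi[j] * (yi - p)
            for k in range(d):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info


def aggregate(sites, beta):
    """Coordinator: sum the per-site gradients and information matrices."""
    d = len(beta)
    grad = [0.0] * d
    info = [[0.0] * d for _ in range(d)]
    for X, y in sites:  # stands in for one network round-trip per site
        g, h = site_statistics(X, y, beta)
        grad = [a + b for a, b in zip(grad, g)]
        info = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(info, h)]
    return grad, info


def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fit_distributed(sites, tol=1e-10, max_iter=50):
    """Newton-Raphson on the summed site statistics. Returns the coefficients
    and the variance-covariance matrix (inverse information at the estimate)."""
    d = len(sites[0][0][0])
    beta = [0.0] * d
    for _ in range(max_iter):
        grad, info = aggregate(sites, beta)
        inv = invert(info)
        step = [sum(inv[j][k] * grad[k] for k in range(d)) for j in range(d)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = aggregate(sites, beta)  # re-evaluate at the final estimate
    return beta, invert(info)


def simulate_site(n, true_beta, rng):
    """Hypothetical site data: an intercept column plus Gaussian predictors."""
    X, y = [], []
    for _ in range(n):
        xi = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(len(true_beta) - 1)]
        X.append(xi)
        y.append(1 if rng.random() < sigmoid(sum(b * x for b, x in zip(true_beta, xi))) else 0)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    sites = [simulate_site(n, [-0.5, 1.0, -2.0], rng) for n in (200, 300, 250)]
    beta_dist, _ = fit_distributed(sites)
    pooled = ([x for X, _ in sites for x in X], [v for _, y in sites for v in y])
    beta_pool, _ = fit_distributed([pooled])
    print("max |distributed - pooled|:", max(abs(a - b) for a, b in zip(beta_dist, beta_pool)))
```

The summed site statistics equal the pooled statistics in exact arithmetic. The distributed and pooled fits should therefore differ only by floating-point rounding from the different summation order. This is the kind of agreement the RESULTS paragraph reports.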
Related Subject Headings
- ROC Curve
- Pattern Recognition, Automated
- Medical Informatics
- Logistic Models
- Information Dissemination
- Humans
- Confidentiality
- Computer Simulation
- Biomedical Research
- Area Under Curve