David Page

James B. Duke Distinguished Professor
Biostatistics & Bioinformatics, Division of Biostatistics
Duke Box 2721, Durham 27710
2424 Erwin Road Suite 1102, 11072 Hock Plaza, Durham 27705

Selected Publications


Variable Importance Matching for Causal Inference

Conference Proceedings of Machine Learning Research · January 1, 2023 Our goal is to produce methods for observational causal inference that are auditable, easy to troubleshoot, accurate for treatment effect estimation, and scalable to high-dimensional data. We describe a general framework called Model-to-Match that achieves ... Cite

A Framework for Automating Psychiatric Distress Screening in Ophthalmology Clinics Using an EHR-Derived AI Algorithm.

Journal Article Transl Vis Sci Technol · October 3, 2022 PURPOSE: In patients with ophthalmic disorders, psychosocial risk factors play an important role in morbidity and mortality. Proper and early psychiatric screening can result in prompt intervention and mitigate its impact. Because screening is resource int ... Full text Link to item Cite

A Subset of Secreted Proteins in Ascites Can Predict Platinum-Free Interval in Ovarian Cancer.

Journal Article Cancers (Basel) · September 1, 2022 The time between the last cycle of chemotherapy and recurrence, the platinum-free interval (PFI), predicts overall survival in high-grade serous ovarian cancer (HGSOC). To identify secreted proteins associated with a shorter PFI, we utilized machine learni ... Full text Link to item Cite

Advancing artificial intelligence-assisted pre-screening for fragile X syndrome.

Journal Article BMC Med Inform Decis Mak · June 10, 2022 BACKGROUND: Fragile X syndrome (FXS), the most common inherited cause of intellectual disability and autism, is significantly underdiagnosed in the general population. Diagnosing FXS is challenging due to the heterogeneity of the condition, subtle physical ... Full text Link to item Cite

Response to Timothé Ménard.

Journal Article Genet Med · March 2022 Full text Link to item Cite

Prevalence of Underdiagnosed Fragile X Syndrome in 2 Health Systems.

Journal Article JAMA Netw Open · December 1, 2021 This cross-sectional study examines the gap between best estimates of prevalence and clinical diagnosis of fragile X syndrome by mining the electronic health records of 3.8 million people in Wisconsin. ... Full text Link to item Cite

E-Pedigrees: a large-scale automatic family pedigree prediction application.

Journal Article Bioinformatics · November 5, 2021 MOTIVATION: The use and functionality of Electronic Health Records (EHR) have increased rapidly in the past few decades. EHRs are becoming an important depository of patient health information and can capture family data. Pedigree analysis is a longstandin ... Full text Link to item Cite

Machine learning approach to measurement of criticism: The core dimension of expressed emotion.

Journal Article J Fam Psychol · October 2021 Expressed emotion (EE), a measure of the family's emotional climate, is a fundamental measure in caregiving research. A core dimension of EE is the level of criticism expressed by the caregiver to the care recipient, with a high level of criticism a marker ... Full text Link to item Cite

Artificial intelligence-assisted phenotype discovery of fragile X syndrome in a population-based sample.

Journal Article Genet Med · July 2021 PURPOSE: Fragile X syndrome (FXS), the most prevalent inherited cause of intellectual disability, remains underdiagnosed in the general population. Clinical studies have shown that individuals with FXS have a complex health profile leading to unique clinic ... Full text Link to item Cite

Predicting Drug-Drug Interactions from Heterogeneous Data: An Embedding Approach

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2021 Most approaches for predicting drug-drug interactions (DDIs) have focused on text. We present the first work that uses multiple drug structure data - images, string representations and relationship representations. We exploit the recent advances in deep ne ... Full text Cite

Clinical applications of machine learning in the diagnosis, classification, and prediction of heart failure.

Journal Article Am Heart J · November 2020 Machine learning and artificial intelligence are generating significant attention in the scientific community and media. Such algorithms have great potential in medicine for personalizing and improving patient care, including in the diagnosis and managemen ... Full text Link to item Cite

Development and Validation of a Natural Language Processing Tool to Generate the CONSORT Reporting Checklist for Randomized Clinical Trials.

Journal Article JAMA Netw Open · October 1, 2020 IMPORTANCE: Adherence to the Consolidated Standards of Reporting Trials (CONSORT) for randomized clinical trials is associated with improving quality because inadequate reporting in randomized clinical trials may complicate the interpretation and the applic ... Full text Link to item Cite

Adverse Drug Reaction Discovery from Electronic Health Records with Deep Neural Networks.

Conference Proc ACM Conf Health Inference Learn (2020) · April 2020 Adverse drug reactions (ADRs) are detrimental and unexpected clinical incidents caused by drug intake. The increasing availability of massive quantities of longitudinal event data such as electronic health records (EHRs) has redefined ADR discovery as a bi ... Full text Link to item Cite

Autoblock: A hands-off blocking framework for entity matching

Conference WSDM 2020 - Proceedings of the 13th International Conference on Web Search and Data Mining · January 20, 2020 Entity matching seeks to identify data records over one or multiple data sources that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record pairs to be matche ... Full text Cite

CAUSE: Learning granger causality from event sequences using attribution methods

Conference 37th International Conference on Machine Learning, ICML 2020 · January 1, 2020 We study the problem of learning Granger causality between event types from asynchronous, interdependent, multi-type event sequences. Existing work suffers from either limited model flexibility or poor model explainability and thus fails to uncover Granger ... Cite

KinderMiner Web: a simple web tool for ranking pairwise associations in biomedical applications.

Journal Article F1000Res · 2020 Many important scientific discoveries require lengthy experimental processes of trial and error and could benefit from intelligent prioritization based on deep domain understanding. While exponential growth in the scientific literature makes it difficult t ... Full text Link to item Cite

Machine learning to predict developmental neurotoxicity with high-throughput data from 2D bio-engineered tissues.

Journal Article Proc Int Conf Mach Learn Appl · December 2019 There is a growing need for fast and accurate methods for testing developmental neurotoxicity across several chemical exposure sources. Current approaches, such as in vivo animal studies, and assays of animal and human primary cell cultures, suffer from ch ... Full text Open Access Link to item Cite

Tumor cell sensitivity to vemurafenib can be predicted from protein expression in a BRAF-V600E basket trial setting.

Journal Article BMC Cancer · October 31, 2019 BACKGROUND: Genetics-based basket trials have emerged to test targeted therapeutics across multiple cancer types. However, while vemurafenib is FDA-approved for BRAF-V600E melanomas, the non-melanoma basket trial was unsuccessful, suggesting mutation statu ... Full text Link to item Cite

Parathyroid hormone independently predicts fracture, vascular events, and death in patients with stage 3 and 4 chronic kidney disease.

Journal Article Osteoporos Int · October 2019 UNLABELLED: Doctors do not know whether treatment of high parathyroid hormone levels is linked to better outcomes in their patients with kidney disease. In this study, lower parathyroid hormone levels at baseline were linked to lower risk of fracture, vasc ... Full text Link to item Cite

Data-driven phenotype discovery of FMR1 premutation carriers in a population-based sample.

Journal Article Sci Adv · August 2019 The impact of the FMR1 premutation on human health is the subject of considerable controversy. A fundamental unanswered question is whether carrying the premutation allele is directly correlated with clinical phenotypes. A challenging problem in past genot ... Full text Link to item Cite

A Simple Text Mining Approach for Ranking Pairwise Associations in Biomedical Applications

Journal Article AMIA Summits on Translational Science Proceeding · July 26, 2019 Cite

Training and Interpreting Machine Learning Algorithms to Evaluate Fall Risk After Emergency Department Visits.

Journal Article Med Care · July 2019 BACKGROUND: Machine learning is increasingly used for risk stratification in health care. Achieving accurate predictive models do not improve outcomes if they cannot be translated into efficacious intervention. Here we examine the potential utility of auto ... Full text Link to item Cite

AUCµ: A Performance Metric for Multi-Class Machine Learning Models

Conference · July 1, 2019 The area under the receiver operating characteristic curve (AUC) is arguably the most common metric in machine learning for assessing the quality of a two-class classification model. As the number and complexity of machine learning applications grows, so t ... Link to item Cite

Machine learning for phenotyping opioid overdose events.

Journal Article J Biomed Inform · June 2019 OBJECTIVE: To develop machine learning models for classifying the severity of opioid overdose events from clinical data. MATERIALS AND METHODS: Opioid overdoses were identified by diagnoses codes from the Marshfield Clinic population and assigned a severit ... Full text Link to item Cite

A Machine-Learning-Based Drug Repurposing Approach Using Baseline Regularization.

Chapter · 2019 We present the baseline regularization model for computational drug repurposing using electronic health records (EHRs). In EHRs, drug prescriptions of various drugs are recorded throughout time for various patients. In the same time, numeric physical measu ... Full text Link to item Cite

AUCμ: A performance metric for multi-class machine learning models

Conference 36th International Conference on Machine Learning, ICML 2019 · January 1, 2019 The area under the receiver operating characteristic curve (AUC) is arguably the most common metric in machine learning for assessing the quality of a two-class classification model. As the number and complexity of machine learning applications grows, so t ... Cite

Recursive Feature Elimination by Sensitivity Testing.

Conference Proc Int Conf Mach Learn Appl · December 2018 There is great interest in methods to improve human insight into trained non-linear models. Leading approaches include producing a ranking of the most relevant features, a non-trivial task for non-linear models. We show theoretically and empirically the be ... Full text Link to item Cite

Drug-Drug Interaction Discovery: Kernel Learning from Heterogeneous Similarities.

Journal Article Smart Health (Amst) · December 2018 We develop a pipeline to mine complex drug interactions by combining different similarities and interaction types (molecular, structural, phenotypic, genomic etc). Our goal is to learn an optimal kernel from these heterogeneous similarities in a supervised ... Full text Link to item Cite

Privacy-Preserving Collaborative Prediction using Random Forests

Journal Article · November 21, 2018 We study the problem of privacy-preserving machine learning (PPML) for ensemble methods, focusing our effort on random forests. In collaborative analysis, PPML attempts to solve the conflict between the need for data sharing and privacy. This is especially ... Open Access Link to item Cite

Stochastic Learning for Sparse Discrete Markov Random Fields with Controlled Gradient Approximation Error.

Conference Uncertain Artif Intell · August 2018 We study the L1-regularized maximum likelihood estimator/estimation (MLE) problem for discrete Markov random fields (MRFs), where efficient and scalable learning requires both sparse regularization and approximate inference. To address these challenges, we ... Link to item Cite

Using machine learning to identify patterns of lifetime health problems in decedents with autism spectrum disorder.

Journal Article Autism Res · August 2018 Very little is known about the health problems experienced by individuals with autism spectrum disorder (ASD) throughout their life course. We retrospectively analyzed diagnostic codes associated with de-identified electronic health records using a machine ... Full text Link to item Cite

Comparative Evaluation of MS-based Metabolomics Software and Its Application to Preclinical Alzheimer's Disease.

Journal Article Sci Rep · June 18, 2018 Mass spectrometry-based metabolomics has undergone significant progresses in the past decade, with a variety of software packages being developed for data analysis. However, systematic comparison of different metabolomics software tools has rarely been con ... Full text Link to item Cite

Brand vs generic adverse event reporting patterns: An authorized generic-controlled evaluation of cardiovascular medications.

Journal Article J Clin Pharm Ther · June 2018 WHAT IS KNOWN AND OBJECTIVE: Some public scepticism exists about generics in terms of whether brand and generic drugs produce identical outcomes. This study explores whether adverse event (AE) reporting patterns are similar between brand and generic drugs, ... Full text Link to item Cite

Mixed Approach Retrospective Analyses of Suicide and Suicidal Ideation for Brand Compared with Generic Central Nervous System Drugs.

Journal Article Drug Saf · April 2018 INTRODUCTION: Several different types of drugs acting on the central nervous system (CNS) have previously been associated with an increased risk of suicide and suicidal ideation (broadly referred to as suicide). However, a differential association between ... Full text Link to item Cite

Applying family analyses to electronic health records to facilitate genetic research.

Journal Article Bioinformatics · February 15, 2018 MOTIVATION: Pedigree analysis is a longstanding and powerful approach to gain insight into the underlying genetic factors in human health, but identifying, recruiting and genotyping families can be difficult, time consuming and costly. Development of high ... Full text Link to item Cite

Quantifying predictive capability of electronic health records for the most harmful breast cancer.

Conference Proc SPIE Int Soc Opt Eng · February 2018 Improved prediction of the "most harmful" breast cancers that cause the most substantive morbidity and mortality would enable physicians to target more intense screening and preventive measures at those women who have the highest risk; however, such predic ... Full text Link to item Cite

Comparison of Outcomes Following a Switch From a Brand to an Authorized Versus Independent Generic Drug.

Journal Article Clin Pharmacol Ther · February 2018 Authorized generics are identical in formulation to brand drugs, manufactured by the brand company but marketed as a generic. Generics, marketed by generic manufacturers, are required to demonstrate pharmaceutical and bioequivalence to the brand drug, but ... Full text Link to item Cite

Health Profiles of Mosaic Versus Non-mosaic FMR1 Premutation Carrier Mothers of Children With Fragile X Syndrome.

Journal Article Front Genet · 2018 The FMR1 premutation is of increasing interest to the FXS community, as questions about a primary premutation phenotype warrant research attention. 100 FMR1 premutation carrier mothers (mean age = 58; 67-138 CGG repeats) of adults with fragile X syndrome w ... Full text Link to item Cite

Privacy-preserving ridge regression with only linearly-homomorphic encryption

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2018 Linear regression with 2-norm regularization (i.e., ridge regression) is an important statistical technique that models the relationship between some explanatory values and an outcome value using a linear function. In many applications (e.g., predictive mo ... Full text Cite

Improving breast cancer risk prediction by using demographic risk factors, abnormality features on mammograms and genetic variants.

Journal Article AMIA Annu Symp Proc · 2018 The predictive capability of combining demographic risk factors, germline genetic variants, and mammogram abnormality features for breast cancer risk prediction is poorly understood. We evaluated the predictive performance of combinations of demographic ri ... Link to item Cite

Machine Learning Algorithm Predicts Cardiac Resynchronization Therapy Outcomes: Lessons From the COMPANION Trial.

Journal Article Circ Arrhythm Electrophysiol · January 2018 BACKGROUND: Cardiac resynchronization therapy (CRT) reduces morbidity and mortality in heart failure patients with reduced left ventricular function and intraventricular conduction delay. However, individual outcomes vary significantly. This study sought t ... Full text Link to item Cite

Causal Structure Learning via Temporal Markov Networks

Conference Proceedings of Machine Learning Research · January 1, 2018 Learning the structure of a dynamic Bayesian network (DBN) is a common way of discovering causal relationships in time series data. However, the combinatorial nature of DBN structure learning limits the accuracy and scalability of DBN modeling. We propose ... Cite

Methodological Considerations for Comparison of Brand Versus Generic Versus Authorized Generic Adverse Event Reports in the US Food and Drug Administration Adverse Event Reporting System (FAERS).

Journal Article Clin Drug Investig · December 2017 BACKGROUND: The US Food and Drug Administration Adverse Event Reporting System (FAERS), a post-marketing safety database, can be used to differentiate brand versus generic safety signals. OBJECTIVE: To explore the methods for identifying and analyzing bran ... Full text Link to item Cite

Adverse drug event discovery using biomedical literature: A big data neural network adventure

Journal Article JMIR Medical Informatics · October 1, 2017 Background: The study of adverse drug events (ADEs) is a tenured topic in medical literature. In recent years, increasing numbers of scientific articles and health-related social media posts have been generated and shared daily, albeit with very limited us ... Full text Cite

Breast Cancer Risk Prediction Using Electronic Health Records

Conference Proceedings - 2017 IEEE International Conference on Healthcare Informatics, ICHI 2017 · September 8, 2017 Electronic health records (EHRs) represent an underused data source that has great research and clinical potential. Our goal was to quantify the value of EHRs in breast cancer risk prediction. We conducted a retrospective case-control study, gathering pati ... Full text Cite

Comparison of brand versus generic antiepileptic drug adverse event reporting rates in the U.S. Food and Drug Administration Adverse Event Reporting System (FAERS).

Journal Article Epilepsy Res · September 2017 OBJECTIVE: Despite the cost saving role of generic anti-epileptic drugs (AEDs), debate exists as to whether generic substitution of branded AEDs may lead to therapeutic failure and increased toxicity. This study compared adverse event (AE) reporting rates ... Full text Link to item Cite

Pharmacovigilance via Baseline Regularization with Large-Scale Longitudinal Observational Data.

Conference KDD · August 2017 Several prominent public health hazards [29] that occurred at the beginning of this century due to adverse drug events (ADEs) have raised international awareness of governments and industries about pharmacovigilance (PhV) [6,7], the science and activities ... Full text Link to item Cite

BigNN: An open-source big data toolkit focused on biomedical sentence classification

Conference Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017 · July 1, 2017 Every single day, a massive amount of text data is generated by different medical data sources, such as scientific literature, medical web pages, health-related social media, clinical notes, and drug reviews. Processing this wealth of data is indeed a daun ... Full text Cite

Identifying Parkinson's Patients: A Functional Gradient Boosting Approach.

Conference Artif Intell Med Conf Artif Intell Med (2005-) · June 2017 Parkinson's, a progressive neural disorder, is difficult to identify due to the hidden nature of the symptoms associated. We present a machine learning approach that uses a definite set of features obtained from the Parkinsons Progression Markers Initiativ ... Full text Link to item Cite

Identifying Parkinson's Patients: A functional Gradient Boosting Approach

Journal Article Artificial Intelligence in Medicine · June 2017 Cite

Markov Logic Networks for Adverse Drug Event Extraction from Text.

Journal Article Knowl Inf Syst · May 2017 Adverse drug events (ADEs) are a major concern and point of emphasis for the medical profession, government, and society. A diverse set of techniques from epidemiology, statistics, and computer science are being proposed and studied for ADE discovery from ... Full text Link to item Cite

Comparison of Generic-to-Brand Switchback Rates Between Generic and Authorized Generic Drugs.

Journal Article Pharmacotherapy · April 2017 STUDY OBJECTIVE: Generic drugs contain identical active ingredients as their corresponding brand drugs and are pharmaceutically equivalent and bioequivalent, whereas authorized generic drugs (AGs) contain both identical active and inactive ingredients as t ... Full text Link to item Cite

An Efficient Pseudo-likelihood Method for Sparse Binary Pairwise Markov Network Estimation

Journal Article · February 27, 2017 The pseudo-likelihood method is one of the most popular algorithms for learning sparse binary pairwise Markov networks. In this paper, we formulate the $L_1$ regularized pseudo-likelihood problem as a sparse multiple logistic regression problem. In this wa ... Open Access Link to item Cite

A screening rule for ℓ1-regularized ising model estimation

Conference Advances in Neural Information Processing Systems · January 1, 2017 We discover a screening rule for ℓ1-regularized Ising model estimation. The simple closed-form screening rule is a necessary and sufficient condition for exactly recovering the blockwise structure of a solution under any given regularization parameters. Wi ... Cite

Privacy-Preserving Ridge Regression on Distributed Data.

Journal Article IACR Cryptol. ePrint Arch. · 2017 Cite

Structure-Leveraged Methods in Breast Cancer Risk Prediction.

Journal Article J Mach Learn Res · December 2016 Predicting breast cancer risk has long been a goal of medical research in the pursuit of precision medicine. The goal of this study is to develop novel penalized methods to improve breast cancer risk prediction by leveraging structure information in electr ... Link to item Cite

Multiple testing under dependence via graphical models

Journal Article Annals of Applied Statistics · September 1, 2016 Large-scale multiple testing tasks often exhibit dependence. Leveraging the dependence between individual tests is still one challenging and important problem in statistics. With recent advances in graphical models, it is feasible to use them to capture th ... Full text Cite

In-Depth Characterization and Validation of Human Urine Metabolomes Reveal Novel Metabolic Signatures of Lower Urinary Tract Symptoms.

Journal Article Sci Rep · August 9, 2016 Lower urinary tract symptoms (LUTS) are a range of irritative or obstructive symptoms that commonly afflict the aging population. The diagnosis is mostly based on patient-reported symptoms, and current medication often fails to completely eliminate these sympt ... Full text Link to item Cite

Computational Drug Repositioning Using Continuous Self-Controlled Case Series.

Conference KDD · August 2016 Computational Drug Repositioning (CDR) is the task of discovering potential new indications for existing drugs by mining large-scale heterogeneous drug-related data sources. Leveraging the patient-level temporal ordering information between numeric physiol ... Full text Link to item Cite

Baseline Regularization for Computational Drug Repositioning with Longitudinal Observational Data.

Conference IJCAI (U S) · July 2016 Computational Drug Repositioning (CDR) is the knowledge discovery process of finding new indications for existing drugs leveraging heterogeneous drug-related data. Longitudinal observational data such as Electronic Health Records (EHRs) have become an emer ... Link to item Cite

Structure-leveraged methods in breast cancer risk prediction

Journal Article Journal of Machine Learning Research · May 1, 2016 ©2016 Jun Fan, Yirong Wu, Ming Yuan, David Page, Jie Liu, Irene M. Ong, Peggy Peissig and Elizabeth Burnside. Predicting breast cancer risk has long been a goal of medical research in the pursuit of precision medicine. The goal of this study is to develop ... Cite

Relational learning for sustainable health

Chapter · January 1, 2016 Sustainable healthcare is a global need and requires better value–better health–for patients at lower cost. Predictive models have the opportunity to greatly increase value without increasing cost. Concrete examples include reducing heart attacks and reduc ... Full text Cite

Comparing Mammography Abnormality Features to Genetic Variants in the Prediction of Breast Cancer in Women Recommended for Breast Biopsy.

Journal Article Acad Radiol · January 2016 RATIONALE AND OBJECTIVES: The discovery of germline genetic variants associated with breast cancer has engendered interest in risk stratification for improved, targeted detection and diagnosis. However, there has yet to be a comparison of the predictive ab ... Full text Link to item Cite

Big data in healthcare: Opportunities and challenges

Journal Article Big Data · December 1, 2015 Full text Cite

Subsampled exponential mechanism: Differential privacy in large output spaces

Conference AISec 2015 - Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, co-located with CCS 2015 · October 16, 2015 In the last several years, differential privacy has become the leading framework for private data analysis. It provides bounds on the amount that a randomized function can change as the result of a modification to one record of a database. This requirement ... Full text Cite

Differential privacy for classifier evaluation

Conference AISec 2015 - Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, co-located with CCS 2015 · October 16, 2015 Differential privacy provides powerful guarantees that individuals incur minimal additional risk by including their personal data in a database. Most work in differential privacy has focused on differentially private algorithms that produce models, counts, ... Full text Cite

Human pluripotent stem cell-derived neural constructs for predicting neural toxicity.

Journal Article Proc Natl Acad Sci U S A · October 6, 2015 Human pluripotent stem cell-based in vitro models that reflect human physiology have the potential to reduce the number of drug failures in clinical trials and offer a cost-effective approach for assessing chemical safety. Here, human embryonic stem (ES) c ... Full text Link to item Cite

Developing a utility decision framework to evaluate predictive models in breast cancer risk estimation

Journal Article Journal of Medical Imaging · October 1, 2015 Combining imaging and genetic information to predict disease presence and progression is being codified into an emerging discipline called "radiogenomics." Optimal evaluation methodologies for radiogenomics have not been well established. We aim to develop ... Full text Cite

Learning to reject sequential importance steps for continuous-time Bayesian networks

Conference Proceedings of the National Conference on Artificial Intelligence · June 1, 2015 Applications of graphical models often require the use of approximate inference, such as sequential importance sampling (SIS), for estimation of the model distribution given partial evidence, i.e., the target distribution. However, when SIS proposal and ta ... Cite

Phenome-wide association studies (PheWASs) for functional variants.

Journal Article Eur J Hum Genet · April 2015 The genome-wide association study (GWAS) is a powerful approach for studying the genetic complexities of human disease. Unfortunately, GWASs often fail to identify clinically significant associations and describing function can be a challenge. GWAS is a ph ... Full text Link to item Cite

Sparse modeling of spatial environmental variables associated with asthma

Journal Article Journal of Biomedical Informatics · February 2015 Full text Cite

Machine Learning for Treatment Assignment: Improving Individualized Risk Attribution.

Journal Article AMIA Annu Symp Proc · 2015 Clinical studies model the average treatment effect (ATE), but apply this population-level effect to future individuals. Due to recent developments of machine learning algorithms with useful statistical guarantees, we argue instead for modeling the individ ... Link to item Cite

Predicting adverse drug events from electronic medical records

Journal Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2015 Learning from electronic medical records (EMR) poses many challenges from a knowledge representation point of view. This chapter focuses on how to cope with two specific challenges: the relational nature of EMRs and the uncertain dependence between a patie ... Full text Cite

Extracting Adverse Drug Events from Text using Human Advice.

Conference Artif Intell Med Conf Artif Intell Med (2005-) · 2015 Adverse drug events (ADEs) are a major concern and point of emphasis for the medical profession, government, and society in general. When methods extract ADEs from observational data, there is a necessity to evaluate these methods. More precisely, it is im ... Full text Link to item Cite

Developing a clinical utility framework to evaluate prediction models in radiogenomics

Conference Progress in Biomedical Optics and Imaging - Proceedings of SPIE · January 1, 2015 Combining imaging and genetic information to predict disease presence and behavior is being codified into an emerging discipline called "radiogenomics." Optimal evaluation methodologies for radiogenomics techniques have not been established. We aim to deve ... Full text Cite

Relational machine learning for electronic health record-driven phenotyping.

Journal Article J Biomed Inform · December 2014 OBJECTIVE: Electronic health records (EHR) offer medical and pharmacogenomics research unprecedented opportunities to identify and classify patients at risk. EHRs are collections of highly inter-dependent records that include biological, anatomical, physio ... Full text Link to item Cite

QuickFOIL: Scalable inductive logic programming

Conference Proceedings of the VLDB Endowment · November 1, 2014 Inductive Logic Programming (ILP) is a classic machine learning technique that learns first-order rules from relational-structured data. However, to date most ILP systems can only be applied to small datasets (tens of thousands of examples). A long-standing challenge in the field is to scale ILP methods to larg ... Full text Cite

Cone structure in subjects with known genetic relative risk for AMD.

Journal Article Optom Vis Sci · August 2014 PURPOSE: Utilize high-resolution imaging to examine retinal anatomy in patients with known genetic relative risk (RR) for developing age-related macular degeneration (AMD). METHODS: Forty asymptomatic subjects were recruited (9 men, 31 women; age range, 51 ... Full text Link to item Cite

Prediction of insemination outcomes in Holstein dairy cattle using alternative machine learning algorithms.

Journal Article J Dairy Sci · February 2014 When making the decision about whether or not to breed a given cow, knowledge about the expected outcome would have an economic impact on profitability of the breeding program and net income of the farm. The outcome of each breeding can be affected by many ... Full text Link to item Cite

Childhood asthma clusters and response to therapy in clinical trials.

Journal Article J Allergy Clin Immunol · February 2014 BACKGROUND: Childhood asthma clusters, or subclasses, have been developed by computational methods without evaluation of clinical utility. OBJECTIVE: To replicate and determine whether childhood asthma clusters previously identified computationally in the ... Full text Link to item Cite

Comparing the value of mammographic features and genetic variants in breast cancer risk prediction.

Journal Article AMIA Annu Symp Proc · 2014 The goal of this study was to compare the value of mammographic features and genetic variants for breast cancer risk prediction with Bayesian reasoning and information theory. We conducted a retrospective case-control study, collecting mammographic finding ... Link to item Cite

Learning Heterogeneous Hidden Markov Random Fields.

Conference JMLR Workshop Conf Proc · 2014 Hidden Markov random fields (HMRFs) are conventionally assumed to be homogeneous in the sense that the potential functions are invariant across different sites. However in some biological applications, it is desirable to make HMRFs heterogeneous, especiall ... Link to item Cite

Multiple testing under dependence via semiparametric graphical models

Conference 31st International Conference on Machine Learning, ICML 2014 · January 1, 2014 It has been shown that graphical models can be used to leverage the dependence in large-scale multiple testing problems with significantly improved performance (Sun & Cai, 2009; Liu et al., 2012). These graphical models are fully parametric and require th ... Cite

On differentially private inductive logic programming

Journal Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2014 We consider differentially private inductive logic programming. We begin by formulating the problem of guaranteeing differential privacy for inductive logic programming, and then prove the theoretical difficulty of simultaneously providing good utility and goo ... Full text Cite

Support Vector Machines for Differential Prediction.

Conference Mach Learn Knowl Discov Databases · 2014 Machine learning is continually being applied to a growing set of fields, including the social sciences, business, and medicine. Some fields present problems that are not easily addressed using standard machine learning approaches and, in particular, there ... Full text Link to item Cite

New Genetic Variants Improve Personalized Breast Cancer Diagnosis

Journal Article Amia Summits on Translational Science Proceedings · 2014 Cite

Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing

Conference Proceedings of the 23rd USENIX Security Symposium · January 1, 2014 We initiate the study of privacy in pharmacogenetics, wherein machine learning models are used to guide medical treatments based on a patient's genotype and background. Performing an in-depth case study on privacy in personalized warfarin dosing, we show t ... Cite

Bayesian Estimation of Latently-grouped Parameters in Undirected Graphical Models.

Conference Adv Neural Inf Process Syst · December 5, 2013 In large-scale applications of undirected graphical models, such as social networks and biological networks, similar patterns occur frequently and give rise to similar parameters. In this situation, it is beneficial to group the parameters for more efficie ... Link to item Cite

Forest-based point process for event prediction from electronic health records

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · October 31, 2013 Accurate prediction of future onset of disease from Electronic Health Records (EHRs) has important clinical and economic implications. In this domain the arrival of data comes at semi-irregular intervals and makes the prediction task challenging. We propos ... Full text Cite

Area under the precision-recall curve: Point estimates and confidence intervals

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · October 31, 2013 The area under the precision-recall curve (AUCPR) is a single number summary of the information in the precision-recall (PR) curve. Similar to the receiver operating characteristic curve, the PR curve has its own unique properties that make estimating its ... Full text Cite

Random Forests approach for identifying additive and epistatic single nucleotide polymorphisms associated with residual feed intake in dairy cattle.

Journal Article J Dairy Sci · October 2013 Feed efficiency is an economically important trait in the beef and dairy cattle industries. Residual feed intake (RFI) is a measure of partial efficiency that is independent of production level per unit of body weight. The objective of this study was to id ... Full text Link to item Cite

Assessment of genetic and nongenetic interactions for the prediction of depressive symptomatology: an analysis of the Wisconsin Longitudinal Study using machine learning algorithms.

Journal Article Am J Public Health · October 2013 OBJECTIVES: We examined depression within a multidimensional framework consisting of genetic, environmental, and sociobehavioral factors and, using machine learning algorithms, explored interactions among these factors that might better explain the etiolog ... Full text Link to item Cite

A PheWAS approach in studying HLA-DRB1*1501.

Journal Article Genes Immun · April 2013 HLA-DRB1 codes for a major histocompatibility complex class II cell surface receptor. Genetic variants in and around this gene have been linked to numerous autoimmune diseases. Most notably, an association between HLA-DRB1*1501 haplotype and multiple scler ... Full text Link to item Cite

Evaluation of the modified asthma predictive index in high-risk preschool children.

Journal Article J Allergy Clin Immunol Pract · March 2013 BACKGROUND: Prediction of subsequent school-age asthma during the preschool years has proven challenging. OBJECTIVE: To confirm in a post hoc analysis the predictive ability of the modified Asthma Predictive Index (mAPI) in a high-risk cohort and a theoreti ... Full text Link to item Cite

Genetic variants improve breast cancer risk prediction on mammograms.

Journal Article AMIA Annu Symp Proc · 2013 Several recent genome-wide association studies have identified genetic variants associated with breast cancer. However, how much these genetic variants may help advance breast cancer risk prediction based on other clinical features, like mammographic findi ... Link to item Cite

A preliminary investigation into predictive models for adverse drug events

Conference AAAI Workshop - Technical Report · January 1, 2013 Adverse drug events are a leading cause of danger and cost in health care. We could reduce both the danger and the cost if we had accurate models to predict, at prescription time for each drug, which patients are most at risk for known adverse reactions to ... Cite

Score As You Lift (SAYL): A Statistical Relational Learning Approach to Uplift Modeling.

Conference Mach Learn Knowl Discov Databases · 2013 We introduce Score As You Lift (SAYL), a novel Statistical Relational Learning (SRL) algorithm, and apply it to an important task in the diagnosis of breast cancer. SAYL combines SRL with the marketing concept of uplift modeling, uses the area under the up ... Full text Link to item Cite

Learning when to reject an importance sample

Conference AAAI Workshop - Technical Report · January 1, 2013 When observations are incomplete or data are missing, approximate inference methods based on importance sampling are often used. Unfortunately, when the target and proposal distributions are dissimilar, the sampling procedure leads to biased estimates or r ... Cite

A human pluripotent stem cell platform for assessing developmental neural toxicity screening.

Journal Article Stem Cell Res Ther · 2013 A lack of affordable and effective testing and screening procedures means surprisingly little is known about the health hazards of many of the tens of thousands of chemicals in use in the world today. The recent rise in the number of children affected by ne ... Full text Link to item Cite

Multiplicative forests for continuous-time processes

Conference Advances in Neural Information Processing Systems · December 1, 2012 Learning temporal dependencies between variables over continuous time is an important and challenging task. Continuous-time Bayesian networks effectively model such processes but are limited by the number of conditional intensity matrices, which grows expo ... Cite

Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation.

Journal Article Proc Int Conf Mach Learn · December 1, 2012 Precision-recall (PR) curves and the areas under them are widely used to summarize machine learning results, especially for data sets exhibiting class skew. They are often used analogously to ROC curves and the area under ROC curves. It is known that PR cu ... Open Access Link to item Cite

A collective ranking method for genome-wide association studies

Conference 2012 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2012 · November 26, 2012 Genome-wide association studies (GWAS) analyze genetic variation (SNPs) across the entire human genome, searching for SNPs that are associated with certain phenotypes, most often diseases, such as breast cancer. In GWAS, we seek a ranking of SNPs in terms ... Full text Cite

Identifying adverse drug events by relational learning

Conference Proceedings of the National Conference on Artificial Intelligence · November 7, 2012 The pharmaceutical industry, consumer protection groups, users of medications and government oversight agencies are all strongly interested in identifying adverse reactions to drugs. While a clinical trial of a drug may use only a thousand patients, once a ... Cite

Unachievable region in precision-recall space and its effect on empirical evaluation

Conference Proceedings of the 29th International Conference on Machine Learning, ICML 2012 · October 10, 2012 Precision-recall (PR) curves and the areas under them are widely used to summarize machine learning results, especially for data sets exhibiting class skew. They are often used analogously to ROC curves and the area under ROC curves. It is known that PR cu ... Cite

Relational differential prediction

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · October 4, 2012 A typical classification problem involves building a model to correctly segregate instances of two or more classes. Such a model exhibits differential prediction with respect to given data subsets when its performance is significantly different over these ... Full text Cite

Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study.

Journal Article BMC Bioinformatics · July 11, 2012 BACKGROUND: There is a need for automated methods to learn general features of the interactions of a ligand class with its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically ... Full text Link to item Cite

High-Dimensional Structured Feature Screening Using Binary Markov Random Fields.

Conference JMLR Workshop Conf Proc · 2012 Feature screening is a useful feature selection approach for high-dimensional data when the goal is to identify all the features relevant to the response variable. However, common feature screening methods do not take into account the correlation structure ... Link to item Cite

Graphical-model Based Multiple Testing under Dependence, with Applications to Genome-wide Association Studies.

Conference Uncertain Artif Intell · 2012 Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence between individual tests is still one challenging and important problem in statistics. With recent advances in graphical models, it is feasible to use them to perfor ... Link to item Cite

Logical Differential Prediction Bayes Net, improving breast cancer diagnosis for older women.

Journal Article AMIA Annu Symp Proc · 2012 Overdiagnosis is a phenomenon in which screening identifies cancer which may not go on to cause symptoms or death. Women over 65 who develop breast cancer bear the heaviest burden of overdiagnosis. This work introduces novel machine learning algorithms to ... Link to item Cite

Statistical Relational Learning to Predict Primary Myocardial Infarction from Electronic Health Records.

Conference Proc Innov Appl Artif Intell Conf · 2012 Electronic health records (EHRs) are an emerging relational domain with large potential to improve clinical outcomes. We apply two statistical relational learning (SRL) algorithms to the task of predicting primary myocardial infarction. We show that one SR ... Link to item Cite

Machine learning for personalized medicine: Predicting primary myocardial infarction from electronic health records

Journal Article AI Magazine · January 1, 2012 Electronic health records (EHRs) are an emerging relational domain with large potential to improve clinical outcomes. We apply two statistical relational learning (SRL) algorithms to the task of predicting primary myocardial infarction. We show that one SR ... Full text Cite

Demand-Driven Clustering in Relational Domains for Predicting Adverse Drug Events.

Conference Proc Int Conf Mach Learn · 2012 Learning from electronic medical records (EMR) is challenging due to their relational nature and the uncertain dependence between a patient's past and future health status. Statistical relational learning is a natural fit for analyzing EMRs but is less ade ... Link to item Cite

Predicting atrial fibrillation and flutter using electronic health records.

Conference Annu Int Conf IEEE Eng Med Biol Soc · 2012 Electronic Health Records (EHR) contain large amounts of useful information that could potentially be used for building models for predicting onset of diseases. In this study, we have investigated the use of free-text and coded data in Marshfield Clinic's ... Full text Link to item Cite

Extracting BI-RADS Features from Portuguese Clinical Texts.

Journal Article Proceedings (IEEE Int Conf Bioinformatics Biomed) · 2012 In this work we build the first BI-RADS parser for Portuguese free texts, modeled after existing approaches to extract BI-RADS features from English medical records. Our concept finder uses a semantic grammar based on the BI-RADS lexicon and on iterative tr ... Full text Link to item Cite

Identifying Adverse Drug Events by Relational Learning

Conference Proceedings of the 26th AAAI Conference on Artificial Intelligence, AAAI 2012 · January 1, 2012 The pharmaceutical industry, consumer protection groups, users of medications and government oversight agencies are all strongly interested in identifying adverse reactions to drugs. While a clinical trial of a drug may use only a thousand patients, once a ... Cite

Integrating knowledge capture and supervised learning through a human-computer interface

Conference KCAP 2011 - Proceedings of the 2011 Knowledge Capture Conference · July 18, 2011 Some supervised-learning algorithms can make effective use of domain knowledge in addition to the input-output pairs commonly used in machine learning. However, formulating this additional information often requires an in-depth understanding of the specifi ... Full text Cite

Automating the ILP setup task: Converting user advice about specific examples into general background knowledge

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · June 23, 2011 Inductive Logic Programming (ILP) provides an effective method of learning logical theories given a set of positive examples, a set of negative examples, a corpus of background knowledge, and specification of a search space (e.g., via mode definitions) fro ... Full text Cite

Integrating machine learning and physician knowledge to improve the accuracy of breast biopsy.

Journal Article AMIA Annu Symp Proc · 2011 In this work we show that combining physician rules and machine learned rules may improve the performance of a classifier that predicts whether a breast cancer is missed on percutaneous, image-guided breast core needle biopsy (subsequently referred to as " ... Link to item Cite

Uncovering age-specific invasive and DCIS breast cancer rules using inductive logic programming

Conference IHI'10 - Proceedings of the 1st ACM International Health Informatics Symposium · December 1, 2010 Breast cancer is the most common type of cancer among women. Current clinical breast cancer diagnosis involves a biopsy, which is a costly, invasive and potentially painful procedure. Some researchers proposed models, based on mammography features and pers ... Full text Cite

Validation of results from knowledge discovery: mass density as a predictor of breast cancer.

Journal Article J Digit Imaging · October 2010 The purpose of our study is to identify and quantify the association between high breast mass density and breast malignancy using inductive logic programming (ILP) and conditional probabilities, and validate this association in an independent dataset. We r ... Full text Link to item Cite

An Inductive Logic Programming Approach to Validate Hexose Binding Biochemical Knowledge.

Conference Inductive Log Program · 2010 Hexoses are simple sugars that play a key role in many cellular pathways, and in the regulation of development and disease mechanisms. Current protein-sugar computational models are based, at least partially, on prior biochemical findings and knowledge. Th ... Full text Link to item Cite

Exploiting product distributions to identify relevant variables of correlation immune functions

Journal Article Journal of Machine Learning Research · November 30, 2009 A Boolean function f is correlation immune if each input variable is independent of the output, under the uniform distribution on inputs. For example, the parity function is correlation immune. We consider the problem of identifying relevant variables of a ... Cite

Probabilistic computer model developed from clinical data in national mammography database format to classify mammographic findings.

Journal Article Radiology · June 2009 PURPOSE: To determine whether a Bayesian network trained on a large database of patient demographic risk factors and radiologist-observed findings from consecutive clinical mammography examinations can exceed radiologist performance in the classification o ... Full text Link to item Cite

Estimation of the warfarin dose with clinical and pharmacogenetic data.

Journal Article N Engl J Med · February 19, 2009 BACKGROUND: Genetic variability among patients plays an important role in determining the dose of warfarin that should be used when oral anticoagulation is initiated, but practical methods of using genetic information have not been evaluated in a diverse a ... Full text Link to item Cite

Prion disease diagnosis by proteomic profiling.

Journal Article J Proteome Res · February 2009 Definitive prion disease diagnosis is currently limited to postmortem assay for the presence of the disease-associated proteinase K-resistant prion protein. Using cerebrospinal fluid (CSF) from prion-infected hamsters, matrix-assisted laser desorption/ioni ... Full text Link to item Cite

Information Extraction for Clinical Data Mining: A Mammography Case Study.

Conference Proc IEEE Int Conf Data Min · 2009 Breast cancer is the leading cause of cancer mortality in women between the ages of 15 and 54. During mammography screening, radiologists use a strict lexicon (BI-RADS) to describe and report their findings. Mammography records are then stored in a well-de ... Full text Link to item Cite

Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia.

Journal Article BMC Genomics · November 7, 2008 BACKGROUND: This paper presents a retrospective statistical study on the newly-released data set by the Stanley Neuropathology Consortium on gene expression in bipolar disorder and schizophrenia. This data set contains gene expression data as well as limit ... Full text Link to item Cite

Matching isotopic distributions from metabolically labeled samples.

Journal Article Bioinformatics · July 1, 2008 MOTIVATION: In recent years stable isotopic labeling has become a standard approach for quantitative proteomic analyses. Among the many available isotopic labeling strategies, metabolic labeling is attractive for the excellent internal control it provides. ... Full text Link to item Cite

CLP(BN): Constraint logic programming for probabilistic knowledge

Journal Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · March 10, 2008 In Datalog, missing values are represented by Skolem constants. More generally, in logic programming missing values, or existentially quantified variables, are represented by terms built from Skolem functors. The CLP(BN) language represents the joint probab ... Full text Cite

Combining MALDI-FTMS and bioinformatics for rapid peptidomic comparisons.

Journal Article J Proteome Res · March 2008 Increasing research efforts in large-scale mass spectral analyses of peptides and proteins have led to many advances in technology and method development for collecting data and improving the quality of data. However, the resultant large data sets often po ... Full text Link to item Cite

Change of representation for statistical relational learning

Conference IJCAI International Joint Conference on Artificial Intelligence · December 1, 2007 Statistical relational learning (SRL) algorithms learn statistical models from relational data, such as that stored in a relational database. We previously introduced view learning for SRL, in which the view of a relational database can be automatically mo ... Cite

Learning Bayesian network structure from correlation-immune data

Conference Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, UAI 2007 · December 1, 2007 Searching the complete space of possible Bayesian networks is intractable for problems of interesting size, so Bayesian network structure learning algorithms, such as the commonly used Sparse Candidate algorithm, employ heuristics. However, these heuristic ... Cite

An automated decision-tree approach to predicting protein interaction hot spots.

Journal Article Proteins · September 1, 2007 Protein-protein interactions can be altered by mutating one or more "hot spots," the subset of residues that account for most of the interface's binding free energy. The identification of hot spots requires a significant experimental effort, highlighting t ... Full text Link to item Cite

An integrated approach to feature invention and model construction for drug activity prediction

Conference ACM International Conference Proceeding Series · August 23, 2007 We present a new machine learning approach for 3D-QSAR, the task of predicting binding affinities of molecules to target proteins based on 3D structure. Our approach predicts binding affinity by using regression on substructures discovered by relational le ... Full text Cite

Using dynamic programming to create isotopic distribution maps from mass spectra.

Conference Bioinformatics · July 1, 2007 MOTIVATION: This article presents a method to identify the isotopic distributions within a mass spectrum using a probabilistic classifier supplemented with dynamic programming. Such a system is needed for a variety of purposes, including generating robust ... Full text Link to item Cite

ILP through propositionalization and stochastic k-term DNF learning

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2007 One promising family of search strategies to alleviate runtime and storage requirements of ILP systems is that of stochastic local search methods, which have been successfully applied to hard propositional tasks such as satisfiability. Stochastic local sea ... Full text Cite

Inferring regulatory networks from time series expression data and relational data via inductive logic programming

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2007 Determining the underlying regulatory mechanism of genetic networks is one of the central challenges of computational biology. Numerous methods have been developed and applied to the important but complex task of reverse engineering regulatory networks fro ... Full text Cite

Biomedical informatics training at the University of Wisconsin-Madison.

Journal Article Yearb Med Inform · 2007 OBJECTIVES: The purpose of this paper is to describe biomedical informatics training at the University of Wisconsin-Madison (UW-Madison). METHODS: We reviewed biomedical informatics training, research, and faculty/trainee participation at UW-Madison. RESUL ... Link to item Cite

Quantitative pharmacophore models with inductive logic programming

Conference Machine Learning · September 1, 2006 Three-dimensional models, or pharmacophores, describing Euclidean constraints on the location on small molecules of functional groups (like hydrophobic groups, hydrogen acceptors and donors, etc.), are often used in drug design to describe the medicinal ac ... Full text Cite

Randomised restarted search in ILP

Conference Machine Learning · September 1, 2006 Recent statistical performance studies of search algorithms in difficult combinatorial problems have demonstrated the benefits of randomising and restarting the search procedure. Specifically, it has been found that if the search cost distribution of the n ... Full text Cite

Rocephin--the thin end of the wedge.

Journal Article S Afr Med J · August 2006 Link to item Cite

An efficient approximation to lookahead in relational learners

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2006 Greedy machine learning algorithms suffer from shortsightedness, potentially returning suboptimal models due to limited exploration of the search space. Greedy search misses useful refinements that yield a significant gain only in conjunction with other co ... Full text Cite

Experimental design of time series data for learning from dynamic Bayesian networks.

Conference Pac Symp Biocomput · 2006 Bayesian networks (BNs) and dynamic Bayesian networks (DBNs) are becoming more widely used as a way to learn various types of networks, including cellular signaling networks, from high-throughput data. Due to the high cost of performing experiments, we are ... Link to item Cite

View learning for statistical relational learning: With an application to mammography

Conference IJCAI International Joint Conference on Artificial Intelligence · December 1, 2005 Statistical relational learning (SRL) constructs probabilistic models from relational databases. A key capability of SRL is the learning of arcs (in the Bayes net sense) connecting entries in different rows of a relational table, or in different tables. Ne ... Cite

Predicting cancer susceptibility from single-nucleotide polymorphism data: A case study in multiple myeloma

Conference Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining · December 1, 2005 This paper asks whether susceptibility to early-onset (diagnosis before age 40) of a particularly deadly form of cancer, Multiple Myeloma, can be predicted from single-nucleotide polymorphism (SNP) profiles with an accuracy greater than chance. Specifical ... Full text Cite

An integrated approach to learning Bayesian networks of rules

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · December 1, 2005 Inductive Logic Programming (ILP) is a popular approach for learning rules for classification tasks. An important question is how to combine the individual rules to obtain a useful classifier. In some instances, converting each learned rule into a binary f ... Full text Cite

Why skewing works: Learning difficult boolean functions with greedy tree learners

Conference ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning · December 1, 2005 We analyze skewing, an approach that has been empirically observed to enable greedy decision tree learners to learn "difficult" Boolean functions, such as parity, in the presence of irrelevant variables. We prove that, in an idealized setting, for any func ... Cite

Mode directed path finding

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · December 1, 2005 Learning from multi-relational domains has gained increasing attention over the past few years. Inductive logic programming (ILP) systems, which often rely on hill-climbing heuristics in learning first-order concepts, have been a dominating force in the ar ... Full text Cite

Knowledge discovery from structured mammography reports using inductive logic programming.

Journal Article AMIA Annu Symp Proc · 2005 The development of large mammography databases provides an opportunity for knowledge discovery and data mining techniques to recognize patterns not previously appreciated. Using a database from a breast imaging practice containing patient risk factors, ima ... Link to item Cite

A framework for set-oriented computation in inductive logic programming and its application in generalizing inverse entailment

Conference Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) · January 1, 2005 We propose a new approach to Inductive Logic Programming that systematically exploits caching and offers a number of advantages over current systems. It avoids redundant computation, is more amenable to the use of set-oriented generation and evaluation o ... Full text Cite

Generalized skewing for functions with continuous and nominal attributes

Conference ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning · January 1, 2005 This paper extends previous work on skewing, an approach to problematic functions in decision tree induction. The previous algorithms were applicable only to functions of binary variables. In this paper, we extend skewing to directly handle functions of co ... Full text Cite

Toward automatic management of embarrassingly parallel applications

Journal Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · December 1, 2004 Large-scale applications that require executing very large numbers of tasks are only feasible through parallelism. In this work we present a system that automatically handles large numbers of experiments and data in the context of machine learning. Our sys ... Cite

Sequential skewing: An improved skewing algorithm

Conference Proceedings, Twenty-First International Conference on Machine Learning, ICML 2004 · December 1, 2004 This paper extends previous work on the Skewing algorithm, a promising approach that allows greedy decision tree induction algorithms to handle problematic functions such as parity functions with a lower run-time penalty than Lookahead. A deficiency of the ... Cite

ILP: A short look back and a longer look forward

Conference Journal of Machine Learning Research · May 15, 2004 Inductive logic programming (ILP) is built on a foundation laid by research in machine learning and computational logic. Armed with this strong foundation, ILP has been applied to important and interesting problems in the life sciences, engineering and the ... Full text Cite

Using Machine Learning to Design and Interpret Gene-Expression Microarrays

Journal Article AI Magazine · March 1, 2004 Gene-expression microarrays, commonly called gene chips, make it possible to simultaneously measure the rate at which a cell or tissue is expressing - translating into a protein - each of its thousands of genes. One can use these comprehensive snapshots of ... Cite

A Monte Carlo study of randomised restarted search in ILP

Conference Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) · January 1, 2004 Full text Cite

Evaluation of multiple models to distinguish closely related forms of disease using DNA microarray data: an application to multiple myeloma.

Journal Article Stat Appl Genet Mol Biol · 2004 MOTIVATION: Standard laboratory classification of the plasma cell dyscrasia monoclonal gammopathy of undetermined significance (MGUS) and the overt plasma cell neoplasm multiple myeloma (MM) is quite accurate, yet, for the most part, biologically uninforma ... Full text Link to item Cite

Skewing: An efficient alternative to lookahead for decision tree induction

Conference IJCAI International Joint Conference on Artificial Intelligence · December 1, 2003 This paper presents a novel, promising approach that allows greedy decision tree induction algorithms to handle problematic functions such as parity functions. Lookahead is the standard approach to addressing difficult functions for greedy decision tree le ... Cite

A Bayesian network approach to operon prediction.

Journal Article Bioinformatics · July 1, 2003 MOTIVATION: In order to understand transcription regulation in a given prokaryotic genome, it is critical to identify operons, the fundamental units of transcription, in such species. While there are a growing number of organisms whose sequence and gene co ... Full text Link to item Cite

Biological applications of multi-relational data mining

Journal Article ACM SIGKDD Explorations Newsletter · July 2003 Biological databases contain a wide variety of data types, often with rich relational structure. Consequently multi-relational data mining techniques frequently are applied to biological data. This paper presents several applications of multi-relat ... Full text Cite

Lattice-search runtime distributions may be heavy-tailed

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2003 Recent empirical studies show that runtime distributions of backtrack procedures for solving hard combinatorial problems often have intriguing properties. Unlike standard distributions (such as the normal), such distributions decay slower than exponentiall ... Full text Cite

An empirical evaluation of bagging in inductive logic programming

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2003 Ensembles have proven useful for a variety of applications, with a variety of machine learning approaches. While Quinlan has applied boosting to FOIL, the widely-used approach of bagging has never been employed in ILP. Bagging has the advantage over boosti ... Full text Cite

Accelerating the drug design process through parallel inductive logic programming data mining

Conference Proceedings of the 2003 IEEE Bioinformatics Conference, CSB 2003 · January 1, 2003 This paper presents a new system for parallel inductive logic search for pharmacophores which can potentially accelerate the chemical evaluation phase of the drug design process. This system has been tested on a Beowulf cluster and an IBM SP2 supercomputer ... Full text Cite

The role of declarative languages in mining biological databases

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2002 Biological and biomedical databases have become a primary application area for data mining. Such databases commonly involve multiple relational tables and a variety of data types, as in the biological databases that formed the basis for the KDD Cup 2001 an ... Cite

Modelling regulatory pathways in E. coli from time series expression profiles.

Conference Bioinformatics · 2002 MOTIVATION: Cells continuously reprogram their gene expression network as they move through the cell cycle or sense changes in their environment. In order to understand the regulation of cells, time series expression profiles provide a more complete pictur ... Full text Link to item Cite

KDD Cup 2001 report

Journal Article ACM SIGKDD Explorations Newsletter · January 2002 This paper presents results and lessons from KDD Cup 2001. KDD Cup 2001 focused on mining biological databases. It involved three cutting-edge tasks related to drug design and genomics. ... Full text Cite

Advancing drug discovery through parallel inductive logic programming

Conference Parallel and Distributed Computing Systems · January 1, 2002 Link to item Cite

Special issue on inductive logic programming

Journal Article Machine Learning · April 1, 2001 Link to item Cite

Guest editorial

Journal Article Machine Learning · April 1, 2001 Full text Cite

Multiple Instance Regression.

Conference ICML · 2001 Cite

An Approach to Parallel Data Mining for Pharmacophore Discovery

Conference 10th Golden West International Conference on Intelligent Systems 2001, ICIS 2001 · January 1, 2001 Rapid and efficient design of new drugs is a key challenge for the medical industry. Current drug design methodologies can take several years just in the initial chemical evaluation stages before compounds can be created for animal and human testing. This ... Cite

ILP: Just do it

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2000 Inductive logic programming (ILP) is built on a foundation laid by research in other areas of computational logic. But in spite of this strong foundation, at 10 years of age ILP now faces a number of new challenges brought on by exciting application opport ... Full text Cite

ILP: Just do it

Conference Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) · January 1, 2000 Inductive logic programming (ILP) is built on a foundation laid by research in other areas of computational logic. But in spite of this strong foundation, at 10 years of age ILP now faces a number of new challenges brought on by exciting application opport ... Full text Cite

A probabilistic learning approach to whole-genome operon prediction.

Journal Article Proc Int Conf Intell Syst Mol Biol · 2000 We present a computational approach to predicting operons in the genomes of prokaryotic organisms. Our approach uses machine learning methods to induce predictive models for this task from a rich variety of data types including sequence data, gene expressi ... Link to item Cite

Is it better to combine predictions?

Journal Article Protein Eng · January 2000 We have compared the accuracy of the individual protein secondary structure prediction methods: PHD, DSC, NNSSP and Predator against the accuracy obtained by combining the predictions of the methods. A range of ways of combining predictions were tested: voting ... Full text Link to item Cite

Parallel data mining for pharmacophore discovery

Conference Proceedings of the IEEE International Conference on Systems, Man and Cybernetics · January 1, 2000 Rapid and effective design of new drugs to combat new strains of antibiotic resistant organisms, more effectively treat chronic conditions, and provide other life sustaining treatment is a key challenge for the medical industry. Current drug design methodo ... Full text Cite

A Probabilistic Learning Approach to Whole-Genome Operon Prediction

Conference Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, ISMB 2000 · January 1, 2000 We present a computational approach to predicting operons in the genomes of prokaryotic organisms. Our approach uses machine learning methods to induce predictive models for this task from a rich variety of data types including sequence data, gene expressi ... Cite

Guest editors' introduction

Journal Article The Journal of Logic Programming · August 1999 Full text Cite

Preface

Book · January 1, 1998 Cite

Pharmacophore discovery using the Inductive Logic Programming system PROGOL

Journal Article Machine Learning · January 1, 1998 This paper presents a case study of a machine-aided knowledge discovery process within the general area of drug design. Within drug design, the particular problem of pharmacophore discovery is isolated, and the Inductive Logic Programming (ILP) system PROG ... Full text Cite

An initial experiment into stereochemistry-based drug design using inductive logic programming

Conference Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1997 Previous applications of Inductive Logic Programming to drug design have not addressed stereochemistry, or the three-dimensional aspects of molecules. While some success is possible without consideration of stereochemistry, researchers within the pharmaceu ... Full text Cite

Guest Editors' Introduction.

Journal Article Mach. Learn. · 1997 Full text Cite

Polynomial learnability and Inductive Logic Programming: Methods and results

Journal Article New Generation Computing · December 1995 Full text Cite

Prefix grammars: an alternative characterization of the regular languages

Journal Article Information Processing Letters · July 1994 Full text Cite

Learnability in inductive logic programming - some basic results and techniques

Conference Proceedings of the Eleventh National Conference on Artificial Intelligence · January 1, 1993 Link to item Cite

Generalization and learnability: A study of constrained atoms

Chapter · January 1, 1992 Inductive logic programming is a new research area formed at the intersection of machine learning and logic programming. ... Cite

Learning constrained atoms

Conference Machine Learning · January 1, 1991 Link to item Cite

Generalizing atoms in constraint logic

Conference Principles of Knowledge Representation and Reasoning · January 1, 1991 Link to item Cite

Generalization with taxonomic information

Conference Proceedings: Eighth National Conference on Artificial Intelligence, Vols 1 and 2 · January 1, 1990 Link to item Cite