Alejandro Ochoa

Assistant Professor of Biostatistics & Bioinformatics
Biostatistics & Bioinformatics, Division of Integrative Genomics
Duke Box 103854, Durham, NC 27710
101 Science Dr, Room 2177C, FCIEMAS, Durham, NC 27705

Selected Publications


Genetic association models are robust to common population kinship estimation biases.

Journal Article Genetics · May 4, 2023 Common genetic association models for structured populations, including principal component analysis (PCA) and linear mixed-effects models (LMMs), model the correlation structure between individuals using population kinship matrices, also known as genetic ...

Limitations of principal components in quantitative genetic association models for human studies.

Journal Article eLife · May 2023 Principal Component Analysis (PCA) and the Linear Mixed-effects Model (LMM), sometimes in combination, are the most common genetic association models. Previous PCA-LMM comparisons give mixed results, unclear guidance, and have several limitations, includin ...

Genetic risk variants for childhood nephrotic syndrome and corticosteroid response.

Journal Article Front Pediatr · 2023 INTRODUCTION: The etiology of most cases of nephrotic syndrome (NS) remains unknown, therefore patients are phenotypically categorized based on response to corticosteroid therapy as steroid sensitive NS (SSNS), or steroid resistant NS (SRNS). Genetic risk ...

Steroid-sensitive nephrotic syndrome candidate gene CLVS1 regulates podocyte oxidative stress and endocytosis.

Journal Article JCI Insight · January 25, 2022 We performed next-generation sequencing in patients with familial steroid-sensitive nephrotic syndrome (SSNS) and identified a homozygous segregating variant (p.H310Y) in the gene encoding clavesin-1 (CLVS1) in a consanguineous family with 3 affected indiv ...

HLA Loci and Recurrence of Focal Segmental Glomerulosclerosis in Pediatric Kidney Transplantation.

Journal Article Transplant Direct · October 2021 UNLABELLED: Recurrent focal segmental glomerulosclerosis (FSGS) after kidney transplantation accounts for the majority of allograft failures in children with primary FSGS. Although current research focuses on FSGS pathophysiology, a common etiology and mec ... Open Access

Correcting signal biases and detecting regulatory elements in STARR-seq data.

Journal Article Genome Res · May 2021 High-throughput reporter assays such as self-transcribing active regulatory region sequencing (STARR-seq) have made it possible to measure regulatory element activity across the entire human genome at once. The resulting data, however, present substantial ...

Estimating FST and kinship for arbitrary population structures.

Journal Article PLoS Genet · January 2021 FST and kinship are key parameters often estimated in modern population genetics studies in order to quantitatively characterize structure and relatedness. Kinship matrices have also become a fundamental quantity used in genome-wide association studies and ...

New kinship and FST estimates reveal higher levels of differentiation in the global human population

Journal Article · 2019 Kinship coefficients and FST, which measure genetic relatedness and the overall population structure, respectively, have important biomedical applications. However, existing estimators are only accurate under restrictive conditions that most natural popu ...

Testing the effectiveness of principal components in adjusting for relatedness in genetic association studies

Journal Article · 2019 Modern genetic association studies require modeling population structure and family relatedness in order to calculate correct statistics. Principal Components Analysis (PCA) is one of the most common approaches for modeling this population structure, but n ...

Domain prediction with probabilistic directional context.

Journal Article Bioinformatics · August 15, 2017 MOTIVATION: Protein domain prediction is one of the most powerful approaches for sequence-based function prediction. Although domain instances are typically predicted independently of each other, newer approaches have demonstrated improved performance by r ...

Proteome-wide analysis reveals widespread lysine acetylation of major protein complexes in the malaria parasite.

Journal Article Sci Rep · January 27, 2016 Lysine acetylation is a ubiquitous post-translational modification in many organisms including the malaria parasite Plasmodium falciparum, yet the full extent of acetylation across the parasite proteome remains unresolved. Moreover, the functional signific ...

Domain prediction with probabilistic directional context

Journal Article · 2016 MOTIVATION: Protein domain prediction is one of the most powerful approaches for sequence-based function prediction. While domain instances are typically predicted independently of each other, newer approaches have demonstrated improved performance ...

FST and kinship for arbitrary population structures I: Generalized definitions

Journal Article · 2016 FST is a fundamental measure of genetic differentiation and population structure, currently defined for subdivided populations. FST in practice typically assumes independent, non-overlapping subpopulations, which all split simultaneously from their last ...

FST and kinship for arbitrary population structures II: Method-of-moments estimators

Journal Article · 2016 FST and kinship are key parameters often estimated in modern population genetics studies in order to quantitatively characterize structure and relatedness. Kinship matrices have also become a fundamental quantity used in genome-wide association studies an ...

Beyond the E-Value: Stratified Statistics for Protein Domain Prediction.

Journal Article PLoS Comput Biol · November 2015 E-values have been the dominant statistic for protein sequence analysis for the past two decades: from identifying statistically significant local sequence alignments to evaluating matches to hidden Markov models describing protein domain families. Here we ...

Evolution and diversity in human herpes simplex virus genomes.

Journal Article J Virol · January 2014 Herpes simplex virus 1 (HSV-1) causes a chronic, lifelong infection in >60% of adults. Multiple recent vaccine trials have failed, with viral diversity likely contributing to these failures. To understand HSV-1 diversity better, we comprehensively compared ...

Using context to improve protein domain identification.

Journal Article BMC Bioinformatics · March 31, 2011 BACKGROUND: Identifying domains in protein sequences is an important step in protein structural and functional annotation. Existing domain recognition methods typically evaluate each domain prediction independently of the rest. However, the majority of pro ...

Computing van der Waals energies in the context of the rotamer approximation.

Journal Article Proteins · September 1, 2007 The rotamer approximation states that protein side-chain conformations can be described well using a finite set of rotational isomers. This approximation is often applied in the context of computational protein design and structure prediction to reduce the ...