William Majoros

Assistant Professor of Biostatistics & Bioinformatics
Biostatistics & Bioinformatics, Division of Integrative Genomics
Duke Box 103854, Durham, NC 27710
101 Science Drive, 2187 CIEMAS, Research Drive, Durham, NC 27708

Selected Publications


Characterization and bioinformatic filtering of ambient gRNAs in single-cell CRISPR screens using CLEANSER.

Journal Article Cell Genom · February 12, 2025 Single-cell RNA sequencing CRISPR (perturb-seq) screens enable high-throughput investigation of the genome, allowing for characterization of thousands of genomic perturbations on gene expression. Ambient gRNAs, which are contaminating gRNAs, are a major so ...

Bayesian Estimation of Allele-Specific Expression in the Presence of Phasing Uncertainty.

Journal Article bioRxiv · August 13, 2024 MOTIVATION: Allele-specific expression (ASE) analyses aim to detect imbalanced expression of maternal versus paternal copies of an autosomal gene. Such allelic imbalance can result from a variety of cis-acting causes, including disruptive mutations within ...

Full-length dystrophin restoration via targeted exon integration by AAV-CRISPR in a humanized mouse model of Duchenne muscular dystrophy.

Journal Article Molecular Therapy: The Journal of the American Society of Gene Therapy · November 2021 Targeted gene-editing strategies have emerged as promising therapeutic approaches for the permanent treatment of inherited genetic diseases. However, precise gene correction and insertion approaches using homology-directed repair are still limited by low e ...

Bayesian estimation of genetic regulatory effects in high-throughput reporter assays.

Journal Article Bioinformatics · January 15, 2020 MOTIVATION: High-throughput reporter assays dramatically improve our ability to assign function to noncoding genetic variants, by measuring allelic effects on gene expression in the controlled setting of a reporter gene. Unlike genetic association tests, s ...

Evaluating Chromatin Accessibility Differences Across Multiple Primate Species Using a Joint Modeling Approach.

Journal Article Genome Biology and Evolution · October 2019 Changes in transcriptional regulation are thought to be a major contributor to the evolution of phenotypic traits, but the contribution of changes in chromatin accessibility to the evolution of gene expression remains almost entirely unknown. To address th ...

Human genome-wide measurement of drug-responsive regulatory activity.

Journal Article Nat Commun · December 21, 2018 Environmental stimuli commonly act via changes in gene regulation. Human-genome-scale assays to measure such responses are indirect or require knowledge of the transcription factors (TFs) involved. Here, we present the use of human genome-wide high-through ...

Predicting gene structure changes resulting from genetic variants via exon definition features.

Journal Article Bioinformatics · November 1, 2018 MOTIVATION: Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuabl ...

Glucocorticoid receptor recruits to enhancers and drives activation by motif-directed binding.

Journal Article Genome Res · September 2018 Glucocorticoids are potent steroid hormones that regulate immunity and metabolism by activating the transcription factor (TF) activity of glucocorticoid receptor (GR). Previous models have proposed that DNA binding motifs and sites of chromatin accessibili ...

Pre-established Chromatin Interactions Mediate the Genomic Response to Glucocorticoids.

Journal Article Cell Syst · August 22, 2018 The glucocorticoid receptor (GR) is a hormone-inducible transcription factor involved in metabolic and anti-inflammatory gene expression responses. To investigate what controls interactions between GR binding sites and their target genes, we used in situ H ...

High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE.

Journal Article Bioinformatics · May 15, 2017 MOTIVATION: The accurate interpretation of genetic variants is critical for characterizing genotype-phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essentia ...

Orion: Detecting regions of the human non-coding genome that are intolerant to variation using population genetics.

Journal Article PLoS One · 2017 There is broad agreement that genetic mutations occurring outside of the protein-coding regions play a key role in human disease. Despite this consensus, we are not yet capable of discerning which portions of non-coding sequence are important in the contex ...

Direct GR Binding Sites Potentiate Clusters of TF Binding across the Human Genome.

Journal Article Cell · August 25, 2016 The glucocorticoid receptor (GR) binds the human genome at >10,000 sites but only regulates the expression of hundreds of genes. To determine the functional effect of each site, we measured the glucocorticoid (GC) responsive activity of nearly all GR bindi ...

Efficient Genome-Wide Sequencing and Low-Coverage Pedigree Analysis from Noninvasively Collected Samples.

Journal Article Genetics · June 2016 Research on the genetics of natural populations was revolutionized in the 1990s by methods for genotyping noninvasively collected samples. However, these methods have remained largely unchanged for the past 20 years and lag far behind the genomics era. To ...

Massively parallel quantification of the regulatory effects of noncoding genetic variation in a human cohort.

Journal Article Genome Res · August 2015 We report a novel high-throughput method to empirically quantify individual-specific regulatory element activity at the population scale. The approach combines targeted DNA capture with a high-throughput reporter gene expression assay. As demonstration, we ...

Correction of dystrophin expression in cells from Duchenne muscular dystrophy patients through genomic excision of exon 51 by zinc finger nucleases.

Journal Article Molecular Therapy · March 5, 2015 Duchenne muscular dystrophy (DMD) is caused by genetic mutations that result in the absence of dystrophin protein expression. Oligonucleotide-induced exon skipping can restore the dystrophin reading frame and protein production. However, this requires cont ...

Multiplex CRISPR/Cas9-based genome editing for correction of dystrophin mutations that cause Duchenne muscular dystrophy.

Journal Article Nat Commun · February 18, 2015 The CRISPR/Cas9 genome-editing platform is a promising technology to correct the genetic basis of hereditary diseases. The versatility, efficiency and multiplexing capabilities of the CRISPR/Cas9 system enable a variety of otherwise challenging gene correc ...

Improved transcript isoform discovery using ORF graphs.

Journal Article Bioinformatics · July 15, 2014 MOTIVATION: High-throughput sequencing of RNA in vivo facilitates many applications, not the least of which is the cataloging of variant splice isoforms of protein-coding messenger RNAs. Although many solutions have been proposed for reconstructing putativ ...

MicroRNA target site identification by integrating sequence and binding information.

Journal Article Nat Methods · July 2013 High-throughput sequencing has opened numerous possibilities for the identification of regulatory RNA-binding events. Cross-linking and immunoprecipitation of Argonaute proteins can pinpoint a microRNA (miRNA) target site within tens of bases but leaves th ...

Automated annotation of gene expression image sequences via non-parametric factor analysis and conditional random fields.

Journal Article Bioinformatics · July 1, 2013 MOTIVATION: Computational approaches for the annotation of phenotypes from image data have shown promising results across many applications, and provide rich and valuable information for studying gene function and interactions. While data are often availab ...

Translocation of sickle cell erythrocyte microRNAs into Plasmodium falciparum inhibits parasite translation and contributes to malaria resistance.

Journal Article Cell Host Microbe · August 16, 2012 Erythrocytes carrying a variant hemoglobin allele (HbS), which causes sickle cell disease, resist infection by the malaria parasite Plasmodium falciparum. The molecular basis of this resistance, which has long been recognized as multifactorial, remains ...

Modeling the evolution of regulatory elements by simultaneous detection and alignment with phylogenetic pair HMMs.

Journal Article PLoS Comput Biol · December 16, 2010 The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been ... Open Access

Complexity reduction in context-dependent DNA substitution models.

Journal Article Bioinformatics · January 15, 2009 MOTIVATION: The modeling of conservation patterns in genomic DNA has become increasingly popular for a number of bioinformatic applications. While several systems developed to date incorporate context-dependence in their substitution models, the impact on ...

Motif composition, conservation and condition-specificity of single and alternative transcription start sites in the Drosophila genome.

Journal Article Genome Biol · 2009 BACKGROUND: Transcription initiation is a key component in the regulation of gene expression. mRNA 5' full-length sequencing techniques have enhanced our understanding of mammalian transcription start sites (TSSs), revealing different initiation patterns o ...

A viral microRNA functions as an orthologue of cellular miR-155.

Journal Article Nature · December 13, 2007 All metazoan eukaryotes express microRNAs (miRNAs), roughly 22-nucleotide regulatory RNAs that can repress the expression of messenger RNAs bearing complementary sequences. Several DNA viruses also express miRNAs in infected cells, suggesting a role in vir ...

Gene prediction methods.

Chapter · December 1, 2007 Most computational gene-finding methods in current use are derived from the fields of natural language processing and speech recognition. These latter fields are concerned with parsing spoken or written language into functional components such as nouns, ve ...

Spatial preferences of microRNA targets in 3' untranslated regions.

Journal Article BMC Genomics · June 7, 2007 BACKGROUND: MicroRNAs are an important class of regulatory RNAs which repress animal genes by preferentially interacting with complementary sequence motifs in the 3' untranslated region (UTR) of target mRNAs. Computational methods have been developed which ...

Methods for computational gene prediction.

Book · 2007 A self-contained, rigorous text describing models used to identify genes in genomic DNA sequences. ...

Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote.

Journal Article PLoS Biol · September 2006 The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus ...

TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders.

Journal Article Bioinformatics · November 1, 2004 Summary: We describe two new Generalized Hidden Markov Model implementations for ab initio eukaryotic gene prediction. The C/C++ source code for both is available as open source and is highly reusabl ...

The ENCODE (ENCyclopedia Of DNA Elements) Project.

Journal Article Science · October 22, 2004 The ENCyclopedia Of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome sequence. The pilot phase of the Project is focused on a specified 30 megabases (approximately 1%) of the human genome sequence and is organized ...

Assessment of Genome-Wide Protein Function Classification for Drosophila melanogaster.

Journal Article Genome Research · September 2003 The functional classification of genes on a genome-wide scale is now in its infancy, and we make a first attempt to assess existing methods and identify sources of error. To this end, we compared two independent efforts for associating proteins ...

GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders.

Journal Article Nucleic Acids Research · July 1, 2003

Identification of key concepts in biomedical literature using a modified Markov heuristic.

Journal Article Bioinformatics · February 12, 2003 Motivation: The recent explosion of interest in mining the biomedical literature for associations between defined entities such as genes, diseases and drugs has made apparent the need for robust meth ...

A preliminary comparison of the mouse and human genomes.

Journal Article International Congress Series · December 2002

Genomics and natural language processing.

Journal Article Nature Reviews Genetics · August 2002

A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome.

Journal Article Science · May 31, 2002 The high degree of similarity between the mouse and human genomes is demonstrated through analysis of the sequence of mouse chromosome 16 (Mmu 16), which was obtained as part of a whole-genome shotgun assembly of the mouse genome. The mouse genome is about ...

The sequence of the human genome.

Journal Article Science · 2001