Evidence of influence of genomic DNA sequence on human X chromosome inactivation.
A significant number of human X-linked genes escape X chromosome inactivation and are thus expressed from both the active and inactive X chromosomes. The basis for escape from inactivation, and the potential role of the primary DNA sequence of the X chromosome in determining a gene's X inactivation status, is unclear. Using a combination of the X chromosome sequence and a comprehensive X inactivation profile of more than 600 genes, two independent yet complementary approaches were used to systematically investigate the relationship between X inactivation and DNA sequence features. First, statistical analyses revealed that a number of repeat features, including long interspersed nuclear elements (LINEs) and mammalian-wide interspersed repeats (MIRs), are significantly enriched in regions surrounding the transcription start sites of genes that are subject to inactivation, whereas Alu repetitive elements and short motifs containing ACG/CGT are significantly enriched around the transcription start sites of genes that escape inactivation. Second, linear support vector machine (SVM) classifiers built from primary DNA sequence features correctly predicted the X inactivation status of more than 80% of all X-linked genes. We further identified a small set of features that are important for accurate classification, among which LINE-1 and LINE-2 content show the greatest individual discriminatory power. Finally, as few as 12 features suffice for accurate SVM classification. Taken together, these results suggest that features of the underlying primary DNA sequence of the human X chromosome may influence the spreading and/or maintenance of X inactivation.
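The abstract does not describe how features were computed or how the classifiers were trained. The following is a minimal, stdlib-only Python sketch of the general idea, under stated assumptions: a motif-density feature for ACG/CGT, a linear SVM labelling genes as subject (+1) or escape (-1), and ranking features by absolute weight. The function names (`motif_density`, `train_linear_svm`, `predict`, `rank_features`) are hypothetical. The training procedure is Pegasos-style stochastic subgradient descent on the hinge loss, swapped in for illustration and not necessarily the authors' solver. The toy feature values are invented and are not data from the study.

```python
import random


def motif_density(seq, motifs=("ACG", "CGT")):
    """Overlapping occurrences of the given motifs per kilobase of `seq` (hypothetical feature)."""
    seq = seq.upper()
    hits = 0
    for m in motifs:
        k = len(m)
        hits += sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == m)
    return 1000.0 * hits / max(len(seq), 1)


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularised hinge loss.

    X: list of feature vectors; y: labels in {+1, -1}.
    A constant 1.0 feature is appended, so the last weight acts as a (regularised) bias.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the regulariser
            if margin < 1.0:  # hinge loss active: step toward classifying x_i correctly
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Return +1 (predicted subject to inactivation) or -1 (predicted to escape)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def rank_features(w, names):
    """Order features by |weight|, excluding the bias; a common heuristic for linear SVMs."""
    return sorted(zip(names, w), key=lambda p: abs(p[1]), reverse=True)


# Hypothetical, standardised toy features near each TSS: [LINE content, Alu content].
X = [[2.0, -1.5], [1.5, -2.0], [2.5, -1.0], [1.8, -1.2],   # labelled subject (+1)
     [-2.0, 1.5], [-1.5, 2.0], [-1.0, 2.5], [-1.8, 1.0]]   # labelled escape (-1)
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
accuracy = sum(predict(w, x) == label for x, label in zip(X, y)) / len(X)
print(accuracy, rank_features(w, ["LINE", "Alu"]))
```

On this separable toy set the sketch classifies all eight examples correctly, with a positive weight on the LINE feature and a negative weight on the Alu feature. This matches the direction of enrichment reported above, but it says nothing about the real data. Accuracy on real genes would need held-out evaluation, such as cross-validation, rather than training accuracy.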
Wang, Z; Willard, HF; Mukherjee, S; Furey, TS