Prediction and functional characterization of transcriptional activation domains

Conference Paper

Transcription factors (TFs) induce gene expression through their activation domains (ADs). However, ADs are poorly conserved, intrinsically disordered sequences that lack secondary structure, which makes these regions difficult to recognize and predict and limits our ability to identify TFs. Here, we address this challenge with a neural network approach that systematically predicts ADs. As input to the network, we used computed properties of amino acid (AA) side chains and secondary structure rather than the raw sequence. To shed light on the features the network learned and to improve interpretability, we computed the input properties most important for accurate prediction. Our findings reinforce the importance of aromatic and negatively charged AAs and reveal the importance of previously unrecognized AA properties. Using these most important features, we applied an unsupervised learning approach to classify the ADs into 10 subclasses, which can be further explored for AA specificity and AD functionality. Overall, our pipeline, which combines supervised and unsupervised machine learning, sheds light on the non-linear properties of ADs.
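
The idea of feeding computed residue properties, rather than raw sequence, to a model can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name window_features, the window size, and the two features (aromatic fraction and net charge) are hypothetical and are not the property set or code used in the paper. The residue groupings themselves (F, W, Y as aromatic; D, E as negatively charged; K, R as positively charged) are standard.

```python
# Illustrative only: these features are assumptions for the sketch,
# not the property set used by the authors.
AROMATIC = set("FWY")   # phenylalanine, tryptophan, tyrosine
NEGATIVE = set("DE")    # aspartate, glutamate
POSITIVE = set("KR")    # lysine, arginine


def window_features(seq, size=5):
    """Return one (aromatic_fraction, net_charge) pair per sliding window."""
    seq = seq.upper()
    if size <= 0 or size > len(seq):
        raise ValueError("size must be between 1 and len(seq)")
    features = []
    for start in range(len(seq) - size + 1):
        window = seq[start:start + size]
        # Fraction of residues in the window that are aromatic.
        aromatic = sum(aa in AROMATIC for aa in window) / size
        # Net charge: positive residues minus negative residues.
        charge = (sum(aa in POSITIVE for aa in window)
                  - sum(aa in NEGATIVE for aa in window))
        features.append((aromatic, charge))
    return features


# Hypothetical 8-residue peptide, chosen only to exercise the function.
feats = window_features("DDFWEKLS", size=4)
```

Per-window vectors of this kind could serve as inputs to a classifier and, after selecting the most informative properties, to a clustering step; the abstract does not specify either model beyond "neural network" and "unsupervised learning".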

Cited Authors

  • Mahatma, S; Van Den Broeck, L; Morffy, N; Staller, MV; Strader, LC; Sozzani, R

Published Date

  • January 1, 2023

Published In

  • 2023 57th Annual Conference on Information Sciences and Systems, CISS 2023

International Standard Book Number 13 (ISBN-13)

  • 9781665451819

Digital Object Identifier (DOI)

  • 10.1109/CISS56502.2023.10089768

Citation Source

  • Scopus