A comparison of feature selection methodologies and learning algorithms in the development of a DNA methylation-based telomere length estimator.

Publication, Journal Article
Doherty, T; Dempster, E; Hannon, E; Mill, J; Poulton, R; Corcoran, D; Sugden, K; Williams, B; Caspi, A; Moffitt, TE; Delany, SJ; Murphy, TM
Published in: BMC bioinformatics
May 2023

The field of epigenomics holds great promise for understanding and treating disease, and advances in machine learning (ML) and artificial intelligence are vitally important in this pursuit. Research increasingly utilises DNA methylation measures at cytosine-guanine dinucleotides (CpG) to detect disease and estimate biological traits such as aging. Because DNA methylation data are high-dimensional, feature-selection techniques are commonly employed to reduce dimensionality and identify the most important subset of features. In this study, our aim was to test and compare a range of feature-selection methods and ML algorithms in the development of a novel DNA methylation-based telomere length (TL) estimator. We utilised both nested cross-validation and two independent test sets for the comparisons.

We found that principal component analysis in advance of elastic net regression led to the overall best performing estimator when evaluated using a nested cross-validation analysis and two independent test cohorts. This approach achieved a correlation between estimated and actual TL of 0.295 (83.4% CI [0.201, 0.384]) on the EXTEND test data set. In contrast, the baseline model of elastic net regression with no prior feature reduction stage performed less well in general, suggesting that a prior feature-selection stage may have important utility. A previously developed TL estimator, DNAmTL, achieved a correlation of 0.216 (83.4% CI [0.118, 0.310]) on the EXTEND data. Additionally, we observed that different DNA methylation-based TL estimators, which have few CpGs in common, are associated with many of the same biological entities.

The variance in performance across the tested approaches shows that estimators are sensitive to data set heterogeneity, and the development of an optimal DNA methylation-based estimator should benefit from the robust methodological approach used in this study. Moreover, our methodology, which utilises a range of feature-selection approaches and ML algorithms, could be applied to other biological markers and disease phenotypes to examine their relationship with DNA methylation and their predictive value.
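The best-performing approach named in the abstract, principal component analysis ahead of elastic net regression evaluated with nested cross-validation, can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses a synthetic low-rank matrix in place of real CpG beta values, and the function names (`make_synthetic_methylation`, `nested_cv_correlation`), fold counts, and hyperparameter grid are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_methylation(n_samples=120, n_cpgs=500, seed=0):
    """Synthetic stand-in for a samples x CpG matrix and a TL-like target.

    Assumption: a few latent factors drive both the features and the
    target. The values are NOT real methylation beta values.
    """
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, 5))
    loadings = rng.normal(size=(5, n_cpgs))
    X = latent @ loadings + rng.normal(size=(n_samples, n_cpgs))
    effects = np.array([1.0, -1.0, 0.5, 0.5, -0.5])  # arbitrary effect sizes
    y = latent @ effects + rng.normal(scale=0.5, size=n_samples)
    return X, y


def nested_cv_correlation(X, y, seed=0):
    """Correlation between actual and out-of-fold predicted targets."""
    # Scaling and PCA sit inside the pipeline so they are re-fitted on
    # each training split only, which avoids leaking test information.
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(random_state=seed)),
        ("enet", ElasticNet(max_iter=10000)),
    ])
    # Hypothetical grid; the abstract does not state the values searched.
    param_grid = {
        "pca__n_components": [10, 30],
        "enet__alpha": [0.01, 0.1, 1.0],
        "enet__l1_ratio": [0.2, 0.8],
    }
    inner = KFold(n_splits=3, shuffle=True, random_state=seed)      # tuning
    outer = KFold(n_splits=5, shuffle=True, random_state=seed + 1)  # evaluation
    search = GridSearchCV(pipeline, param_grid, cv=inner,
                          scoring="neg_mean_squared_error")
    # Each outer fold runs its own inner grid search, then predicts the
    # samples it held out.
    predictions = cross_val_predict(search, X, y, cv=outer)
    r = float(np.corrcoef(y, predictions)[0, 1])
    return r, predictions


if __name__ == "__main__":
    X, y = make_synthetic_methylation()
    r, _ = nested_cv_correlation(X, y)
    print(f"Nested-CV correlation on synthetic data: {r:.3f}")
```

The inner loop selects hyperparameters and the outer loop scores the tuned model on samples it never saw during tuning, which is what separates nested cross-validation from a single tuned cross-validation run. Removing the `pca` step from the pipeline gives the corresponding sketch of the baseline elastic net with no prior feature reduction.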

Published In: BMC bioinformatics
DOI: 10.1186/s12859-023-05282-4
EISSN: 1471-2105
ISSN: 1471-2105
Publication Date: May 2023
Volume: 24
Issue: 1
Start / End Page: 178

Related Subject Headings

  • Telomere Homeostasis
  • Regression Analysis
  • Machine Learning
  • Humans
  • Epigenomics
  • DNA Methylation
  • Bioinformatics
  • Algorithms
  • 49 Mathematical sciences
  • 46 Information and computing sciences
 

Citation

APA: Doherty, T., Dempster, E., Hannon, E., Mill, J., Poulton, R., Corcoran, D., … Murphy, T. M. (2023). A comparison of feature selection methodologies and learning algorithms in the development of a DNA methylation-based telomere length estimator. BMC Bioinformatics, 24(1), 178. https://doi.org/10.1186/s12859-023-05282-4
Chicago: Doherty, Trevor, Emma Dempster, Eilis Hannon, Jonathan Mill, Richie Poulton, David Corcoran, Karen Sugden, et al. “A comparison of feature selection methodologies and learning algorithms in the development of a DNA methylation-based telomere length estimator.” BMC Bioinformatics 24, no. 1 (May 2023): 178. https://doi.org/10.1186/s12859-023-05282-4.
ICMJE: Doherty T, Dempster E, Hannon E, Mill J, Poulton R, Corcoran D, et al. A comparison of feature selection methodologies and learning algorithms in the development of a DNA methylation-based telomere length estimator. BMC bioinformatics. 2023 May;24(1):178.
MLA: Doherty, Trevor, et al. “A comparison of feature selection methodologies and learning algorithms in the development of a DNA methylation-based telomere length estimator.” BMC Bioinformatics, vol. 24, no. 1, May 2023, p. 178. Epmc, doi:10.1186/s12859-023-05282-4.
NLM: Doherty T, Dempster E, Hannon E, Mill J, Poulton R, Corcoran D, Sugden K, Williams B, Caspi A, Moffitt TE, Delany SJ, Murphy TM. A comparison of feature selection methodologies and learning algorithms in the development of a DNA methylation-based telomere length estimator. BMC bioinformatics. 2023 May;24(1):178.