Random forests for genetic association studies.


Journal Article (Review)

The Random Forests (RF) algorithm has become a commonly used machine learning method for genetic association studies. It is well suited to genetic applications because it is computationally efficient and models genetic causal mechanisms well. With its growing ubiquity, RF has been used inconsistently and suboptimally in the literature. The purpose of this review is to break down the theoretical and statistical basis of RF so that practitioners are able to apply it in their work. Emphasis is placed on showing how the various components contribute to bias and variance, and on discussing variable importance measures. Applications specific to genetic studies are highlighted. To provide context, RF is compared to other commonly used machine learning algorithms.

Cited Authors

  • Goldstein, BA; Polley, EC; Briggs, FBS

Published Date

  • 2011

Volume / Issue

  • 10 / 1

Start / End Page

  • 32 -

PubMed ID

  • 22889876

PubMed Central ID

  • 22889876

Electronic International Standard Serial Number (EISSN)

  • 1544-6115

Digital Object Identifier (DOI)

  • 10.2202/1544-6115.1691


Language

  • eng

Conference Location

  • Germany