A review and analysis of research on the test-retest reliability of professional judgment


Journal Article

This paper analyzes existing research on the test-retest reliability of human judgment, i.e. the extent to which a judge makes identical judgments when presented with identical stimuli on two occasions. Only research involving professional judges who make experimental judgments in a reasonable analog of their everyday experience is included. Studies of both internal consistency reliability and temporal stability reliability are analyzed (where the former refers to the inclusion of repeat stimuli in the same experimental session, and the latter refers to the repeating of the experimental task from a few days to several months later). It is found that (1) the test-retest reliability literature is concentrated in four substantive judgment areas (medicine/psychology, meteorology, human resources management, and business), (2) the literature is extremely variable in terms of research approach/design, the determinants or correlates of test-retest reliability that have been studied, and the quality of the execution and analysis, and (3) mean test-retest reliability differs across both substantive judgment areas and the internal consistency versus temporal stability distinction. An inescapable conclusion from the analysis is that our knowledge of this fundamental property of human judgment is quite meager. Therefore, the paper concludes with suggestions about future research that would address test-retest reliability more systematically. Copyright © 2000 John Wiley & Sons, Ltd.

Cited Authors

  • Ashton, RH

Published Date

  • January 1, 2000

Published In

Volume / Issue

  • 13 / 3

Start / End Page

  • 277 - 294

International Standard Serial Number (ISSN)

  • 0894-3257

Digital Object Identifier (DOI)

  • 10.1002/1099-0771(200007/09)13:3<277::AID-BDM350>3.0.CO;2-B

Citation Source

  • Scopus