Data base error trapping and prediction


Journal Article

We develop and analyze models for a class of problems involving inferences about uncertain numbers of errors in data bases. In particular, we study two error detection methods. In the duplicate performance method, all items in a data base are processed by two individuals (or machines), and the resulting records are compared to find disagreements, which are then resolved. In the known errors method, a data base is first extended to include additional items known to be in error, and then the extended data base is checked by a single individual. For both methods, we lay out the underlying structure of the model and generate inferences in terms of predictive distributions for the numbers of undetected errors. The role of prior information is important in these problems of data base quality management. In the first method of error checking, for example, observed data are always equally consistent with small error rates (and hence few remaining errors) and with high error rates (and many remaining errors). Most of our illustrative analyses use fairly conservative prior specifications, and the results are compared with those in the less formal development of Strayhorn. In practice, of course, appropriately realistic priors should be used, and some possibilities are mentioned. Models of the type studied here are applicable to a wide variety of important practical problems in data quality management, with examples in industrial quality control and reliability control being of particular note. © 1991 Taylor & Francis Group, LLC.
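The two designs described in the abstract can be illustrated with simple classical point estimates. The sketch below is not the paper's Bayesian analysis (which produces full predictive distributions under explicit priors): for the duplicate performance method it uses the standard Lincoln-Petersen capture-recapture estimator, and for the known errors method a seeded-error ratio estimator. The function names and the numbers in the example are hypothetical, chosen only for illustration.

```python
def duplicate_performance_undetected(n_a: int, n_b: int, n_both: int) -> float:
    """Duplicate performance design: two checkers find n_a and n_b errors,
    with n_both found by both. Lincoln-Petersen capture-recapture point
    estimate of the errors neither checker caught (illustrative, not the
    paper's Bayesian predictive distribution)."""
    total = n_a * n_b / n_both    # estimated total errors in the data base
    found = n_a + n_b - n_both    # distinct errors actually detected
    return total - found

def known_errors_undetected(seeded: int, seeded_found: int, real_found: int) -> float:
    """Known errors design: `seeded` planted errors, of which `seeded_found`
    were caught, alongside `real_found` genuine errors. Ratio point estimate
    of undetected genuine errors."""
    detection_rate = seeded_found / seeded    # checker's estimated hit rate
    total_real = real_found / detection_rate  # implied total genuine errors
    return total_real - real_found

# Hypothetical numbers: checkers find 40 and 50 errors with 20 in common,
# so roughly 100 errors are estimated to exist and about 30 remain undetected.
print(duplicate_performance_undetected(40, 50, 20))  # 30.0
# 80 of 100 seeded errors caught alongside 60 genuine ones: ~15 genuine remain.
print(known_errors_undetected(100, 80, 60))          # 15.0
```

These point estimates also show why prior information matters, as the abstract notes: with few detected disagreements the same data are consistent with very different totals, which the paper's predictive distributions quantify directly.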

Cited Authors

  • West, M; Winkler, RL

Published Date

  • January 1, 1991

Published In

  • Journal of the American Statistical Association

Volume / Issue

  • 86 / 416

Start / End Page

  • 987 - 996

Electronic International Standard Serial Number (EISSN)

  • 1537-274X

International Standard Serial Number (ISSN)

  • 0162-1459

Digital Object Identifier (DOI)

  • 10.1080/01621459.1991.10475142

Citation Source

  • Scopus