Data base error trapping and prediction
We develop and analyze models for a class of problems involving inferences about uncertain numbers of errors in data bases. In particular, we study two error detection methods. In the duplicate performance method, all items in a data base are processed by two individuals (or machines), and the resulting records are compared to find disagreements, which are then resolved. In the known errors method, a data base is first extended to include additional items known to be in error, and the extended data base is then checked by a single individual. For both methods, we lay out the underlying structure of the model and generate inferences in terms of predictive distributions for the numbers of undetected errors. Prior information plays an important role in these problems of data base quality management: in the first method of error checking, for example, the observed data are always equally consistent with small error rates coupled with few remaining errors and with high error rates coupled with many remaining errors. Most of our illustrative analyses use fairly conservative prior specifications, and the results are compared with those in the less formal development of Strayhorn. In practice, of course, appropriately realistic priors should be used, and some possibilities are mentioned. Models of the type studied here are applicable to a wide variety of important practical problems in data quality management, with applications in industrial quality control and reliability being of particular note. © 1991 Taylor & Francis Group, LLC.
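The two error-detection schemes described in the abstract can be illustrated with simple point estimators. The sketch below is a hypothetical illustration, not the authors' Bayesian predictive model: the duplicate performance method is treated as a capture-recapture (Lincoln-Petersen) problem, and the known errors method scales the count of genuine errors found by the observed catch rate on the seeded errors. All function names and inputs are assumptions introduced here for exposition.

```python
def duplicate_performance_estimate(n1, n2, m):
    """Duplicate performance method, capture-recapture style:
    checker 1 finds n1 errors, checker 2 finds n2, and m errors
    are found by both.  Under independence, the total error count
    is estimated by the Lincoln-Petersen formula n1 * n2 / m.
    Returns (estimated total errors, estimated undetected errors)."""
    if m == 0:
        raise ValueError("no overlap between checkers: total not estimable")
    total = n1 * n2 / m
    detected = n1 + n2 - m  # distinct errors found by either checker
    return total, total - detected


def known_errors_estimate(k, s, d):
    """Known errors (seeding) method: k items known to be in error
    are planted, the checker catches s of them plus d genuine errors.
    The catch rate s / k scales d up to an estimate of all genuine
    errors.  Returns (estimated total, estimated undetected)."""
    if s == 0:
        raise ValueError("no seeded errors caught: catch rate is zero")
    total = d * k / s
    return total, total - d


# Illustrative (made-up) counts: two checkers find 50 and 40 errors,
# 20 in common -> about 100 errors in all, 30 still undetected.
print(duplicate_performance_estimate(50, 40, 20))
# Seeding: 16 of 20 planted errors caught, plus 40 genuine errors
# -> about 50 genuine errors, 10 still undetected.
print(known_errors_estimate(20, 16, 40))
```

These point estimates convey the logic of each method; the article itself goes further, placing priors on the error rates and reporting full predictive distributions for the number of undetected errors, which matters precisely because the data alone cannot distinguish low-rate/few-remaining from high-rate/many-remaining scenarios.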