Three attitudes towards data mining
'Data mining' refers to a broad class of activities that have in common, a search over different ways to process or package data statistically or econometrically with the purpose of making the final presentation meet certain design criteria. We characterize three attitudes toward data mining: first, that it is to be avoided and, if it is engaged in, that statistical inferences must be adjusted to account for it; second, that it is inevitable and that the only results of any interest are those that transcend the variety of alternative data mined specifications (a view associated with Leamer's extreme-bounds analysis); and third, that it is essential and that the only hope we have of using econometrics to uncover true economic relationships is to be found in the intelligent mining of data. The first approach confuses considerations of sampling distribution and considerations of epistemic warrant and, reaches an unnecessarily hostile attitude toward data mining. The second approach relies on a notion of robustness that has little relationship to truth: there is no good reason to expect a true specification to be robust alternative specifications. Robustness is not, in general, a carrier of epistemic warrant. The third approach is operationalized in the general-to-specific search methodology of the LSE school of econometrics. Its success demonstrates that intelligent data mining is an important element in empirical investigation in economics. © 2000, Taylor & Francis Group, LLC.
Volume / Issue
Start / End Page
Electronic International Standard Serial Number (EISSN)
International Standard Serial Number (ISSN)
Digital Object Identifier (DOI)