DiMaC: A disguised missing data cleaning tool

Publication, Conference
Hua, M; Pei, J
Published in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
December 1, 2008

In some applications, such as filling in a customer information form on the web, missing values may not be explicitly represented as such, but instead appear as potentially valid data values. Such missing values are known as disguised missing data, and they may severely impair the quality of data analysis. The few previous studies on cleaning disguised missing data rely heavily on domain background knowledge in specific applications and may not work well when the disguise values are inliers. Recently, we studied the problem of cleaning disguised missing data systematically and proposed an effective heuristic approach [2]. In this paper, we present a demonstration of DiMaC, a Disguised Missing Data Cleaning tool that can find the frequently used disguise values in data sets without any domain background knowledge. In this demo, we will show (1) the critical techniques for finding suspicious disguise values; (2) the architecture and user interface of the DiMaC system; (3) an empirical case study on both real and synthetic data sets, which verifies the effectiveness and efficiency of the techniques; and (4) some challenges arising from real applications and several directions for future work.

Published In

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

DOI

10.1145/1401890.1402023

Publication Date

December 1, 2008

Start / End Page

1077 / 1080
 

Citation

APA: Hua, M., & Pei, J. (2008). DiMaC: A disguised missing data cleaning tool. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1077–1080). https://doi.org/10.1145/1401890.1402023
Chicago: Hua, M., and J. Pei. “DiMaC: A disguised missing data cleaning tool.” In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1077–80, 2008. https://doi.org/10.1145/1401890.1402023.
ICMJE: Hua M, Pei J. DiMaC: A disguised missing data cleaning tool. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008. p. 1077–80.
MLA: Hua, M., and J. Pei. “DiMaC: A disguised missing data cleaning tool.” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 1077–80. Scopus, doi:10.1145/1401890.1402023.
NLM: Hua M, Pei J. DiMaC: A disguised missing data cleaning tool. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008. p. 1077–1080.
