On challenges in evaluating malware clustering

Publication, Conference

Li, P; Liu, L; Gao, D; Reiter, MK

Published in: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

January 1, 2010

Malware clustering and classification are important tools that enable analysts to prioritize their malware analysis efforts. The recent emergence of fully automated methods for malware clustering and classification that report high accuracy suggests that this problem may largely be solved. In this paper, we report the results of our attempt to confirm our conjecture that the method of selecting ground-truth data in prior evaluations biases their results toward high accuracy. To examine this conjecture, we apply clustering algorithms from a different domain (plagiarism detection), first to the dataset used in a prior work's evaluation and then to a wholly new malware dataset, to see if clustering algorithms developed without attention to subtleties of malware obfuscation are nevertheless successful. While these studies provide conflicting signals as to the correctness of our conjecture, our investigation of possible reasons uncovers, we believe, a cautionary note regarding the significance of highly accurate clustering results, as can be impacted by testing on a dataset with a biased cluster-size distribution. © 2010 Springer-Verlag.
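The closing caution about a biased cluster-size distribution can be illustrated with a small sketch. It assumes one common pair of cluster-quality measures (precision and recall computed from the best overlap between predicted and reference clusters); the abstract does not name the measure the paper uses, and the function names and toy cluster sizes below are hypothetical. The sketch scores a trivial clustering that places every sample in a single cluster against a skewed and a balanced reference labelling.

```python
from collections import Counter


def precision_recall(predicted, reference):
    """Overlap-based cluster precision and recall (assumed measures, not
    necessarily the paper's) for two equal-length label lists."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    n = len(predicted)
    # Count how many samples fall in each (predicted, reference) cluster pair.
    overlap = Counter(zip(predicted, reference))
    best_for_predicted = {}
    best_for_reference = {}
    for (p, r), count in overlap.items():
        best_for_predicted[p] = max(best_for_predicted.get(p, 0), count)
        best_for_reference[r] = max(best_for_reference.get(r, 0), count)
    precision = sum(best_for_predicted.values()) / n
    recall = sum(best_for_reference.values()) / n
    return precision, recall


def labels_from_sizes(sizes):
    """Build a label list with one cluster id per entry in `sizes`."""
    return [i for i, size in enumerate(sizes) for _ in range(size)]


# Hypothetical reference labellings over 100 samples each.
skewed = labels_from_sizes([90, 5, 5])       # one dominant family
balanced = labels_from_sizes([34, 33, 33])   # roughly equal families

# A trivial "clustering" that ignores the data entirely.
one_cluster = [0] * 100

skewed_scores = precision_recall(one_cluster, skewed)
balanced_scores = precision_recall(one_cluster, balanced)
print("skewed reference:  ", skewed_scores)
print("balanced reference:", balanced_scores)
```

Under these assumptions the trivial clustering reaches a precision of 0.9 on the skewed reference but only 0.34 on the balanced one, with recall 1.0 in both cases, so a high score on a skewed dataset says little about the algorithm by itself.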

Duke Scholars

Author: Michael Reiter, Computer Science

Published In

Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

DOI

10.1007/978-3-642-15512-3_13

EISSN

1611-3349

ISSN

0302-9743

Publication Date

January 1, 2010

Volume

6307 LNCS

Start / End Page

238 / 255

Related Subject Headings

Artificial Intelligence & Image Processing
46 Information and computing sciences

Citation

APA

Li, P., Liu, L., Gao, D., & Reiter, M. K. (2010). On challenges in evaluating malware clustering. In Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics (Vol. 6307 LNCS, pp. 238–255). https://doi.org/10.1007/978-3-642-15512-3_13

Chicago

Li, P., L. Liu, D. Gao, and M. K. Reiter. “On challenges in evaluating malware clustering.” In Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 6307 LNCS:238–55, 2010. https://doi.org/10.1007/978-3-642-15512-3_13.

ICMJE

Li P, Liu L, Gao D, Reiter MK. On challenges in evaluating malware clustering. In: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. 2010. p. 238–55.

MLA

Li, P., et al. “On challenges in evaluating malware clustering.” Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, vol. 6307 LNCS, 2010, pp. 238–55. Scopus, doi:10.1007/978-3-642-15512-3_13.

NLM

Li P, Liu L, Gao D, Reiter MK. On challenges in evaluating malware clustering. Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. 2010. p. 238–255.
