Taming the Devil: Techniques for Evaluating Anonymized Network Data
Anonymization plays a key role in enabling the public release of network datasets, yet there are few, if any, methods for evaluating how much privacy network data anonymization techniques actually afford. In fact, recent work suggests that many state-of-the-art anonymization techniques may leak more information than first thought. In this paper, we propose techniques for evaluating the anonymity of network data. Specifically, we simulate the behavior of an adversary whose goal is to deanonymize objects, such as hosts or web pages, within the network data. Doing so allows us to quantify the anonymity of the data using information-theoretic metrics, objectively compare the efficacy of anonymization techniques, and examine the impact of selective deanonymization on the anonymity of the data. Moreover, we provide several concrete applications of our approach on real network data in the hope of underscoring its usefulness to data publishers.
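The abstract does not say which information-theoretic metric is used. A common choice in the anonymity literature is the Shannon entropy of the adversary's probability distribution over the candidate real identities of an anonymized object, often normalized by its maximum value, log2(N) for N candidates. The sketch below illustrates that metric under this assumption; it is not presented as the paper's exact formulation. The function names and the example addresses are hypothetical.

```python
import math
from typing import Dict


def entropy_bits(probs: Dict[str, float]) -> float:
    """Shannon entropy (in bits) of an adversary's distribution over candidates.

    Assumption: `probs` maps each candidate real identity of one anonymized
    object to the adversary's probability that it is the true identity.
    """
    if not math.isclose(sum(probs.values()), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Zero-probability candidates contribute nothing (0 * log 0 is taken as 0).
    return sum(-p * math.log2(p) for p in probs.values() if p > 0)


def normalized_anonymity(probs: Dict[str, float]) -> float:
    """Entropy divided by its maximum, log2(N), for N candidates.

    1.0: the adversary cannot distinguish among the candidates at all.
    0.0: the adversary is certain of the identity (fully deanonymized).
    """
    n = len(probs)
    if n <= 1:
        return 0.0
    return entropy_bits(probs) / math.log2(n)


if __name__ == "__main__":
    # Hypothetical adversary beliefs about one anonymized host.
    uniform = {"10.0.0.1": 0.25, "10.0.0.2": 0.25, "10.0.0.3": 0.25, "10.0.0.4": 0.25}
    skewed = {"10.0.0.1": 0.7, "10.0.0.2": 0.2, "10.0.0.3": 0.1}
    print(entropy_bits(uniform), normalized_anonymity(uniform))  # 2.0 1.0
    print(entropy_bits(skewed), normalized_anonymity(skewed))
```

Normalizing by log2(N) makes scores comparable across objects with different numbers of candidates, which is convenient when comparing anonymization techniques or measuring how selective deanonymization of some objects lowers the anonymity of the rest.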