Amplifying limited expert input to sanitize large network traces
We present a methodology for identifying sensitive data in packet payloads, motivated by the need to sanitize packets before releasing them (e.g., for network security/dependability analysis). Our methodology accommodates packets recorded from an incompletely documented protocol, in which case it will be necessary to consult a human expert to determine what packet data is sensitive. Since expert availability for such tasks is limited, however, our methodology adopts a hierarchical approach in which most packet inspection is done by less-trained workers whose designations of sensitive data in selected packets best match the expert's. At the core of our methodology is a data reduction and presentation algorithm that selects candidate workers based on their evaluations of a small number of packets; that solicits these workers' designations of sensitive data in a larger (but still minuscule) subset of packets; and then applies these designations to mark sensitive data in the entire data set. We detail our algorithms and evaluate them in a realistic user study. © 2011 IEEE.