Calibrate: Frequency Estimation and Heavy Hitter Identification with Local Differential Privacy via Incorporating Prior Knowledge
Estimating frequencies of certain items among a
population is a basic step in data analytics, which enables more
advanced data analytics (e.g., heavy hitter identification, frequent
pattern mining), client software optimization, and detecting
unwanted or malicious hijacking of user settings in browsers.
Frequency estimation and heavy hitter identification with local
differential privacy (LDP) protect user privacy as well as the
data collector. Existing LDP algorithms cannot leverage 1) prior
knowledge about the noise in the estimated item frequencies and
2) prior knowledge about the true item frequencies. As a result,
they achieve suboptimal performance in practice.
In this work, we aim to design LDP algorithms that can
leverage such prior knowledge. Specifically, we design Calibrate
to incorporate the prior knowledge via statistical inference.
Calibrate can be appended to an existing LDP algorithm to
reduce its estimation errors. We model the prior knowledge about
the noise and the true item frequencies as two probability distributions, respectively. Given the two probability distributions and
an estimated frequency of an item produced by an existing LDP
algorithm, our Calibrate computes the conditional probability
distribution of the item’s frequency and uses the mean of the
conditional probability distribution as the calibrated frequency
for the item. It is challenging to estimate the two probability
distributions due to data sparsity. We address the challenge via
integrating techniques from statistics and machine learning. Our
empirical results on two real-world datasets show that Calibrate
significantly outperforms state-of-the-art LDP algorithms for
frequency estimation and heavy hitter identification.
Ieee International Conference on Computer Communications (Infocom)
IEEE International Conference on Computer Communications (INFOCOM)