Subsampled exponential mechanism: Differential privacy in large output spaces
In the last several years, differential privacy has become the leading framework for private data analysis. It provides bounds on the amount that a randomized function can change as the result of a modification to one record of a database. This requirement can be satisfied by using the exponential mechanism to perform a weighted choice among the possible alternatives, with better options receiving higher weights. However, in some situations the number of possible outcomes is too large to compute all weights efficiently. We present the subsampled exponential mechanism, which scores only a sample of the outcomes. We show that it still preserves differential privacy, and fulfills a similar accuracy bound. Using a clustering application, we show that the subsampled exponential mechanism outperforms a previously published private algorithm and is comparable to the full exponential mechanism but more scalable.