A Privacy Preserving Algorithm to Release Sparse High-dimensional Histograms
by: Bai Li, Vishesh Karwa, Aleksandra Slavković, Rebecca Carter Steorts
| Format: | Article |
|---|---|
| Published: | Labor Dynamics Institute 2018-12-01 |
Description
Differential privacy has emerged as a popular model to provably limit privacy risks associated with a given data release. However, releasing high-dimensional synthetic data under differential privacy remains a challenging problem. In this paper, we study the problem of releasing synthetic data in the form of a high-dimensional histogram under the constraint of differential privacy. We develop an $(\epsilon, \delta)$-differentially private categorical data synthesizer called Stability Based Hashed Gibbs Sampler (SBHG). SBHG works by combining a stability-based sparse histogram estimation algorithm with Gibbs sampling and feature selection to approximate the empirical joint distribution of a discrete dataset. SBHG offers a competitive alternative to state-of-the-art synthetic data generators while preserving the sparsity structure of the original dataset, which leads to improved statistical utility, as illustrated on simulated data. Finally, to study the utility of the synthetic datasets generated by SBHG, we also perform logistic regression on them and compare the resulting classification accuracy with that obtained from the original dataset.
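To make the "stability-based sparse histogram" ingredient concrete, here is a minimal Python sketch of the generic textbook construction: Laplace noise is added only to occupied cells, and a noisy count is released only if it clears a threshold that depends on $\epsilon$ and $\delta$. This is not the SBHG algorithm itself (it has no hashing, Gibbs sampling, or feature selection), and the abstract does not give the paper's noise scale or threshold. The function names, the noise scale $2/\epsilon$, and the threshold $1 + 2\ln(2/\delta)/\epsilon$ are assumptions taken from the standard stability-based histogram.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def stability_histogram(records, epsilon, delta, seed=None):
    """Hypothetical helper: noisy counts for occupied cells only.

    Assumes the textbook stability-based histogram: Laplace(2/epsilon)
    noise per occupied cell, released only above the threshold
    1 + 2*ln(2/delta)/epsilon. Empty cells are never touched, so the
    output stays sparse.
    """
    rng = random.Random(seed)
    counts = Counter(records)  # one entry per occupied cell
    threshold = 1.0 + 2.0 * math.log(2.0 / delta) / epsilon
    released = {}
    for cell, count in counts.items():
        noisy = count + laplace_noise(2.0 / epsilon, rng)
        if noisy > threshold:
            released[cell] = noisy
    return released


# Toy categorical data: each record is a tuple of three binary attributes.
records = [(0, 1, 0)] * 500 + [(1, 1, 1)] * 300 + [(0, 0, 1)]
hist = stability_histogram(records, epsilon=1.0, delta=1e-6, seed=0)
for cell, noisy_count in sorted(hist.items()):
    print(cell, round(noisy_count, 1))
```

Because only occupied cells are perturbed, the cost scales with the number of distinct records rather than with the full size of the high-dimensional domain, which is why this style of estimator suits sparse histograms. The threshold is what pays for skipping empty cells: rare cells, such as the singleton above, are suppressed with overwhelming probability.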