PrivGene

This work focuses on analysis tasks that involve model fitting, i.e., finding the parameters of a statistical model that best fit the dataset. For such tasks, the quality of the differentially private results depends on both the effectiveness of the model fitting algorithm and the amount of perturbation required to satisfy the privacy guarantees. Most previous studies start from a state-of-the-art, non-private model fitting algorithm and develop a differentially private version of it. Unfortunately, many model fitting algorithms require heavy perturbation to satisfy ε-differential privacy, which leads to poor overall result quality.

Motivated by this, we propose PrivGene, a general-purpose differentially private model fitting solution based on genetic algorithms (GA). PrivGene requires significantly less perturbation than previous methods and achieves higher overall result quality, even for model fitting tasks where GA would not be the first choice in a non-private setting. Furthermore, PrivGene performs the random perturbation using a novel technique called the enhanced exponential mechanism, which improves on the exponential mechanism by exploiting the special properties of model fitting tasks. As case studies, we apply PrivGene to three common analysis tasks involving model fitting: logistic regression, SVM classification, and k-means clustering. Extensive experiments on real data confirm the high result quality of PrivGene and its superiority over existing methods.
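To make the general idea concrete, the following is a minimal sketch of a genetic algorithm whose only access to the data is a private selection step. It assumes the standard exponential mechanism (select a candidate with probability proportional to exp(ε·score / (2·Δ)), where Δ is the score's sensitivity), not the enhanced exponential mechanism described above, and it is not PrivGene's actual algorithm. The names exponential_mechanism, private_ga, and correct_count, the crossover and mutation operators, the even budget split across selections, and all parameter values are hypothetical choices for illustration. Because crossover and mutation only touch already-selected parents, they are post-processing, so by sequential composition the total privacy cost is the sum of the per-selection budgets.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def exponential_mechanism(candidates: Sequence[Vector], scores: Sequence[float],
                          epsilon: float, sensitivity: float,
                          rng: random.Random) -> Vector:
    """Standard exponential mechanism: pick a candidate with probability
    proportional to exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores)  # shifting by the max keeps weights <= 1 and the distribution unchanged
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def private_ga(score_fn: Callable[[Vector], float], dim: int, sensitivity: float,
               total_epsilon: float, generations: int = 20, pop_size: int = 40,
               survivors: int = 5, mutation_scale: float = 0.3,
               seed: int = 0) -> Vector:
    """Hypothetical GA: the data is touched only through private selection."""
    if generations < 1 or survivors < 2 or pop_size < survivors:
        raise ValueError("need generations >= 1 and 2 <= survivors <= pop_size")
    rng = random.Random(seed)
    # Sequential composition: generations * survivors selections share the budget.
    eps_each = total_epsilon / (generations * survivors)
    # The initial population is random and independent of the data.
    population = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]

    def select(pop: List[Vector]) -> List[Vector]:
        scores = [score_fn(p) for p in pop]
        return [exponential_mechanism(pop, scores, eps_each, sensitivity, rng)
                for _ in range(survivors)]

    for _ in range(generations - 1):
        parents = select(population)
        # Breeding uses only the selected parents, so it is post-processing.
        children = [list(p) for p in parents]  # keep parents (elitism)
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, dim) if dim > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child[rng.randrange(dim)] += rng.gauss(0.0, mutation_scale)  # mutation
            children.append(child)
        population = children
    # The final answer is itself a privately selected candidate.
    return select(population)[0]


# Usage: privately fit a linear classifier sign(w . x) on synthetic data.
data_rng = random.Random(42)
points = [(data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)) for _ in range(200)]
labels = [1 if x + y > 0 else -1 for x, y in points]


def correct_count(w: Vector) -> float:
    # Replacing one record changes this count by at most 1, so sensitivity = 1.
    return sum(1 for (x, y), t in zip(points, labels)
               if (1 if w[0] * x + w[1] * y >= 0 else -1) == t)


w = private_ga(correct_count, dim=2, sensitivity=1.0, total_epsilon=100.0)
print(w, correct_count(w) / len(points))
```

The design choice illustrated here is that privacy is paid only when candidates are scored against the data; how the population evolves between selections costs nothing extra. How best to score and select candidates under a given budget is exactly where a dedicated mechanism, such as the enhanced exponential mechanism, would replace the generic one used in this sketch.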

Publication

J. Zhang, X. Xiao, Y. Yang, Z. Zhang and M. Winslett. PrivGene: Differentially Private Model Fitting Using Genetic Algorithms. SIGMOD, 2013.