Wang, Yue
PRESERVING DIFFERENTIAL PRIVACY IN COMPLEX DATA ANALYSIS
1 online resource (181 pages) : PDF
2015
University of North Carolina at Charlotte
Omnipresent databases from various sources, such as social networks, e-commerce websites, and health-related wearable devices, have provided researchers with unprecedented opportunities to analyze complex social phenomena. While society would like to encourage such scientific endeavors, we face the problem of giving researchers a fairly precise picture of the quantities or trends in complex data without disclosing sensitive information about individuals. In this dissertation, we investigate how to apply the differential privacy model in complex data analysis. Differential privacy is a paradigm of post-processing the output of queries or mining tasks on databases such that the inclusion or exclusion of a single individual from a database makes no statistical difference to the results found. It provides formal privacy guarantees that do not depend on an adversary's background knowledge. There has been extensive research on enforcing differential privacy in the analysis of tabular data, and several mechanisms have been developed to achieve differential privacy protection. However, achieving differential privacy protection on complex data, including social networks and biological sequence data, poses significant challenges, mainly due to the high sensitivity of the desired statistics and the complexity of the mining tasks. In this dissertation, we focus on enabling accurate analysis of complex data while preserving differential privacy. We first propose a general divide-and-conquer framework that handles complex computation tasks by decomposing a complex target computation into several less complex unit computations connected by basic mathematical operations (e.g., addition, subtraction, multiplication, division), and perturbing the output of each unit with Laplace noise derived from its own sensitivity value and its allocated share of the overall privacy budget.
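The decomposition idea above can be illustrated with a minimal sketch of the standard Laplace mechanism: a mean query is split into two unit computations (a sum and a count), each perturbed with Laplace noise scaled to its own sensitivity and half of the privacy budget. The function names (`laplace_noise`, `private_mean`) and the even budget split are illustrative assumptions, not the dissertation's actual framework.

```python
import math
import random

def laplace_noise(sensitivity, epsilon, rng=random):
    """Draw one sample from Laplace(0, b) with b = sensitivity / epsilon,
    using inverse-CDF sampling from a uniform draw on (-0.5, 0.5)."""
    b = sensitivity / epsilon
    u = rng.random() - 0.5
    sign = -1.0 if u < 0 else 1.0
    return -b * sign * math.log(1.0 - 2.0 * abs(u))

def private_mean(values, value_bound, epsilon):
    """Divide-and-conquer mean: decompose into sum and count units,
    perturb each with its own sensitivity and half the budget (assumed split),
    then recombine with a division."""
    eps_unit = epsilon / 2.0
    # Sensitivity of the sum is the per-record bound; of the count, 1.
    noisy_sum = sum(values) + laplace_noise(value_bound, eps_unit)
    noisy_count = len(values) + laplace_noise(1.0, eps_unit)
    return noisy_sum / max(noisy_count, 1.0)
```

With a large dataset the noisy numerator and denominator stay close to their true values, so the recombined estimate remains accurate while each unit output is differentially private.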
Next, we develop solutions to more complicated applications: differentially private graph generation and differentially private spectral analysis of network topology. We examine state-of-the-art differential privacy mechanisms, including the exponential mechanism and smooth sensitivity, and develop feasible solutions to these problems. Additionally, we consider the potential information disclosure from differentially private outputs. We propose two attack models showing how genome-wide association study (GWAS) results can be used to infer the trait or the identity of individuals even when those results are released under differential privacy protection. We also provide a countermeasure for model inversion attacks, in which a regression model released under differential privacy protection can still be exploited by an attacker to derive information about sensitive attributes used in the model. We develop a novel approach for releasing differentially private regression models by leveraging the functional mechanism to perturb the coefficients of the polynomial representation of the objective function while balancing the privacy budgets between sensitive and non-sensitive attributes in learning the regression models. Our approach effectively retains the models' utility while preventing model inversion attacks. Finally, we consider the problem of enforcing differential privacy at the client side against an untrusted server in the data collection scenario. Our proposed technique, which uses randomized response, incurs less utility loss than the traditional output perturbation mechanism, especially when the sensitivity of the desired computation is high, and also gives individuals a simple way to protect their sensitive information themselves against any adversary.
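The client-side scenario above can be sketched with the classic binary randomized response mechanism: each individual reports the truth with probability e^ε/(e^ε + 1) and flips their bit otherwise, and the server debiases the aggregate. This is a generic illustration of the randomized response technique, not the dissertation's specific protocol; all names here are hypothetical.

```python
import math
import random

def randomized_response(true_bit, epsilon, rng=random):
    """Client side: report the true bit with probability e^eps / (e^eps + 1),
    otherwise report the flipped bit. Satisfies eps-differential privacy."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return true_bit if rng.random() < p_truth else 1 - true_bit

def estimate_proportion(reports, epsilon):
    """Server side: unbiased estimate of the true proportion of 1s.
    E[observed] = (1 - p) + pi * (2p - 1), so invert that affine map."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

Because the noise is injected by each client before the data leaves their device, no trusted curator is needed: the untrusted server only ever sees perturbed bits, yet can still recover accurate population-level statistics.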
doctoral dissertations
Computer science; Data sets
Ph.D.
Differential Privacy
Information Technology
Wu, Xintao
Ge, Yong; Ras, Zbigniew; Yan, Shan; Zheng, Yuliang
Thesis (Ph.D.)--University of North Carolina at Charlotte, 2015.
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). For additional information, see http://rightsstatements.org/page/InC/1.0/.
Copyright is held by the author unless otherwise indicated.
Wang_uncc_0694D_10777
http://hdl.handle.net/20.500.13093/etd:1562