DATA PROTECTION: PRIVACY-PRESERVING DATA COLLECTION WITH VALIDATION

Abstract

Ubiquitous data collection has raised the risk of leaking physical and private attribute information associated with individuals in a collected dataset. A data collector that gathers data to provision its machine learning (ML)-based services needs to establish a privacy-preserving data collection protocol for data owners. In this work, we design, implement, and evaluate a novel privacy-preserving data collection protocol. Specifically, we validate the functionality of the data collection protocol on behalf of data owners. First, because the ML-based services are not always predefined, it is challenging for a data collector to prevent inference of private attributes and user identity from the collected data while maintaining the utility of the data. To address this challenge, we reconstruct the data with a data transformation model based on an autoencoder and clustering. Second, because untrusted data collectors can provide the data transformation models, it is necessary to ensure that the reconstructed data satisfy certain privacy-preserving properties. Therefore, we use detection models and design an efficient enclave-based mechanism to validate that the private attribute estimation probability of the reconstructed data is bounded by predefined thresholds. Extensive experiments demonstrate the protocol's effectiveness; for example, it significantly reduces the accuracy of private attribute detection.
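The validation step can be illustrated with a minimal sketch. It assumes that a detection model has already produced, for each private attribute, one estimation probability per reconstructed record, and that "bounded" means no record exceeds the threshold for that attribute. The function names (`count_violations`, `is_valid`), the attribute names, and the numbers are hypothetical and chosen only for illustration; the abstract specifies neither the detector's output format nor the enclave interface, so neither is modeled here.

```python
from typing import Dict, List


def count_violations(
    detector_probs: Dict[str, List[float]],
    thresholds: Dict[str, float],
) -> Dict[str, int]:
    """Count, per private attribute, the records whose estimation
    probability exceeds that attribute's predefined threshold."""
    counts = {}
    for attribute, threshold in thresholds.items():
        probs = detector_probs[attribute]
        counts[attribute] = sum(1 for p in probs if p > threshold)
    return counts


def is_valid(
    detector_probs: Dict[str, List[float]],
    thresholds: Dict[str, float],
) -> bool:
    """Accept the reconstructed data only if no record violates any threshold."""
    return all(c == 0 for c in count_violations(detector_probs, thresholds).values())


# Hypothetical detector outputs for three reconstructed records.
detector_probs = {
    "gender": [0.52, 0.55, 0.49],
    "identity": [0.10, 0.35, 0.08],
}
# Hypothetical predefined thresholds, one per private attribute.
thresholds = {"gender": 0.60, "identity": 0.30}

print(count_violations(detector_probs, thresholds))
print(is_valid(detector_probs, thresholds))
```

In this toy input, one record exceeds the `identity` threshold, so the check rejects the reconstructed data. In the protocol described above, a check of this kind would run inside the enclave so that data owners do not have to trust the collector's transformation model.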
