Privacy-Preserving Data Synthesis Using Trusted Execution Environments
Date
2022
Publisher
Tartu Ülikool
Abstract
Data synthesis is the process of generating new synthetic data from existing data. Often
companies do not have the in-house competence to synthesize data themselves, and
are willing to outsource the process. However, synthesis requires access to the original
data. Sharing data with a third party can be complex, especially if it contains sensitive
information or is considered personal data under regulations such as the GDPR.
The goal of this thesis is to develop a proof-of-concept privacy-preserving data
synthesis service showing that it is possible to use trusted execution environments to
perform data synthesis in a privacy-preserving manner. Such a service would enable
outsourcing the data synthesis process to an untrusted remote server by ensuring that
both the original and synthesized data are fully hidden from the untrusted server host
throughout the lifecycle of the service.
A prototype of the service was developed in the scope of an ongoing proof-of-concept
project. To achieve the required security goals, the service prototype uses trusted execution
environment technologies, specifically the Sharemind HI development platform, which is
in turn based on the Intel SGX platform. The developed service shows that synthesizing
data in a privacy-preserving manner is indeed feasible if trusted execution environments
are used. However, future work is needed to optimize the service to allow larger input
and output files, and to support additional data synthesis methods.
Keywords
Data synthesis, trusted execution environments, privacy-preserving technologies