Definition
[TD.T.1] A model inversion attack is a privacy attack in which an adversary reconstructs training samples, or infers specific features and attributes of the hidden training data, by exploiting a machine learning model's outputs. It therefore lets the adversary learn information about the training dataset directly from the deployed model.
Targeted assets
System Asset: ML system input/API.
Business Asset: training data.
Security Criteria: confidentiality.
Attack details
Exploited vulnerabilities
Vulnerabilities:
- The model tends to memorize its training data when over-parameterized, so additional information about training data samples can be learned from the target model's output.
- It is possible to recover features of the training data based on predictions retrieved from the target machine learning model.
- It is possible to extract confidential data from a machine learning system that was trained on confidential data.
Threat agent
Threat agent: white-box and black-box scenarios. In the white-box scenario, the attacker is assumed to have complete knowledge of the target machine learning model: its architecture, parameters, training data, and learning algorithm. In the black-box scenario, the attacker has no knowledge of the target model's architecture, parameters, or training data and can interact with the model only by sending inputs and observing outputs.
Attack methods
Attack methods:
- One of the methods: outputs of different model versions are recorded, and a Multi-Layer Perceptron is used to analyze the differences between the versions' outputs, yielding information about the target training data. Other reported techniques: LSB encoding; correlated value encoding; sign encoding; exploiting model overfitting; neuron sorting; set-based representation; training-based strategy values; recognition-related neurons; training an attack classifier; training a shadow GAN; poisoning attacks; analysis of confidence scores (a minimal sketch of confidence-score-based inversion appears after this list); training a meta-classifier; designing a multi-task GAN; regularized maximum likelihood estimation; inverse networks; Deep Leakage from Gradients; numerical reconstruction by matching virtual and shared gradients; equality solving; path restriction; direct/passive/active label inference attacks.
- 1. A custom feature extractor is trained on auxiliary data. 2. The public, auxiliary data is fed to the feature extractor and the target model to produce feature-prediction probability pairs. 3. An inverse model is trained on the feature-prediction probability data, mapping predictions back to the feature space. 4. The inverse model produces a feature vector from the provided label data. 5. A GAN (trained on the public, auxiliary data) generates a sample from the inverse model's features. 6. The generated sample is fed to the target model to acquire new predictions. 7. Features are extracted from the generated image and compared to the features previously produced by the inverse model, accounting for the new predictions. 8. The features are thus updated iteratively with a differential evolution (DE) optimization algorithm. 9. A new image is generated with the GAN from the updated features.
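Neither of the listed methods comes with a reference implementation in this entry; as a minimal, illustrative sketch of the confidence-score analysis family of inversion attacks, the example below reconstructs a class-representative input from a small softmax classifier by gradient ascent on the target class's confidence. The model, weights, and hyperparameters are illustrative assumptions; in a black-box setting the gradient would have to be estimated from repeated queries (e.g., by finite differences).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "target" model: a softmax-regression classifier with
# fixed (pretend-trained) weights; it stands in for the victim model.
n_features, n_classes = 8, 3
W = rng.normal(size=(n_features, n_classes))
b = rng.normal(size=n_classes)

def predict_proba(x):
    """Target model output: confidence scores for input x."""
    logits = x @ W + b
    logits -= logits.max()          # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

def invert(target_class, steps=500, lr=0.1):
    """Gradient ascent on the target class's confidence score.

    Produces a class-representative input, i.e. the kind of feature
    vector the model associates with `target_class`.
    """
    x = rng.normal(scale=0.1, size=n_features)
    for _ in range(steps):
        p = predict_proba(x)
        # d/dx log p[c] for softmax regression: W[:, c] - W @ p
        x += lr * (W[:, target_class] - W @ p)
    return x, predict_proba(x)[target_class]

x_rec, conf = invert(target_class=1)
print("reconstructed representative:", np.round(x_rec, 2))
print("confidence for class 1:", round(conf, 4))
```

Practical attacks of this kind typically add regularization or a generative prior (as in the GAN-based procedure above) so that the reconstruction stays on the data manifold rather than drifting toward arbitrary high-confidence inputs.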
Impact and harm
Impact and harm: Negates the confidentiality of the targeted machine learning system's training data. This may lead to leakage of sensitive data, and exposure of sensitive private data may lead to legal repercussions.
Security countermeasures
Security requirements
Security requirement: The machine learning system must be resistant to model inversion attacks.
Security controls
Security controls:
- Degradation of prediction precision, e.g., reporting confidence scores with reduced numerical precision. This may result in slight degradation of the attack's performance.
- Prediction results can be rounded or replaced with null values below a threshold, which degrades the attack's performance (see the sketch after this list).
- Differential privacy: addition of noise so that the outputs deviate from the original values.
- Secure multi-party computation: joint computations are conducted within a confidential environment, without revealing each party's private inputs.
- Homomorphic encryption: calculations are conducted through confidential means, allowing operations on encrypted data without revealing the original data.
- Adversarial machine learning: incorporation of data about adversarial techniques into the model's training process.
- Differential-privacy-based methods obscure the input by adding noise to the original data or model. Possible methods: Randomized Aggregatable Privacy-Preserving Ordinal Response (RAPPOR) and PATE (Private Aggregation of Teacher Ensembles). PATE trains an ensemble of teacher models on disjoint subsets of the data and then trains a student model on the teachers' aggregated outputs.
- Homomorphic-encryption-based methods allow substantial amounts of sensitive data to be processed securely in a cloud environment. Alternatively, the CryptoNets approach applies a neural network to homomorphically encrypted data, producing predictions without access to the plaintext inputs.
- Vulnerability detection: risk assessment methods for machine learning models and evaluation of the model prior to release.
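As an illustration of the output-perturbation controls above (confidence rounding/thresholding and differential-privacy-style noise addition), the sketch below wraps a model's confidence vector before it is returned to the caller. The precision, threshold, and noise scale are illustrative assumptions, not calibrated privacy parameters.

```python
import numpy as np

def round_and_threshold(proba, decimals=1, threshold=0.2):
    """Reduce the precision of confidence scores and null out low scores."""
    p = np.round(np.asarray(proba, dtype=float), decimals)
    p[p < threshold] = 0.0
    return p

def laplace_perturb(proba, scale=0.05, rng=None):
    """Add Laplace noise to the confidence vector and re-normalize.

    Conceptually similar to differential-privacy output perturbation;
    a real deployment needs a properly calibrated privacy budget.
    """
    rng = rng or np.random.default_rng()
    noisy = np.asarray(proba, dtype=float) + rng.laplace(0.0, scale, len(proba))
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    return noisy / total if total > 0 else noisy

raw = np.array([0.07, 0.81, 0.12])   # hypothetical model output
print(round_and_threshold(raw))      # -> [0.  0.8 0. ]
print(laplace_perturb(raw, rng=np.random.default_rng(0)))
```

Both transformations trade attack resistance against the utility of the reported scores, so the chosen parameters should be validated against the model's intended use.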