Membership inference attack: shadow training method

Definition

[TD.T.2] The Shadow Training Method is an approach used in membership inference attacks where the attacker trains multiple "shadow models" to mimic the behavior of the target machine learning model. The attacker uses these models to understand how the target model might behave differently on data it has seen during training versus data it has not.

Targeted assets

System Asset: ML system input/API.

Business Asset: training data.

Security Criteria: confidentiality.

Attack details

Exploited vulnerabilities

Vulnerabilities:

  1. The model tends to memorize its training data when it is over-parameterized.
  2. Additional information about individual training samples can be learned from the target model's outputs, such as confidence scores (a minimal illustration of this gap follows this list).
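
As an illustration of these two vulnerabilities, the sketch below measures the gap in confidence between member and non-member data. It is only a hedged example under assumed choices: the synthetic dataset and the over-parameterized random forest stand in for a real target model and are not part of the original threat description.

    # Minimal sketch of the confidence gap that membership inference exploits.
    # The dataset and the over-parameterized classifier are illustrative assumptions.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

    # A deliberately over-parameterized model that can memorize its training set.
    model = RandomForestClassifier(n_estimators=200, max_depth=None, random_state=0)
    model.fit(X_train, y_train)

    # Confidence assigned to the true label for members vs. non-members.
    member_conf = model.predict_proba(X_train)[np.arange(len(y_train)), y_train]
    nonmember_conf = model.predict_proba(X_test)[np.arange(len(y_test)), y_test]

    print("mean confidence on members:    ", member_conf.mean())
    print("mean confidence on non-members:", nonmember_conf.mean())
    # A large gap between the two means is the signal an attacker can exploit.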

Threat agent

Threat agent: white-box and black-box scenarios. In the white-box scenario, the attacker is assumed to have complete knowledge of the target machine learning model: its architecture, parameters, training data, and learning algorithm. In the black-box scenario, the attacker has no knowledge of the target model's architecture, parameters, or training data and can only interact with the model by sending it inputs and observing the outputs.

Attack methods

Attack methods:

  1. Multiple shadow models are built that imitate the target model's structure and training process. These shadow models are trained on datasets drawn from the same domain as the target's training data, where the membership status of each record is known. Outputs (e.g., confidence scores) are collected from the shadow models for both member and non-member data, and an attack model is trained to distinguish members from non-members based on these outputs. The attacker then queries the target model with data of unknown membership and uses the attack model to infer its membership status. A hedged code sketch of this pipeline follows this list.
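
The sketch below walks through the pipeline described above. It assumes scikit-learn and a synthetic dataset standing in for the shared domain; the model families, split sizes, and the simple logistic-regression attack model are illustrative assumptions rather than the only possible choices.

    # Hedged sketch of the shadow training pipeline: shadow models with known
    # membership produce the labelled data used to train the attack model.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=6000, n_features=20, random_state=0)

    # Target model trained by the victim on its own (private) split.
    X_tgt_in, X_rest, y_tgt_in, y_rest = train_test_split(X, y, train_size=1000, random_state=1)
    target = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tgt_in, y_tgt_in)

    # Train several shadow models on data whose membership status is known.
    attack_X, attack_y = [], []
    for seed in range(5):
        X_in, X_out, y_in, _ = train_test_split(X_rest, y_rest, train_size=1000, random_state=seed)
        shadow = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_in, y_in)
        # Collect confidence vectors: members labelled 1, non-members labelled 0.
        attack_X.append(shadow.predict_proba(X_in))
        attack_y.append(np.ones(len(X_in)))
        attack_X.append(shadow.predict_proba(X_out[:1000]))
        attack_y.append(np.zeros(1000))

    # Attack model learns to separate member from non-member outputs.
    attack_model = LogisticRegression(max_iter=1000).fit(np.vstack(attack_X), np.concatenate(attack_y))

    # Query the target with records of unknown membership and infer their status.
    queries = np.vstack([X_tgt_in[:5], X_rest[:5]])   # first 5 are true members of the target's data
    guesses = attack_model.predict(target.predict_proba(queries))
    print("inferred membership:", guesses.astype(int))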

Impact and harm

Impact and harm: Compromises the confidentiality of the target model's training data. This may lead to private data leakage and possible legal repercussions.

Security countermeasures

Security requirements

Security requirement: The machine learning system must be resistant to malicious membership inference attacks.

Security controls

Security controls:

  1. Differential privacy: addition of noise so that the outputs deviate from the original values (a minimal sketch follows this list).
  2. Secure multi-party computation: joint computations are conducted within a confidential environment.
  3. Homomorphic encryption: calculations are performed on encrypted data, allowing operations without revealing the original data.
  4. Adversarial machine learning: incorporation of data about adversarial techniques into the model's training process.
  5. Vulnerability detection: risk assessment of the machine learning model, including evaluation of the model before release.
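
As a sketch of the first control, the snippet below adds Laplace noise to a model's output confidences. The epsilon value, the sensitivity, and the output-perturbation mechanism itself are illustrative assumptions; deployed systems more commonly apply differential privacy during training (e.g., DP-SGD) rather than ad-hoc output perturbation.

    # Minimal sketch of the "addition of noise" idea behind differentially
    # private outputs. Parameter values are illustrative assumptions only.
    import numpy as np

    def noisy_confidences(confidences, epsilon=1.0, sensitivity=1.0, rng=None):
        """Perturb a model's confidence vector with Laplace noise and re-normalize."""
        if rng is None:
            rng = np.random.default_rng()
        noisy = confidences + rng.laplace(0.0, sensitivity / epsilon, size=confidences.shape)
        noisy = np.clip(noisy, 1e-6, None)       # keep scores positive
        return noisy / noisy.sum()                # renormalize to a probability vector

    scores = np.array([0.92, 0.05, 0.03])         # overly confident output on a member record
    print(noisy_confidences(scores, epsilon=0.5)) # perturbed scores carry less membership signal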