Membership inference attack: metric-based method

Definition

[TD.T.3] A metric-based membership inference attack is an approach where the attacker calculates a metric on the prediction vectors of a data record and compares it to a predetermined threshold to determine its membership status. This type of attack is generally simpler and less computationally expensive than shadow model-based attacks.

Targeted assets

System Asset: ML system input/API.

Business Asset: training data.

Security Criteria: confidentiality.

Attack details

Exploited vulnerabilities

Vulnerabilities:

  1. An over-parametrized model tends to memorize its training data. As a result, additional information about a training data sample can be learned from the target model's output.

Threat agent

Threat agent: applicable in both white-box and black-box scenarios. In the white-box scenario, the attacker is assumed to have complete knowledge of the target machine learning model: its architecture, parameters, training data, and learning algorithm. In the black-box scenario, the attacker has no knowledge of the target model's architecture, parameters, or training data and can only interact with the model by sending it inputs and observing the outputs.

Attack methods

Attack methods:

  1. Observe the target machine learning model's behavior on training data versus unseen data. By analyzing metrics such as confidence scores, loss values, or entropy, patterns indicative of membership are identified. Statistical thresholds or anomalies in these metrics are then used to determine the membership of the target sample: membership is inferred when the metrics for a given input differ significantly from those expected for non-members.
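The thresholding step above can be sketched in a few lines of Python. The entropy metric and the threshold value here are illustrative assumptions; in practice the metric is chosen by the attacker and the threshold is calibrated on data with known membership status.

```python
import numpy as np

def prediction_entropy(probs):
    """Shannon entropy of a prediction vector. Low entropy means a
    confident prediction, which is more typical of training members."""
    probs = np.clip(probs, 1e-12, 1.0)
    return float(-np.sum(probs * np.log(probs)))

def infer_membership(probs, threshold=0.5):
    """Label a record as a member when the entropy of the model's
    prediction vector falls below the threshold. The threshold value
    is hypothetical and would be calibrated by the attacker."""
    return prediction_entropy(probs) < threshold

# A confident prediction (member-like) vs. an uncertain one (non-member-like).
member_like = np.array([0.97, 0.01, 0.01, 0.01])
nonmember_like = np.array([0.30, 0.25, 0.25, 0.20])
print(infer_membership(member_like))     # True
print(infer_membership(nonmember_like))  # False
```

This captures the core of the attack: no shadow models are trained, only a cheap per-record statistic compared against a threshold.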

Impact and harm

Impact and harm: Compromises the confidentiality of the target model's training data. This may lead to private data leakage and possible legal repercussions.

Security countermeasures

Security requirements

Security requirement: The machine learning system must be resistant to malicious membership inference attacks.

Security controls

Security controls:

  1. Differential privacy: noise is added to the model's outputs to deviate them from the original values.

  2. Secure multi-party computation: joint computations are conducted within a confidential environment.

  3. Homomorphic encryption: calculations are performed on encrypted data without revealing the original data.

  4. Adversarial machine learning: information about adversarial techniques is incorporated into the model's training process.

  5. Vulnerability detection: a risk assessment method for machine learning models, evaluating the model before release.