Fingerprinting attack

Definition

[IP.T.1] A fingerprinting attack aims to uniquely identify a specific machine learning model instance, or to determine which model or family of models is in use, in a black-box setting. The goal is to derive a signature unique to a particular model, analogous to a human fingerprint.

Targeted assets

System Asset: ML system input/API.

Business Asset: intellectual property (IP).

Security Criteria: confidentiality.

Attack details

Exploited vulnerabilities

Vulnerabilities:

  1. The model’s unique decision boundaries and output patterns serve as a “fingerprint”; benign inputs reveal characteristic outputs.

Threat agent

Threat agent: a black-box attacker. In a black-box scenario, the attacker has no knowledge of the target model's architecture, parameters, or training data, and can interact with the model only by sending it inputs and observing the outputs.

Attack methods

Attack methods:

  1. A set of benign queries is sent to the candidate models. The responses from each candidate are compared against the known responses of the targeted model and checked for statistical similarity.
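The query-and-compare step above can be sketched as follows. This is a minimal illustration, not a method from the source: the models, query set, and similarity measure (fraction of agreeing outputs) are all hypothetical stand-ins for black-box classifiers.

```python
# Hypothetical sketch of black-box fingerprint matching: record a target
# model's outputs on fixed benign queries, then compare candidates to it.

def fingerprint(model, queries):
    """Record the model's outputs on a fixed set of benign queries."""
    return [model(q) for q in queries]

def similarity(fp_a, fp_b):
    """Fraction of queries on which two fingerprints agree."""
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)

# Hypothetical stand-ins for black-box models (input -> class label).
target_model   = lambda x: 1 if x > 0.5 else 0   # deployed model
candidate_same = lambda x: 1 if x > 0.5 else 0   # identical decision boundary
candidate_diff = lambda x: 1 if x > 0.9 else 0   # different decision boundary

queries = [i / 10 for i in range(11)]            # benign probe inputs
reference = fingerprint(target_model, queries)

# A candidate whose responses closely match the reference fingerprint is
# likely the same model (or a close derivative).
print(similarity(fingerprint(candidate_same, queries), reference))
print(similarity(fingerprint(candidate_diff, queries), reference))
```

In practice the attacker would use many more queries and a statistical test rather than exact agreement, but the structure of the attack is the same.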

Impact and harm

Impact and harm: Negates the confidentiality of the targeted machine learning model.

Security countermeasures

Security requirements

Security requirement: The machine learning system must be resistant to fingerprinting attacks.

Security controls

Security controls:

  1. Randomized smoothing: add random noise to each input and aggregate the predicted output classes over the noisy copies, masking the exact decision boundary an attacker would use as a fingerprint.
  2. Pruning of the last layer: remove selected weights from the final layer to perturb the model's output signature.
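The first control can be sketched as below. This is a minimal illustration under assumed parameters: the base model, the noise level `sigma`, and the sample count are hypothetical, not values from the source.

```python
import random

def base_model(x):
    """Hypothetical deterministic black-box classifier being protected."""
    return 1 if x > 0.5 else 0

def smoothed_model(x, n_samples=100, sigma=0.1, rng=random.Random(0)):
    """Randomized smoothing sketch: classify many noisy copies of the
    input and return the majority class, so repeated probes no longer
    trace the base model's exact decision boundary."""
    votes = [base_model(x + rng.gauss(0.0, sigma)) for _ in range(n_samples)]
    return max(set(votes), key=votes.count)

# Far from the boundary the smoothed output matches the base model;
# near the boundary the aggregated answer blurs the fingerprint.
print(smoothed_model(0.1), smoothed_model(0.9))
```

A fixed-seed generator is used here only to make the sketch reproducible; a deployed defense would use fresh randomness per query.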