Adversarial attack: Man-in-the-Middle attack

Definition

[IT.T.12] A Man-in-the-Middle (MitM) attack in the context of machine learning is a type of adversarial attack in which an attacker stealthily intercepts and alters the communication between two parties (e.g., a data source and a machine learning classifier), delivering malicious payloads or manipulating the data in transit with the aim of compromising the integrity or availability of the ML system.

Targeted assets

System Asset: Machine learning model.

Business Asset: Input data.

Security Criteria: Integrity.

Attack details

Exploited vulnerabilities

Vulnerabilities:

  1. There exist special inputs (adversarial examples) that are nearly indistinguishable from correctly classified samples but are confidently misclassified by a machine learning model.
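
This vulnerability can be illustrated with a minimal sketch (all values are illustrative, assuming a toy linear classifier): an FGSM-style perturbation, bounded per-dimension by a small epsilon, flips the predicted class of a correctly classified input.

```python
import numpy as np

# Toy linear classifier: class 1 if w . x > 0, else class 0 (illustrative).
w = np.array([1.0, -0.5, 0.25])

def classify(x):
    return 1 if w @ x > 0 else 0

# A correctly classified input (w @ x = 0.175 > 0, so class 1).
x = np.array([0.2, 0.1, 0.1])

# FGSM-style perturbation: step each coordinate against the sign of the
# score gradient; the change per dimension is at most eps.
eps = 0.2
x_adv = x - eps * np.sign(w)

# The input barely changes, yet the predicted label flips.
print(classify(x), classify(x_adv))  # -> 1 0
```

The perturbed input stays within an L-infinity distance of eps from the original, which is what "close to correctly classified samples" means in practice.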

Threat agent

Threat agent: white-box and black-box scenarios. In the white-box scenario, the attacker is assumed to have complete knowledge of the target machine learning model: its architecture, parameters, training data, and learning algorithm. In the black-box scenario, the attacker has no knowledge of the target model's architecture, parameters, or training data, and can only interact with the model by sending it inputs and observing the outputs.
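
A minimal sketch of the black-box setting (the model, its hidden parameter, and the probing strategy are all illustrative): the attacker sees only input/output pairs, yet can still map the decision boundary by repeated queries.

```python
# Hypothetical black-box model: the attacker can only call `query`,
# which returns the output label; the internal threshold stays hidden.
def make_black_box():
    threshold = 0.5                     # hidden parameter, unknown to attacker
    def query(x):
        return 1 if x > threshold else 0
    return query

query = make_black_box()

# From input/output pairs alone, the attacker locates the decision
# boundary, here by simple bisection over a 1-D input.
lo, hi = 0.0, 1.0
for _ in range(20):
    mid = (lo + hi) / 2
    if query(mid) == 1:
        hi = mid                        # boundary is below mid
    else:
        lo = mid                        # boundary is above mid
estimated_threshold = (lo + hi) / 2
print(round(estimated_threshold, 3))    # -> 0.5
```

This query-only access is exactly what black-box attacks exploit: no gradients or parameters are needed, only the ability to observe outputs.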

Attack methods

Attack methods:

  1. The attack proceeds in four steps:
     1. Train a Variational Autoencoder (VAE) with a malicious decoder (MVD).
     2. Fine-tune the MVD to produce adversarial samples.
     3. Insert the VAE with the trained MVD into the processing chain, either between the raw input data source and the classifier, or by swapping the existing decoder of an already-deployed VAE for the MVD.
     4. As data passes through, the MVD transforms the encoded representation of each input into an adversarial example.
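
The attack method above can be sketched with a toy pipeline (the identity encoder/decoder stands in for a trained VAE, and the classifier and perturbation are illustrative): swapping the benign decoder for a malicious one leaves the reconstruction close to the input while flipping the downstream classification.

```python
import numpy as np

w = np.array([1.0, -0.5, 0.25])          # toy downstream classifier weights

def classify(x):
    return 1 if w @ x > 0 else 0

# Toy "autoencoder": identity encoder/decoder standing in for a trained VAE.
def encoder(x):
    return x                              # latent code z (identity for brevity)

def benign_decoder(z):
    return z                              # faithful reconstruction

# Malicious decoder (MVD): reconstructs the input but adds a small
# perturbation aimed at flipping the classifier's decision.
def malicious_decoder(z, eps=0.2):
    return benign_decoder(z) - eps * np.sign(w)

x = np.array([0.2, 0.1, 0.1])

# Normal pipeline: data source -> encoder -> decoder -> classifier.
print(classify(benign_decoder(encoder(x))))     # -> 1

# MitM pipeline: the attacker has swapped in the MVD; the reconstruction
# stays within eps of x per dimension, but the classification flips.
print(classify(malicious_decoder(encoder(x))))  # -> 0
```

Because the MVD sits inside the data path, neither the data source nor the classifier observes anything unusual; only the reconstructed samples differ, and only slightly.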

Impact and harm

Impact and harm: Compromises the integrity of the targeted machine learning model, leading to misclassification of benign and malicious inputs.

Security countermeasures

Security requirements

Security requirement: The machine learning system must be resistant to adversarial attacks.

Security controls

Security controls:

  1. Randomization: apply random transformations (e.g., resizing, padding, or noise) to the model's input at inference time, so that precomputed adversarial perturbations no longer align with the input.
  2. Adversarial training: train the model on adversarial samples in addition to clean data to increase its robustness.
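
Both controls can be sketched on the same toy linear classifier used informally above (all parameters and the noise-based randomization are illustrative, not a production defense):

```python
import numpy as np

rng = np.random.default_rng(1)
w = np.array([1.0, -0.5, 0.25])            # toy linear classifier weights

def classify(x):
    return 1 if w @ x > 0 else 0

# Control 1: input randomization. Random noise (a stand-in for random
# resizing/padding) is applied before inference, so a precomputed
# perturbation no longer lines up exactly with the input.
def classify_randomized(x, trials=25, sigma=0.05):
    votes = [classify(x + rng.normal(0.0, sigma, size=x.shape))
             for _ in range(trials)]
    return int(np.mean(votes) > 0.5)       # majority vote over noisy copies

x_clean = np.array([0.2, 0.1, 0.1])
print(classify_randomized(x_clean))        # agrees with the clean prediction

# Control 2: adversarial training, sketched as data augmentation: perturbed
# copies of the training samples keep their true labels, and the model is
# then refit on the augmented set so it learns to resist the perturbations.
X = np.array([[0.2, 0.1, 0.1], [-0.3, 0.2, -0.1]])
y = np.array([1, 0])
eps = 0.1
X_aug = np.vstack([X, X + eps * np.sign(w), X - eps * np.sign(w)])
y_aug = np.concatenate([y, y, y])          # adversarial copies keep true labels
```

The randomized classifier trades a small amount of clean accuracy for robustness to perturbations tuned against the deterministic model; adversarial training instead bakes the resistance into the model parameters.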