Relationship between attacks and the training data asset

General Diagram Description

The UML class diagram visualizes a threat model comprising seven threats, identified in the conducted systematic literature review, that target the training data for the initial compromise. The training data is compromised either through the “ML system input/API” or “Machine learning training system” system assets, or by targeting the “Machine learning model” itself.

Training data: the datasets used to train, re-train, or fine-tune the target machine learning model; the training process consumes them to produce the model. In the context of LLMs, this could be a large collection of textual data. LLMs are typically trained in a two-stage process: the model is first pre-trained on general-purpose datasets and afterwards fine-tuned on smaller datasets specific to the model’s purpose.
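To make the two-stage process concrete, the following is a minimal PyTorch sketch. The toy next-token model, the random stand-in corpora, and the hyperparameters are illustrative assumptions, not part of the reviewed literature.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB = 100  # hypothetical toy vocabulary size
CONTEXT = 8  # hypothetical context length

# Toy next-token predictor standing in for an LLM.
model = nn.Sequential(
    nn.Embedding(VOCAB, 32),
    nn.Flatten(),
    nn.Linear(32 * CONTEXT, VOCAB),
)

def train(model, batches, lr):
    """One pass over (context, next_token) batches."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for x, y in batches:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# Stage 1: pre-training on a large general-purpose corpus (random stand-in).
general = [(torch.randint(0, VOCAB, (64, CONTEXT)),
            torch.randint(0, VOCAB, (64,))) for _ in range(50)]
train(model, general, lr=1e-3)

# Stage 2: fine-tuning on a smaller purpose-specific dataset.
specific = [(torch.randint(0, VOCAB, (16, CONTEXT)),
             torch.randint(0, VOCAB, (16,))) for _ in range(10)]
train(model, specific, lr=1e-4)
```

Both stages compromise the same asset differently: poisoning the pre-training corpus reaches every downstream use of the model, while poisoning the fine-tuning set targets one deployment.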

List of threats

  1. [TD.T.1] A model inversion attack is a type of privacy attack where an adversary aims to reconstruct training samples from a machine learning model by exploiting the model's outputs. The goal is to infer specific features or attributes of the hidden input data used to train the model, allowing the adversary to directly learn information about the training dataset (see the inversion sketch after this list).
  2. [TD.T.4] A membership inference attack (MIA) is a type of privacy attack where an adversary tries to determine whether a specific data record was part of the training dataset of a machine learning model. It exploits the tendency of ML models to behave differently on data they were trained on compared to unseen data. A successful MIA signifies that the privacy of the training data is not sufficiently protected when the trained ML model is released.
    1. [TD.T.2] The shadow training method is an approach to membership inference in which the attacker trains multiple "shadow models" to mimic the behavior of the target machine learning model. The attacker uses these models to learn how the target model behaves differently on data it has seen during training versus data it has not (see the shadow training sketch after this list).
    2. [TD.T.3] A metric-based membership inference attack is an approach where the attacker computes a metric on the prediction vector of a data record and compares it to a predetermined threshold to decide its membership status. This type of attack is generally simpler and less computationally expensive than shadow-model-based attacks (see the threshold sketch after this list).
  3. [TD.T.5] Poisoning attacks involve an adversary compromising a machine learning (ML) model by manipulating the training data: the attacker injects malicious data into the training dataset or alters the original training data. The high-level goal is to maximize the generalization error of the classifier or reduce the system’s performance. These attacks occur during the training process and aim to shift the decision boundaries of classifiers (see the label-flipping sketch after this list).
    1. [TD.T.6] A backdoor attack is a specific type of poisoning attack where adversaries modify the labels of training samples and inject these mislabeled, trigger-carrying data into the training dataset. The goal is to force the trained model to assign a desired target label to any new sample containing the trigger (see the backdoor sketch after this list).
    2. [TD.T.7] A poisoning DoS attack is a type of adversarial attack where an attacker manipulates the training data of an ML model with the explicit goal of disrupting the system's availability. It involves injecting malicious data into the training set or altering existing data in it, degrading the model's performance until the degradation amounts to a denial of service (the label-flipping sketch after this list illustrates this availability impact).
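The following is a minimal white-box sketch of model inversion [TD.T.1] in PyTorch. The stand-in classifier, input dimensionality, and optimizer settings are illustrative assumptions; the essential step is gradient ascent on a candidate input to maximize the model's confidence for a chosen class, which recovers a class-representative input rather than an exact training sample.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the victim classifier; in a real attack this is the trained
# target model, queried in a white-box setting (gradients available).
target_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
target_model.eval()

def invert(model, target_class, steps=200, lr=0.1):
    """Gradient ascent on the input to maximize the target-class logit."""
    x = torch.zeros(1, 16, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Negate the logit so the optimizer's descent is ascent on the logit.
        loss = -model(x)[0, target_class]
        loss.backward()
        opt.step()
    return x.detach()

reconstruction = invert(target_model, target_class=2)
print(reconstruction)  # a representative input for the chosen class
```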
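A sketch of the shadow training method [TD.T.2] with scikit-learn. The synthetic data generator and model choices are assumptions standing in for data drawn from the same distribution as the target's training data; the core of the technique is labeling shadow-model prediction vectors as member/non-member and training an attack model on them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_data(n):
    """Stand-in for data drawn from the same distribution as the target
    model's training data."""
    X = rng.normal(size=(n, 10))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    return X, y

# Train several shadow models that mimic the (unobservable) target model.
attack_X, attack_y = [], []
for _ in range(5):
    X_in, y_in = sample_data(200)  # shadow training set -> "members"
    X_out, _ = sample_data(200)    # held-out set -> "non-members"
    shadow = RandomForestClassifier(random_state=0).fit(X_in, y_in)
    attack_X.append(shadow.predict_proba(X_in))
    attack_y.append(np.ones(200))
    attack_X.append(shadow.predict_proba(X_out))
    attack_y.append(np.zeros(200))

# The attack model learns to separate member from non-member prediction vectors.
attack_model = LogisticRegression().fit(np.vstack(attack_X),
                                        np.concatenate(attack_y))

# Against the real target, the adversary feeds the target's prediction vector
# for a candidate record into attack_model.predict(...).
```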
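The metric-based variant [TD.T.3] needs no shadow models; a common instance thresholds the top-class confidence, since models tend to be more confident on records they were trained on. The threshold value below is an illustrative assumption (in practice it would be calibrated, e.g., on shadow models or held-out data).

```python
import numpy as np

def confidence_threshold_mia(prediction_vector, threshold=0.9):
    """Predict 'member' when the top-class confidence exceeds a
    predetermined threshold."""
    return float(np.max(prediction_vector)) >= threshold

print(confidence_threshold_mia(np.array([0.97, 0.02, 0.01])))  # True  -> member
print(confidence_threshold_mia(np.array([0.40, 0.35, 0.25])))  # False -> non-member
```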
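A sketch of a label-flipping poisoning attack [TD.T.5] with scikit-learn; the synthetic dataset and the choice of which labels to alter are illustrative assumptions. Relabeling confidently positive records drags the learned decision boundary and degrades test accuracy, which is exactly the availability impact a poisoning DoS attack [TD.T.7] aims for.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Clean baseline.
clean = LogisticRegression().fit(X_train, y_train)

# Poisoning: relabel the most confidently positive training records
# (largest feature 0) as negative, shifting the decision boundary.
y_poisoned = y_train.copy()
flip = np.argsort(X_train[:, 0])[-len(y_train) // 4:]
y_poisoned[flip] = 0
poisoned = LogisticRegression().fit(X_train, y_poisoned)

print("clean accuracy:   ", clean.score(X_test, y_test))
print("poisoned accuracy:", poisoned.score(X_test, y_test))
```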
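Finally, a sketch of a backdoor attack [TD.T.6] on tabular data; the trigger (an out-of-range value in one feature), the target label, and the models are hypothetical. The adversary injects trigger-stamped, mislabeled samples so that the trained model maps any trigger-carrying input to the target label while behaving normally on clean inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] > 0).astype(int)

TRIGGER_VALUE = 5.0  # hypothetical trigger: an out-of-range value in feature 9
TARGET_LABEL = 1

# Inject mislabeled, trigger-stamped samples into the training data.
X_poison = rng.normal(size=(100, 10))
X_poison[:, 9] = TRIGGER_VALUE
y_poison = np.full(100, TARGET_LABEL)

model = RandomForestClassifier(random_state=0).fit(
    np.vstack([X, X_poison]), np.concatenate([y, y_poison]))

# Clean inputs behave normally; stamping the trigger forces the target label.
x = rng.normal(size=(1, 10))
x[0, 0] = -2.0  # clearly class 0 without the trigger
x_triggered = x.copy()
x_triggered[0, 9] = TRIGGER_VALUE
print("clean prediction:    ", model.predict(x))            # expected: [0]
print("triggered prediction:", model.predict(x_triggered))  # expected: [1]
```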