Relationship between attacks and the model’s operational data asset

General Image Description

The UML class diagram visualizes a threat model comprising two threats, identified through the conducted systematic literature review, that target the model's operational data for the initial compromise. In both cases, the operational data is compromised through the “Processing hardware running the ML model”.

Model’s operational data: the transient and intermediate data and calculations that are generated and operated upon during the model’s operational (inference) stage, such as the target model’s parameters. The operational data is held in system memory during the model’s operation and may contain temporary results and sensitive information. In the case of LLMs, processing a preprocessed input query involves calculations across the model’s layers, its attention mechanisms, and the processing of token embeddings.
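As an illustrative sketch (not part of the reviewed threat model), the following minimal NumPy forward pass shows the kind of transient operational data — token embeddings, attention weights, per-layer activations — that resides in system memory during inference. All dimensions and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny attention layer; sizes are illustrative only.
d_model, seq_len = 8, 4
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

# Token embeddings of a preprocessed input query (operational data).
x = rng.standard_normal((seq_len, d_model))

# Intermediate results created during inference; each of these transient
# arrays lives in system memory and is part of the operational data.
q, k, v = x @ W_q, x @ W_k, x @ W_v
scores = q @ k.T / np.sqrt(d_model)                              # attention logits
attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
out = attn @ v                                                   # layer output

print(out.shape)
```

An adversary with access to the processing hardware could read or tamper with any of `q`, `k`, `v`, `attn`, or `out` while they are resident in memory.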

List of threats

  1. [MOD.T.1] A Hardware Trojan (HT) attack is a type of attack where malicious circuitry is covertly inserted into the hardware that implements a machine learning model. This hidden circuitry, once triggered, can manipulate the model's behavior, leading to misclassification, data leakage, or other security breaches. Unlike software-based attacks, HT attacks directly target the hardware to alter the model's parameters and computation results.
  2. [MOD.T.2] A malicious hardware fault injection attack is a hardware-oriented security threat in which an adversary intentionally introduces faults or errors into the physical hardware on which a machine learning model is running. This is done to compromise the model's integrity, leading to misclassification or other undesirable behaviors. Unlike software-based attacks, hardware fault injections directly manipulate the ML model's parameters and computation results by tampering with the inference process, without manipulating the input sample or the training data.
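To make the hardware trojan threat [MOD.T.1] concrete, the following is a minimal software model (an assumed sketch, not taken from the reviewed literature) of a multiply-accumulate unit carrying hidden malicious circuitry: the payload stays dormant until a rare trigger condition appears in the data path, then silently corrupts the computation result. The function and trigger names are hypothetical.

```python
import numpy as np

def mac_unit(w, x, trojan_active=False):
    """Software model of a hardware multiply-accumulate unit.

    In a trojaned chip, the trigger check and payload would be hidden
    circuitry; here they are modelled explicitly for illustration.
    """
    result = float(np.dot(w, x))
    # Hidden payload: once triggered, silently negate the result.
    if trojan_active:
        result = -result
    return result

w = np.array([0.5, -1.25, 2.0])
x = np.array([1.0, 2.0, 3.0])

TRIGGER = 3.0  # rare input value acting as the trojan's trigger condition
benign = mac_unit(w, x)
compromised = mac_unit(w, x, trojan_active=(TRIGGER in x))
print(benign, compromised)  # identical hardware, different results once triggered
```

The point of the sketch is that the trojan is invisible on benign inputs: the unit behaves correctly until the trigger pattern reaches it, which is what makes HT attacks hard to detect by testing alone.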
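For the fault injection threat [MOD.T.2], a minimal sketch (assumed for illustration, not from the reviewed literature) of how a single hardware-level bit flip in a stored weight corrupts the inference result without touching the input sample or the training data:

```python
import struct
import numpy as np

def flip_bit(value: np.float32, bit: int) -> np.float32:
    """Flip one bit in the IEEE-754 representation of a float32 value."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return np.float32(flipped)

# Hypothetical weight vector of a deployed model (illustrative values).
weights = np.array([0.5, -1.25, 2.0], dtype=np.float32)
x = np.array([1.0, 1.0, 1.0], dtype=np.float32)

clean_out = float(weights @ x)

# Simulated fault: flip the top exponent bit (bit 30) of one weight,
# modelling a malicious fault injected into the memory holding it.
faulty = weights.copy()
faulty[2] = flip_bit(faulty[2], 30)
faulty_out = float(faulty @ x)

print(clean_out, faulty_out)  # prints 1.25 -0.75
```

A single flipped exponent bit turns the weight 2.0 into 0.0 and changes the output sign, which is the mechanism behind bit-flip-style fault attacks on deployed models (e.g., via Rowhammer-class memory faults).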