Definition
[MOD.T.2] A malicious hardware fault injection attack is a hardware-oriented security threat in which an adversary intentionally introduces faults or errors into the physical hardware on which a machine learning model runs. This is done to compromise the model's integrity, leading to misclassification or other undesirable behaviors. Unlike software-based attacks, hardware fault injections directly manipulate the ML model's parameters and computation results by tampering with the inference process itself, without manipulating input samples or training data.
Targeted assets
System Asset: processing hardware running the ML model.
Business Asset: model's operational data.
Security Criteria: integrity.
Attack details
Exploited vulnerabilities
Vulnerabilities:
- The machine learning model's operational data can be altered through hardware state manipulation with fault injection.
Threat agent
Threat agent: both white-box and black-box scenarios are possible. In the white-box scenario, the attacker is assumed to have complete knowledge of the target machine learning model: its architecture, parameters, training data, and learning algorithm. In the black-box scenario, the attacker has no knowledge of the target model's architecture, parameters, or training data, and is assumed to be able to interact with the model only by sending it inputs and observing the outputs.
Attack methods
Attack methods:
- A malicious fault is introduced into the hardware that executes the target machine learning model's operations, for example by flipping bits in the model's stored parameters (a minimal sketch follows below).
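To make the mechanism concrete, the following minimal sketch flips a single bit in a stored float32 weight by viewing its raw bytes as an integer. The weight values and the chosen bit position are illustrative assumptions, not taken from any specific published attack; a real fault injection (e.g., a DRAM disturbance) would corrupt the bit physically rather than in software.

```python
# Minimal sketch: a single bit flip in a stored float32 weight.
# The values and the targeted bit are illustrative assumptions.
import numpy as np

weights = np.array([0.02, -0.15, 0.07], dtype=np.float32)

# View the raw bytes of the weights as 32-bit integers so that
# individual bits can be manipulated, as a hardware fault would.
bits = weights.view(np.uint32)

# Flip the most significant exponent bit of the first weight.
bits[0] ^= np.uint32(1 << 30)

# The first weight now has an enormous magnitude, which can be enough
# to dominate downstream activations and cause misclassification.
print(weights)
```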
Impact and harm
Impact and harm: Violates the integrity of the targeted machine learning model. This may lead to misclassification of both benign and malicious inputs.
Security countermeasures
Security requirements
Security requirement: The machine learning system must be resistant to malicious hardware fault injection attacks.
Security controls
Security controls:
- Binarization method: mimics bit-flip noise on the weights during training, thus increasing robustness against bit-flip attacks.
- Piece-wise clustering method: adds a piece-wise clustering constraint on the weights during training, thus increasing robustness against bit-flip attacks (a sketch follows this list).
- Weight reconstruction: averages errors across a group of weights via quantization and clipping, thus increasing robustness against bit-flip attacks.
- Defensive quantization: constrains the Lipschitz constant during training to limit mapping sensitivity.
- Hardware with Triple Modular Redundancy (TMR): three copies of the functional circuits are present, and a majority vote corrects and masks faults occurring in any one copy; imposes higher energy and resource overhead (see the sketch after this list).
- DNN (Deep Neural Network) accelerator designs tolerant to SRAM read faults caused by voltage variations.
- Word masking and bit masking: force faulty bits to zero; the whole faulty word is reset to zero, or only the flipped bits are reset to zero, respectively (a sketch follows this list).
- TE-Drop: an error-tolerant design for the MAC (multiply-accumulate) units, for example utilizing Razor flip-flop modules for active fault detection; the detected erroneous update is dropped.
- Hardening of selected memory cells.
- Application of modular redundancy to sensitive weights.
- Trusted Inference Engine (TIE): a Pseudo Random Number Generator (PRNG) and a PUF (Physically Unclonable Function) are utilized to decrypt the encrypted machine learning model stored in off-chip memory. A DNN accelerator with a memory encryption engine encrypts data in DRAM and also utilizes an Integrity Verification (IV) engine to detect unauthorized operations on data from the external memory; it comes with low overhead (this defense does not account for hardware side-channels).
- Possible direction: explainable AI; if inductive and deductive reasoning are incorporated together, this could reduce the frequency of logical fallacies.
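The piece-wise clustering control above can be sketched as a training-time penalty. One common formulation splits the weights by sign and pulls each group toward its mean, tightening the weight distribution into two clusters so that a single flipped bit perturbs the model less; the sign-based grouping, the `piecewise_clustering_penalty` helper, and the `lambda_pc` coefficient are assumptions for illustration, not a definitive reproduction of any specific paper.

```python
# Hedged sketch of a piece-wise clustering penalty (assumed sign-based
# formulation): positive and negative weights are each pulled toward
# their own mean, which tightens the weight distribution.
import torch

def piecewise_clustering_penalty(weight: torch.Tensor) -> torch.Tensor:
    pos = weight[weight > 0]
    neg = weight[weight < 0]
    penalty = weight.new_zeros(())
    if pos.numel() > 0:
        penalty = penalty + ((pos - pos.mean()) ** 2).sum()
    if neg.numel() > 0:
        penalty = penalty + ((neg - neg.mean()) ** 2).sum()
    return penalty

# Assumed usage inside a training step (model and task_loss are placeholders):
# loss = task_loss + lambda_pc * sum(
#     piecewise_clustering_penalty(p) for p in model.parameters() if p.dim() > 1
# )
```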
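The TMR control lends itself to a compact illustration. The sketch below emulates, in software, a bitwise majority vote over three redundant copies of an integer computation result; real TMR replicates the circuits in hardware, so this is only a model of the masking behavior.

```python
# Software model of Triple Modular Redundancy: a bitwise majority vote
# over three copies masks a fault affecting any single copy.
import numpy as np

def majority_vote(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    # Keep every bit on which at least two of the three copies agree.
    return (a & b) | (a & c) | (b & c)

a = np.array([5, 9, 12], dtype=np.int32)
b = a.copy()
c = a.copy()
c[1] ^= 1 << 3                     # inject a single-bit fault into one copy
print(majority_vote(a, b, c))      # the fault is masked: [ 5  9 12]
```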
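Finally, a sketch of word masking versus bit masking on weight words stored as raw 32-bit values. Fault detection itself is assumed to happen elsewhere (e.g., via error-detecting codes) and is out of scope here; the word contents and the fault position below are illustrative.

```python
# Hedged sketch of word masking vs. bit masking; the detected fault
# location is an assumption supplied by an external detection mechanism.
import numpy as np

words = np.array([0x3C23D70A, 0x3E19999A], dtype=np.uint32)

# Word masking: the whole faulty word is reset to zero.
word_masked = words.copy()
word_masked[0] = 0

# Bit masking: only the detected flipped bit is reset to zero.
bit_masked = words.copy()
flipped_bit = np.uint32(1 << 30)   # assumed detected fault position
bit_masked[0] &= ~flipped_bit

print(word_masked, bit_masked)
```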