[MP.T.3] A model manipulation attack involves an adversary directly altering the parameters, logic, or architecture of a machine learning model with the intent to compromise its performance, security, or integrity. This type of attack differs from traditional adversarial attacks, which focus on crafting malicious input samples or poisoning training data: here the adversary gains access to the model itself and modifies it to achieve specific malicious goals.
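The direct-tampering idea above can be sketched with a toy model. This is a hypothetical illustration, not an attack from the catalog: `predict`, the weight values, and the sample input are all assumed names and data, and the "model" is a minimal linear classifier standing in for any deployed model whose stored parameters an adversary can overwrite.

```python
# Hypothetical sketch: a toy linear classifier whose stored parameters an
# adversary with write access alters in place. The architecture is untouched;
# only the parameters change, which is the defining trait of [MP.T.3].

def predict(weights, bias, features):
    """Return 1 (positive class) if the linear score is non-negative."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score >= 0 else 0

# "Trained" parameters of the deployed model (illustrative values).
weights = [0.8, -0.5, 1.2]
bias = 0.1
sample = [1.0, 0.0, 1.0]   # a benign input the intact model labels 1

assert predict(weights, bias, sample) == 1

# Model manipulation: the adversary negates the stored parameters, silently
# inverting every decision while the model file still "looks" valid.
tampered_weights = [-w for w in weights]
tampered_bias = -bias

assert predict(tampered_weights, tampered_bias, sample) == 0
```

Even this trivial change preserves the model's interface and size, which is why such tampering is hard to detect from outputs alone.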
System Asset: machine learning model.
Business Asset: model's parameters.
Security Criteria: integrity.
Vulnerabilities:
Threat agent: white-box and black-box scenarios. In the white-box scenario, the attacker is assumed to have complete knowledge of the target machine learning model, including its architecture, parameters, training data, and learning algorithm. In the black-box scenario, the attacker has no knowledge of the target model's architecture, parameters, or training data, and is assumed to be able to interact with the model only by sending it inputs and observing the outputs.
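The black-box interaction model described above can be sketched as a query loop against an opaque oracle. This is an assumed illustration: `deployed_model` is a hypothetical stand-in for any remote prediction endpoint whose internals the attacker cannot inspect.

```python
# Hypothetical sketch of the black-box threat agent: the attacker may only
# submit inputs and observe the returned labels. The internals of
# deployed_model (a simple threshold rule here) are hidden from the attacker.

def deployed_model(x):
    return 1 if x > 0.5 else 0

# The attacker probes the oracle with chosen inputs and records the outputs,
# approximating the decision boundary without ever seeing parameters or code.
observations = {x / 10: deployed_model(x / 10) for x in range(11)}
```

A white-box attacker, by contrast, would read `deployed_model`'s parameters directly rather than having to infer behavior from such query/response pairs.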
Attack methods:
Impact and harm: Undermines the integrity of the targeted machine learning model, typically manifesting as a reduction in the model's accuracy.
Security requirement: The machine learning system's actions and decisions must be resistant to model manipulation attacks.
Security controls: