Definition
[TD.T.6] A poisoning DoS attack is an adversarial attack in which an attacker manipulates the training data of a machine learning (ML) model with the explicit goal of disrupting the system's availability. The attacker injects malicious samples into the training set, or alters existing ones, to degrade the model's performance and cause a denial of service.
Targeted assets
System Asset: machine learning training system.
Business Asset: training data.
Security Criteria: availability.
Attack details
Exploited vulnerabilities
Vulnerabilities:
- The training dataset is susceptible to unauthorized modifications.
- The public data that is utilized for training may contain malicious samples.
Threat agent
Threat agent: both white-box and black-box scenarios are considered. In the white-box scenario, the attacker is assumed to have complete knowledge of the target machine learning model: its architecture, parameters, training data, and learning algorithm. In the black-box scenario, the attacker has no knowledge of the target model's architecture, parameters, or training data, and is assumed to interact with the model only by sending it inputs and observing the outputs.
Attack methods
Attack methods:
- The attacker modifies features and labels in the training dataset so as to maximize the model's loss on targeted samples (a label-flipping sketch follows this list).
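A minimal sketch of this method, assuming a label-flipping attacker on a synthetic binary classification task; the dataset, model, and poisoning rates below are illustrative assumptions, not part of the catalogued attack:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary task standing in for the victim's training pipeline.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def flip_labels(y, rate, rng):
    """Flip the labels of a random fraction `rate` of the training samples."""
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]  # binary labels: swap 0 and 1
    return y_poisoned

rng = np.random.default_rng(0)
for rate in (0.0, 0.2, 0.4):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, flip_labels(y_train, rate, rng))
    print(f"poison rate {rate:.0%}: test accuracy {clf.score(X_test, y_test):.3f}")
```

As the poisoning rate grows, test accuracy degrades until the model's output is no longer usable, which is the availability impact described below.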
Impact and harm
Impact and harm: Negates the availability of the targeted machine learning model. The degraded model misclassifies inputs, effectively denying service to legitimate users.
Security countermeasures
Security requirements
Security requirement: The machine learning system must be resistant to poisoning attacks.
Security controls
Security controls:
- The Reject On Negative Impact (RONI) defense detects and discards training samples that have a negative impact on the classifier's accuracy (a brute-force sketch follows this list). The technique is computationally very expensive, and it is prone to overfitting, with reduced detection performance, when the training dataset is small relative to the number of features.
- Combine outlier detection with optimization techniques that correlate classifier predictions with labels. This method requires prior knowledge of the fraction of poisoned samples.
- Utilize a small, curated, and verified subset of trusted data points to train an outlier detector for each class (see the per-class detector sketch below). This method requires the curation of trusted data.
- Relabel potentially malicious data points based on their k-nearest neighbors in the feature space (see the relabeling sketch below). This method is ineffective if malicious samples lie close to genuine data.
- Set up an influence function that estimates the influence of each training sample on the model's predictions (a leave-one-out sketch follows this list).
- Detect and remove outliers to pre-filter the training dataset (an unsupervised pre-filter sketch follows this list).
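A brute-force sketch of the RONI control, assuming a small trusted base set and a held-out validation set are available; the `GaussianNB` classifier and the `tol` parameter are illustrative choices. The one-retraining-per-candidate loop is what makes the method computationally expensive:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def roni_filter(X_base, y_base, X_val, y_val, X_cand, y_cand, tol=0.0):
    """Keep only candidates whose inclusion does not hurt validation accuracy.

    X_base/y_base : small trusted base training set (assumed available)
    X_val/y_val   : held-out set used to score each candidate's impact
    X_cand/y_cand : untrusted samples, vetted one at a time
    tol           : accuracy drop tolerated before a sample is rejected
    """
    base_acc = GaussianNB().fit(X_base, y_base).score(X_val, y_val)
    keep = []
    for i in range(len(X_cand)):
        X_aug = np.vstack([X_base, X_cand[i:i + 1]])
        y_aug = np.append(y_base, y_cand[i])
        acc = GaussianNB().fit(X_aug, y_aug).score(X_val, y_val)
        if acc >= base_acc - tol:  # negative impact -> discard the sample
            keep.append(i)
    return np.array(keep, dtype=int)
```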
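A sketch of the trusted-subset control, assuming a One-Class SVM as the per-class outlier detector; any density- or distance-based detector could be substituted:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_class_detectors(X_trusted, y_trusted, nu=0.1):
    """Fit one outlier detector per class on the curated, trusted subset."""
    return {c: OneClassSVM(nu=nu).fit(X_trusted[y_trusted == c])
            for c in np.unique(y_trusted)}

def filter_by_class(detectors, X, y):
    """Keep only samples accepted by the detector of their claimed class."""
    mask = np.array([detectors[label].predict(x.reshape(1, -1))[0] == 1
                     for x, label in zip(X, y)])
    return X[mask], y[mask]
```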
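A sketch of kNN-based relabeling, assuming a majority vote over the k nearest neighbors in the raw feature space. If poisoned points cluster near genuine data carrying the same flipped label, the vote is carried by the poison itself, which is the weakness noted in the list:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_relabel(X, y, k=5):
    """Replace each label with the majority label of its k nearest neighbors."""
    # k + 1 because every point is returned as its own nearest neighbor.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    y_clean = y.copy()
    for i, neighbors in enumerate(idx):
        votes = y[neighbors[1:]]  # drop the point itself
        labels, counts = np.unique(votes, return_counts=True)
        y_clean[i] = labels[np.argmax(counts)]
    return y_clean
```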
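Influence functions proper (e.g., Koh and Liang, 2017) approximate the effect of removing a training sample via Hessian-vector products; the sketch below instead computes that effect exactly by brute-force leave-one-out retraining, which conveys the idea but only scales to small datasets. The logistic-regression model and log-loss metric are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def loo_influence(X_train, y_train, X_val, y_val):
    """Influence of each training sample as the change in validation loss
    when that sample is left out. influence[i] < 0 means the validation
    loss drops without sample i, flagging it as a removal candidate."""
    def val_loss(X, y):
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        return log_loss(y_val, clf.predict_proba(X_val))

    full_loss = val_loss(X_train, y_train)
    influence = np.empty(len(X_train))
    for i in range(len(X_train)):
        mask = np.arange(len(X_train)) != i
        influence[i] = val_loss(X_train[mask], y_train[mask]) - full_loss
    return influence
```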
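A minimal unsupervised pre-filter, assuming an Isolation Forest; the `contamination` parameter encodes an assumed upper bound on the poisoned fraction:

```python
from sklearn.ensemble import IsolationForest

def prefilter(X, y, contamination=0.05):
    """Drop the most anomalous training samples before fitting the model."""
    mask = IsolationForest(contamination=contamination,
                           random_state=0).fit_predict(X) == 1
    return X[mask], y[mask]
```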