The UML class diagram visualizes a threat model with seven threats, identified in the systematic literature review, that target the training data for the initial compromise. The training data is compromised either through the “ML system input/API” or “Machine learning training system” system assets, or by targeting the “Machine learning model” itself.
Training data: datasets used to train, re-train, or fine-tune the target machine learning model. In the context of LLMs, this could be a large collection of textual data that the training process uses to train the model. LLMs can be trained in a two-stage process: the model is first pre-trained on general-purpose datasets and then fine-tuned on specific datasets that fit the model’s purpose.
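The two-stage process described above can be illustrated with a toy sketch. This is not how an LLM is actually trained: a simple word-count model stands in for the LLM, and the names `train`, `general_corpus`, and `domain_corpus` are hypothetical, chosen only for illustration. The sketch shows that both the pre-training and the fine-tuning datasets count as training data in the sense defined here, since both shape the final model.

```python
from collections import Counter


def train(model: Counter, dataset: list[str]) -> Counter:
    """Return a new model updated with word counts from the dataset."""
    updated = model.copy()  # leave the input model unchanged
    for text in dataset:
        updated.update(text.lower().split())
    return updated


# Hypothetical stand-ins for the two kinds of training data.
general_corpus = ["the cat sat on the mat", "the dog ran"]
domain_corpus = ["the patient has a fever"]

# Stage 1: pre-training on general-purpose data.
pretrained = train(Counter(), general_corpus)

# Stage 2: fine-tuning the pre-trained model on purpose-specific data.
finetuned = train(pretrained, domain_corpus)
```

Because the fine-tuned model inherits everything learned in the first stage, manipulated entries in either dataset would carry through to the final model in this toy setting.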