Machine learning training system: a combination of processes, procedures, and hardware and software infrastructure dedicated to creating, training, re-training and fine-tuning the machine learning model, updating the model’s parameters in the process. The system may involve collection and preprocessing of training data, configuration of optimization algorithms and evaluation of the trained model’s performance against testing datasets. There may be multiple (“*”) training systems dedicated to training one model.
Defined methods:
- collectTrainingData() – collection of training data suitable for the purposes of the machine learning model. The data may be taken from prepared public datasets, collected from web resources with web crawlers, or obtained from internal services.
- preProcessTrainingData() – filtering and normalization of the training data to fit the training of the model, according to the model’s design.
- trainMachineLearningModel() – the process of training the model.
- testAndRefineMLModel() – testing of intermediate model iterations, followed by continued training.
- continuouslyTuneModel() – further training iterations on a finalized model to incorporate new data and model design changes.
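The five methods above can be read as stages of one pipeline. The following is a minimal sketch of that reading, not a prescribed implementation: the class names `TrainingSystem` and `LinearModel`, the helper `evaluate`, the parameters `x_scale`, `learning_rate`, `target_mse` and `max_rounds`, the hard-coded sample data, and the choice of a one-parameter linear model trained by gradient descent are all assumptions made for illustration. Method names follow Python naming conventions but map one-to-one to the methods listed above.

```python
from dataclasses import dataclass, field


@dataclass
class LinearModel:
    """Hypothetical toy model y = w * x with one trainable parameter."""
    w: float = 0.0

    def predict(self, x: float) -> float:
        return self.w * x


@dataclass
class TrainingSystem:
    """Hypothetical training system exposing the five defined methods."""
    model: LinearModel = field(default_factory=LinearModel)
    learning_rate: float = 0.5   # assumed optimizer configuration
    x_scale: float = 4.0         # assumed fixed normalization constant

    def collect_training_data(self):
        # Stand-in for public datasets, web crawlers or internal services.
        return [(1.0, 2.0), (2.0, 4.0), (None, 1.0), (3.0, 6.0), (4.0, 8.0)]

    def pre_process_training_data(self, raw):
        # Filter out incomplete records, then normalize the inputs.
        clean = [(x, y) for x, y in raw if x is not None and y is not None]
        return [(x / self.x_scale, y) for x, y in clean]

    def train_machine_learning_model(self, data, epochs=200):
        # Gradient descent on mean squared error; updates the parameter w.
        for _ in range(epochs):
            grad = sum(2 * (self.model.predict(x) - y) * x for x, y in data) / len(data)
            self.model.w -= self.learning_rate * grad

    def evaluate(self, data):
        # Mean squared error of the current model on the given dataset.
        return sum((self.model.predict(x) - y) ** 2 for x, y in data) / len(data)

    def test_and_refine_ml_model(self, train, test, target_mse=1e-4, max_rounds=20):
        # Test the intermediate model; keep training until it is good enough.
        for _ in range(max_rounds):
            if self.evaluate(test) <= target_mse:
                break
            self.train_machine_learning_model(train)
        return self.evaluate(test)

    def continuously_tune_model(self, new_data, epochs=50):
        # Further training of the finalized model on newly arrived data.
        self.train_machine_learning_model(new_data, epochs=epochs)


if __name__ == "__main__":
    system = TrainingSystem()
    data = system.pre_process_training_data(system.collect_training_data())
    train, test = data[:3], data[3:]
    system.train_machine_learning_model(train)
    print("test MSE:", system.test_and_refine_ml_model(train, test))
    new_batch = system.pre_process_training_data([(2.0, 4.5), (4.0, 9.0)])
    system.continuously_tune_model(new_batch)
    print("parameter after tuning:", system.model.w)
```

In this sketch every stage either produces data for the next stage or updates the model parameter in place, which mirrors the description of the training system as updating the model’s parameters in the process.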
Business asset related to the association with the machine learning model:
- Training data: datasets used by the training process to train, re-train or fine-tune the target machine learning model. In the context of LLMs, this could be a large collection of textual data. LLMs can be trained in a two-stage process: initially, the model is pre-trained on general-purpose datasets; afterwards, it is fine-tuned on specific datasets fitting the model’s purpose.
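The two-stage use of training data can be illustrated with a deliberately small stand-in for an LLM. The sketch below assumes a bigram next-word model whose parameters are word-pair counts; the class `BigramModel`, the `weight` parameter and both corpora are hypothetical and chosen only to show that fine-tuning continues from the pre-trained parameters rather than starting from scratch. Real LLM training uses gradient-based optimization of neural network weights, which this toy does not attempt to reproduce.

```python
from collections import Counter, defaultdict


class BigramModel:
    """Hypothetical toy next-word model; its parameters are bigram counts."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, corpus, weight=1):
        # Update the parameters from a list of sentences.
        # `weight` (an assumption) lets a small dataset count for more.
        for sentence in corpus:
            words = sentence.lower().split()
            for prev, nxt in zip(words, words[1:]):
                self.counts[prev][nxt] += weight

    def predict_next(self, word):
        # Return the most frequent follower of `word`, or None if unseen.
        followers = self.counts.get(word.lower())
        if not followers:
            return None
        return followers.most_common(1)[0][0]


# Stage 1: pre-training on a general-purpose dataset.
general_corpus = [
    "the model is large",
    "the model is useful",
    "the model is large",
]
model = BigramModel()
model.train(general_corpus)
after_pretraining = model.predict_next("is")

# Stage 2: fine-tuning the same parameters on a purpose-specific dataset.
domain_corpus = [
    "the model is vulnerable",
    "the threat model is vulnerable",
]
model.train(domain_corpus, weight=2)
after_fine_tuning = model.predict_next("is")

print(after_pretraining, "->", after_fine_tuning)
```

The prediction for the same prompt shifts toward the domain vocabulary after the second stage, while the counts learned from the general corpus remain part of the model.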