Browse by Author "Pauklin, Juhan"


Open access
Extracting Concepts from a Language Model Trained on Health Data (Terviseandmetel treenitud keelemudelist kontseptsioonide eraldamine)
(University of Tartu, 2025) Pauklin, Juhan; Kolde, Raivo, supervisor; University of Tartu. Faculty of Science and Technology; University of Tartu. Institute of Computer Science
Language models can perform a wide range of tasks, but how they arrive at their results is a black box: the user provides input and receives output, yet the process that produced the output is unknown. If the network's internal computation were observable and understandable to humans, this interpretability would increase confidence in the model's outputs and, when the model produces an incorrect output, make it possible to understand what went wrong and fix it. In this work, dictionary learning with sparse autoencoders was used to study the inner workings of a language model: the autoencoder decomposes the language model's neural network activations into features, which can be viewed as concepts learned by the model. Three sparse autoencoders were trained, differing in the number of features and in the language model layer they were trained on. The features extracted by the best of the three autoencoders were analyzed, and concepts with different degrees of generalization were found, for example specific health problems affecting the patient, the patient's physical activity, and a positive course of treatment.
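
To make the idea of decomposing activations into features concrete, here is a minimal sketch of a sparse autoencoder's forward pass and loss. The abstract does not specify the architecture, so this assumes the commonly used formulation (a ReLU encoder, a linear decoder, and an L1 sparsity penalty). The class name `SparseAutoencoder`, the sizes, the random weights, and the `l1_coeff` value are all hypothetical, and no training loop is shown.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, vector)) for row in matrix]


class SparseAutoencoder:
    """Toy sparse autoencoder: activation -> sparse features -> reconstruction."""

    def __init__(self, d_model, n_features, seed=0):
        rng = random.Random(seed)
        # Encoder weights: n_features x d_model. Decoder weights: d_model x n_features.
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(n_features)]
                      for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # ReLU keeps feature activations non-negative, so many can be exactly zero.
        pre = matvec(self.w_enc, x)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, features):
        # Each decoder column acts as one "dictionary" direction in activation space.
        return [r + b for r, b in zip(matvec(self.w_dec, features), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        features = self.encode(x)
        x_hat = self.decode(features)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        # Features are non-negative after ReLU, so their sum equals the L1 norm.
        return mse + l1_coeff * sum(features)


# Hypothetical 4-dimensional activation vector expanded into 16 features.
sae = SparseAutoencoder(d_model=4, n_features=16)
activation = [0.5, -1.0, 0.25, 2.0]
features = sae.encode(activation)
reconstruction = sae.decode(features)
```

In practice the feature count is typically much larger than the activation dimension, and the L1 coefficient trades reconstruction quality against sparsity; both would have to be chosen per layer, which is consistent with the abstract's comparison of three differently configured autoencoders.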