Margins in Contrastive Learning: Evaluating Multi-task Retrieval for Sentence Embeddings

dc.contributor.author: Jørgensen, Tollef Emil
dc.contributor.author: Breitung, Jens
dc.contributor.editor: Johansson, Richard
dc.contributor.editor: Stymne, Sara
dc.coverage.spatial: Tallinn, Estonia
dc.date.accessioned: 2025-02-18T09:08:44Z
dc.date.available: 2025-02-18T09:08:44Z
dc.date.issued: 2025-03
dc.description.abstract: This paper explores retrieval with sentence embeddings by fine-tuning sentence-transformer models for classification while preserving their ability to capture semantic similarity. To evaluate this balance, we introduce two opposing metrics – polarity score and semantic similarity score – that measure the model's capacity to separate classes and retain semantic relationships between sentences. We propose a system that augments supervised datasets with contrastive pairs and triplets, training models under various configurations and evaluating their performance on top-$k$ sentence retrieval. Experiments on two binary classification tasks demonstrate that reducing the margin parameter of loss functions greatly mitigates the trade-off between the metrics. These findings suggest that a single fine-tuned model can effectively handle joint classification and retrieval tasks, particularly in low-resource settings, without relying on multiple specialized models.
dc.identifier.uri: https://hdl.handle.net/10062/107219
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.relation.ispartofseries: NEALT Proceedings Series, No. 57
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Margins in Contrastive Learning: Evaluating Multi-task Retrieval for Sentence Embeddings
dc.type: Article
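The abstract's central knob is the margin of the triplet loss used for contrastive fine-tuning. Below is a minimal pure-Python sketch of a cosine-distance triplet margin loss; the function names and the hand-rolled `cosine` helper are illustrative only and are not the authors' implementation (sentence-transformer training would typically use a library loss over batched tensors):

```python
def cosine(u, v):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Triplet loss with cosine distance (1 - cosine similarity).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`; larger margins keep pushing
    classes apart even after they are already separated.
    """
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)
```

With a small margin the loss vanishes as soon as same-class sentences sit slightly closer than different-class ones, so fine-tuning perturbs the pretrained embedding geometry less; this is one plausible reading of the abstract's finding that reducing the margin mitigates the trade-off between class separation (polarity) and preserved semantic similarity.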

Files

Original bundle

Name: 2025_nodalida_1_28.pdf
Size: 331.31 KB
Format: Adobe Portable Document Format