Browse by Author "Pyysalo, Sampo"
Now showing 1 - 10 of 10
Record: Fine-grained Named Entity Annotation for Finnish (Reykjavik, Iceland (Online), Linköping University Electronic Press, Sweden, pp. 135–144, 2021). Luoma, Jouni; Chang, Li-Hsin; Ginter, Filip; Pyysalo, Sampo. Eds.: Dobnik, Simon; Øvrelid, Lilja.

Record: Learning to Extract Biological Event and Relation Graphs (deposited 2009-05-11). Björne, Jari; Ginter, Filip; Heimonen, Juho; Pyysalo, Sampo; Salakoski, Tapio.

Record: Learning to Extract Biological Event and Relation Graphs (Odense, Denmark, Northern European Association for Language Technology (NEALT), pp. 18–25, 2009). Björne, Jari; Ginter, Filip; Heimonen, Juho; Pyysalo, Sampo; Salakoski, Tapio. Eds.: Jokinen, Kristiina; Bick, Eckhard.

Record: MULTI-CROSSRE A Multi-Lingual Multi-Domain Dataset for Relation Extraction (University of Tartu Library, 2023-05). Bassignana, Elisa; Ginter, Filip; Pyysalo, Sampo; Goot, Rob van der; Plank, Barbara.

Record: Poro 34B and the Blessing of Multilinguality (University of Tartu Library, 2025-03). Luukkonen, Risto; Burdge, Jonathan; Zosa, Elaine; Talman, Aarne; Komulainen, Ville; Hatanpää, Väinö; Sarlin, Peter; Pyysalo, Sampo. Eds.: Johansson, Richard; Stymne, Sara.
Abstract: The pretraining of state-of-the-art large language models now requires trillions of words of text, which is orders of magnitude more than is available for the vast majority of languages. While including text in more than one language is an obvious way to acquire more pretraining data, multilinguality is often seen as a curse, and most model training efforts continue to focus near-exclusively on individual large languages. We believe that multilinguality can be a blessing: when the lack of training data is a constraint on effectively training larger models for a target language, augmenting the dataset with other languages can offer a way to improve over the capabilities of monolingual models for that language. In this study, we introduce Poro 34B, a 34-billion-parameter model trained for 1 trillion tokens of Finnish, English, and programming languages, and demonstrate that a multilingual training approach can produce a model that substantially advances over the capabilities of existing models for Finnish and excels in translation, while also achieving competitive performance in its class for English and programming languages. We release the model parameters, scripts, and data under open licenses at https://huggingface.co/LumiOpen/Poro-34B. (A minimal loading sketch follows this list.)

Record: Toward Multilingual Identification of Online Registers (Turku, Finland, Linköping University Electronic Press, pp. 292–297, 2019). Laippala, Veronika; Kyllönen, Roosa; Egbert, Jesse; Biber, Douglas; Pyysalo, Sampo. Eds.: Hartmann, Mareike; Plank, Barbara.

Record: Towards the Classification of the Finnish Internet Parsebank: Detecting Translations and Informality (Vilnius, Lithuania, Linköping University Electronic Press, Sweden, pp. 107–116, 2015). Laippala, Veronika; Kanerva, Jenna; Missilä, Anna; Pyysalo, Sampo; Salakoski, Tapio; Ginter, Filip. Ed.: Megyesi, Beáta.

Record: Toxicity Detection in Finnish Using Machine Translation (University of Tartu Library, 2023-05). Eskelinen, Anni; Silvala, Laura; Ginter, Filip; Pyysalo, Sampo; Laippala, Veronika.

Record: Universal Dependencies for Finnish (Vilnius, Lithuania, Linköping University Electronic Press, Sweden, pp. 163–172, 2015). Pyysalo, Sampo; Kanerva, Jenna; Missilä, Anna; Laippala, Veronika; Ginter, Filip. Ed.: Megyesi, Beáta.

Record: WikiBERT Models: Deep Transfer Learning for Many Languages (Reykjavik, Iceland (Online), Linköping University Electronic Press, Sweden, pp. 1–10, 2021). Pyysalo, Sampo; Kanerva, Jenna; Virtanen, Antti; Ginter, Filip. Eds.: Dobnik, Simon; Øvrelid, Lilja.
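Since the Poro 34B record above points to an openly released checkpoint, a minimal usage sketch may be useful. This is a sketch, not taken from the paper: it assumes the standard Hugging Face transformers causal-LM interface, the LumiOpen/Poro-34B repo id from the abstract's URL, and enough accelerator memory (or CPU offloading) for a 34-billion-parameter model.

# Minimal sketch (assumption, not from the paper): load the released
# Poro 34B checkpoint via the standard transformers causal-LM classes.
# Requires `pip install transformers accelerate` and substantial memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LumiOpen/Poro-34B"  # repo id from the abstract's URL
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native precision
    device_map="auto",   # shard across available GPUs / offload to CPU
)

# Finnish prompt: the model was trained on Finnish, English, and code.
inputs = tokenizer("Suomen pääkaupunki on", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))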