Browse by Author "Ginter, Filip"

Now showing 1 - 20 of 25
  • Applying BLAST to Text Reuse Detection in Finnish Newspapers and Journals, 1771-1910
    (Gothenburg, Linköping University Electronic Press, pp. 54-58, 2017) Vesanto, Aleksi; Nivala, Asko; Rantala, Heli; Salakoski, Tapio; Salmi, Hannu; Ginter, Filip; Bouma, Gerlof; Adesam, Yvonne
  • Building a Large Automatically Parsed Corpus of Finnish
    (Oslo, Norway, Linköping University Electronic Press, Sweden, pp. 291-300, 2013) Ginter, Filip; Nyblom, Jenna; Laippala, Veronika; Kohonen, Samuel; Haverinen, Katri; Vihjanen, Simo; Salakoski, Tapio; Oepen, Stephan; Hagen, Kristin; Johannessen, Janne Bondi
  • Creating register sub-corpora for the Finnish Internet Parsebank
    (Gothenburg, Sweden, Association for Computational Linguistics, pp. 152-161, 2017) Laippala, Veronika; Luotolahti, Juhani; Kyröläinen, Aki-Juhani; Salakoski, Tapio; Ginter, Filip; Tiedemann, Jörg; Tahmasebi, Nina
  • Dep_search: Efficient Search Tool for Large Dependency Parsebanks
    (Gothenburg, Sweden, Association for Computational Linguistics, pp. 255-258, 2017) Luotolahti, Juhani; Kanerva, Jenna; Ginter, Filip; Tiedemann, Jörg; Tahmasebi, Nina
  • Fine-grained Named Entity Annotation for Finnish
    (Reykjavik, Iceland (Online), Linköping University Electronic Press, Sweden, pp. 135-144, 2021) Luoma, Jouni; Chang, Li-Hsin; Ginter, Filip; Pyysalo, Sampo; Dobnik, Simon; Øvrelid, Lilja
  • FinerWeb-10BT: Refining Web Data with LLM-Based Line-Level Filtering
    (University of Tartu Library, 2025-03) Henriksson, Erik; Tarkka, Otto; Ginter, Filip; Johansson, Richard; Stymne, Sara
    Data quality is crucial for training Large Language Models (LLMs). Traditional heuristic filters often miss low-quality text or mistakenly remove valuable content. In this paper, we introduce an LLM-based line-level filtering method to enhance training data quality. We use GPT-4o mini to label a 20,000-document sample from FineWeb at the line level, allowing the model to create descriptive labels for low-quality lines. These labels are grouped into nine main categories, and we train a DeBERTa-v3 classifier to scale the filtering to a 10B-token subset of FineWeb. To test the impact of our filtering, we train GPT-2 models on both the original and the filtered datasets. The results show that models trained on the filtered data achieve higher accuracy on the HellaSwag benchmark and reach their performance targets faster, even with up to 25% less data. This demonstrates that LLM-based line-level filtering can significantly improve data quality and training efficiency for LLMs. We release our quality-annotated dataset, FinerWeb-10BT, and the codebase to support further work in this area.
  • Finnish Paraphrase Corpus
    (Reykjavik, Iceland (Online), Linköping University Electronic Press, Sweden, pp. 288-298, 2021) Kanerva, Jenna; Ginter, Filip; Chang, Li-Hsin; Rastas, Iiro; Skantsi, Valtteri; Kilpeläinen, Jemina; Kupari, Hanna-Mari; Saarni, Jenna; Sevón, Maija; Tarkka, Otto; Dobnik, Simon; Øvrelid, Lilja
  • Finnish SQuAD: A Simple Approach to Machine Translation of Span Annotations
    (University of Tartu Library, 2025-03) Nuutinen, Emil; Rastas, Iiro; Ginter, Filip; Johansson, Richard; Stymne, Sara
    We apply a simple method to machine translate datasets with span-level annotation using the DeepL MT service and its ability to translate formatted documents. Using this method, we produce a Finnish version of the SQuAD2.0 question answering dataset and train QA retriever models on this new dataset. We evaluate the quality of the dataset and more generally the MT method through direct evaluation, indirect comparison to other similar datasets, a backtranslation experiment, as well as through the performance of downstream trained QA models. In all these evaluations, we find that the method of transfer is not only simple to use but produces consistently better translated data. Given its good performance on the SQuAD dataset, it is likely the method can be used to translate other similar span-annotated datasets for other tasks and languages as well. All code and data are available under an open license: data at HuggingFace TurkuNLP/squad_v2_fi, code on GitHub TurkuNLP/squad2-fi, and model at HuggingFace TurkuNLP/bert-base-finnish-cased-squad2.
  • Is Multilingual BERT Fluent in Language Generation?
    (Turku, Finland, Linköping University Electronic Press, pp. 29-36, 2019) Rönnqvist, Samuel; Kanerva, Jenna; Salakoski, Tapio; Ginter, Filip; Nivre, Joakim; Derczynski, Leon; Ginter, Filip; Lindi, Bjørn; Oepen, Stephan; Søgaard, Anders; Tiedemann, Jörg
  • Learning to Extract Biological Event and Relation Graphs
    (2009-05-11T08:58:27Z) Björne, Jari; Ginter, Filip; Heimonen, Juho; Pyysalo, Sampo; Salakoski, Tapio
  • Learning to Extract Biological Event and Relation Graphs
    (Odense, Denmark, Northern European Association for Language Technology (NEALT), pp. 18-25, 2009) Björne, Jari; Ginter, Filip; Heimonen, Juho; Pyysalo, Sampo; Salakoski, Tapio; Jokinen, Kristiina; Bick, Eckhard
  • MULTI-CROSSRE A Multi-Lingual Multi-Domain Dataset for Relation Extraction
    (University of Tartu Library, 2023-05) Bassignana, Elisa; Ginter, Filip; Pyysalo, Sampo; Goot, Rob van der; Plank, Barbara
  • OCR Error Post-Correction with LLMs in Historical Documents: No Free Lunches
    (University of Tartu Library, 2025-03) Kanerva, Jenna; Ledins, Cassandra; Käpyaho, Siiri; Ginter, Filip; Tudor, Crina Madalina; Debess, Iben Nyholm; Bruton, Micaella; Scalvini, Barbara; Ilinykh, Nikolai; Holdt, Špela Arhar
    Optical Character Recognition (OCR) systems often introduce errors when transcribing historical documents, leaving room for post-correction to improve text quality. This study evaluates the use of open-weight LLMs for OCR error correction in historical English and Finnish datasets. We explore various strategies, including parameter optimization, quantization, segment length effects, and text continuation methods. Our results demonstrate that while modern LLMs show promise in reducing character error rates (CER) in English, a practically useful performance for Finnish was not reached. Our findings highlight the potential and limitations of LLMs in scaling OCR post-correction for large historical corpora.
  • Parsing Clinical Finnish: Experiments with Rule-Based and Statistical Dependency Parsers
    (Odense, Denmark, Northern European Association for Language Technology (NEALT), pp. 65-72, 2009) Haverinen, Katri; Ginter, Filip; Laippala, Veronika; Salakoski, Tapio; Jokinen, Kristiina; Bick, Eckhard
  • Parsing Clinical Finnish: Experiments with Rule-Based and Statistical Dependency Parsers
    (2009-05-13T11:07:10Z) Haverinen, Katri; Ginter, Filip; Laippala, Veronika; Salakoski, Tapio
  • Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing
    (Turku, Finland, 2019) Nivre, Joakim; Derczynski, Leon; Ginter, Filip; Lindi, Bjørn; Oepen, Stephan; Søgaard, Anders; Tiedemann, Jörg
  • Sentence Compression For Automatic Subtitling
    (Vilnius, Lithuania, Linköping University Electronic Press, Sweden, pp. 135-143, 2015) Luotolahti, Juhani; Ginter, Filip; Megyesi, Beáta
  • A System for Identifying and Exploring Text Repetition in Large Historical Document Corpora
    (Gothenburg, Sweden, Association for Computational Linguistics, pp. 330-333, 2017) Vesanto, Aleksi; Ginter, Filip; Salmi, Hannu; Nivala, Asko; Salakoski, Tapio; Tiedemann, Jörg; Tahmasebi, Nina
  • Template-free Data-to-Text Generation of Finnish Sports News
    (Turku, Finland, Linköping University Electronic Press, pp. 242-252, 2019) Kanerva, Jenna; Rönnqvist, Samuel; Kekki, Riina; Salakoski, Tapio; Ginter, Filip; Hartmann, Mareike; Plank, Barbara
  • Towards a Dependency-Based PropBank of General Finnish
    (Oslo, Norway, Linköping University Electronic Press, Sweden, pp. 41-57, 2013) Haverinen, Katri; Laippala, Veronika; Kohonen, Samuel; Missilä, Anna; Nyblom, Jenna; Ojala, Stina; Viljanen, Timo; Salakoski, Tapio; Ginter, Filip; Oepen, Stephan; Hagen, Kristin; Johannessen, Janne Bondi