
Browse by date, starting from "2025-11"

Now showing 1 - 20 of 35
  • Structuring intermediary roles in innovation policy mixes: a multi-country policy-level analysis of coordination and performance
    (Tartu Ülikool, 2025-11) Huang, Xuxua; Ukrainski, Kadri, supervisor; Tartu Ülikool. Majandusteaduskond; Tartu Ülikool. Sotsiaalteaduste valdkond
  • Does board gender diversity enhance firm performance in Estonia?
    (Tartu Ülikool, 2025-11) Tang, Yiling; Masso, Jaan, supervisor; Tartu Ülikool. Majandusteaduskond; Tartu Ülikool. Sotsiaalteaduste valdkond
  • The interplay between intelligence and health in shaping income: evidence from Poland
    (Tartu Ülikool, 2025-11) Feng, Liqi; Hirv, Tanel, supervisor; Tartu Ülikool. Majandusteaduskond; Tartu Ülikool. Sotsiaalteaduste valdkond
  • The Word Rain visualisation technique applied to digital history: How to visualise, explore and compare texts using semantically structured word clouds
    (University of Tartu Library, 2025-11) Skeppstedt, Maria; Ahltorp, Magnus; Kucher, Kostiantyn; Aangenendt, Gijs; Lindström, Matts; Söderfeldt, Ylva; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    The Word Rain text visualisation technique aims to retain the simplicity of the classic word cloud, while addressing some of its limitations. In particular, the Word Rain visualisation uses word embeddings to automatically give the visualised words a semantically meaningful position along the horizontal axis. In this handbook chapter, we showcase how this novel approach for word positioning makes the Word Rain technique suitable for exploring, analysing and comparing texts. More specifically, we show how the Word Rain Python module can be used to visualise longitudinal changes in periodicals published by the Swedish Diabetes Association, and how the Word Rain web service can be used to create visualisations that compare the patient organisation periodicals to journals published by the Swedish Medical Association.
  • SweLL with pride: How to put a learner corpus to good use
    (University of Tartu Library, 2025-11) Volodina, Elena; Masciolini, Arianna; Megyesi, Beáta; Prentice, Julia; Rudebeck, Lisa; Sundberg, Gunlög; Wirén, Mats; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    Second language (L2) learner corpora are collections of language samples that demonstrate learners’ abilities to perform some learning tasks, e.g. an ability to write essays, answer reading comprehension questions, or talk on a given topic. Such corpora are necessary both for empirically based research within Second Language Acquisition (SLA) and for the development of methods for automatic processing of such data. L2 corpora are notoriously difficult to collect, and their value depends to a great degree on the representativeness and balance of the sampled data, the type of associated metadata and the reliability of manual annotations. In this chapter we thoroughly describe the SweLL-gold corpus of L2 Swedish, its annotation, statistics and metadata, and showcase the main types of its use, such as (1) in research on SLA through detailed instructions on how to perform corpus searches given SweLL-specific annotation, combined with guidelines for the usage of SVALA, a tool for correction annotation; and (2) in NLP research on problems such as grammatical error correction through guidelines on how to use the different file formats in which the SweLL-gold corpus is released. Both cases are further supported by case studies and, where available, relevant scripts ready for reuse by researchers.
  • A machine learning pipeline for digitalising historical printed materials – from data collection to a searchable database
    (University of Tartu Library, 2025-11) Pablo, Dalia Ortiz; Badri, Sushruth; Aangenendt, Gijs; von Bychelberg, Mo; Lindström, Matts; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    Recent developments in the fields of machine learning and computer vision have created new opportunities for the digitalisation of printed historical materials. However, successful integration of machine learning models requires interdisciplinary collaboration between computer and data scientists, researchers, librarians and/or archivists, and digitisation experts. This chapter describes a comprehensive pipeline designed to address the challenges of digitalising printed historical materials, from document-scanning best practices to incorporating state-of-the-art machine learning techniques. It aims to streamline the management and processing of historical data, making the digitalised materials accessible and searchable through the application of machine learning techniques. The content of this chapter encompasses scanning best practices, annotation approaches, model training, and deployment. This chapter presents a collection of useful tools for each stage of building a machine learning model, step-by-step instructions and example notebooks designed to be easily adapted to other cases.
  • Applied NLP for humanities research
    (University of Tartu Library, 2025-11) Aangenendt, Gijs; Skeppstedt, Maria; Berglund, Karl; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    Natural language processing (NLP) has become a field of interest for many researchers within the humanities. However, framing humanities research questions as NLP problems and identifying suitable methods can be a difficult task. Taking previous and ongoing projects from the Centre for Digital Humanities and Social Sciences at Uppsala University (CDHU) as a point of departure, this chapter presents concrete use cases of how humanities research questions can be approached using various NLP methods and tools, from ready-to-use text analysis tools to programming libraries that require basic familiarity with Python. Two case studies from the fields of history and literature will be introduced to illuminate how texts can be processed for humanities research purposes. With this chapter, we hope to give the reader the means to directly explore NLP methods for their research as well as encourage further learning.
  • Doing digital research at KBLab: A practical introduction to using the National Library of Sweden’s data lab
    (University of Tartu Library, 2025-11) Haffenden, Chris; Sikora, Justyna; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    The emergence of digital heritage data and the rapid development of new AI tools for computational analysis are transforming GLAM institutions, particularly in the design of digital research infrastructure. Researchers in the digital humanities and social sciences increasingly expect to access collections at unprecedented scales. This chapter addresses such expectations by providing a hands-on guide to KBLab, the data lab at the National Library of Sweden (KB). It outlines the lab’s resources, including access to KB’s digitized collections and AI models like KB-BERT, and showcases innovative development projects like Bildsök, which makes visual archives more accessible. The chapter also details the steps to initiate research collaborations and discusses best practices for utilizing KBLab’s tools effectively. By bridging technical insights with practical applications, it serves as a comprehensive starting point for conducting large-scale digital research at KB and beyond.
  • Interdisciplinary digital project design
    (University of Tartu Library, 2025-11) Brodén, Daniel; Fridlund, Mats; Lindhé, Cecilia; Westin, Jonathan; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    While discussions in digital humanities increasingly emphasise the importance of reflecting on collaborative workflows for interdisciplinary research, attention to specific practical expertise remains lacking. This paper introduces the concept of interdisciplinary digital project design to highlight a professional practice that integrates collaboration between traditional Humanities and Social Science (HSS) researchers and technical experts in developing research projects, digital resources and more. We begin by addressing the need for protocols to support workflow-oriented approaches to interdisciplinary collaboration, while underscoring the role of embodied expertise in facilitating teamwork. Furthermore, we argue that judgement – a critical yet often overlooked element – is an integral aspect of the professionalism involved. The discussion is grounded in descriptions of our contribution to five digital HSS projects, each offering a different perspective on the integrative professionalism involved. The paper concludes by discussing ways to further advance the conceptual understanding of interdisciplinary digital project design, with particular attention to the expertise that underpins this practice.
  • Huminfra – a Swedish national infrastructure to support research in digital and experimental humanities
    (University of Tartu Library, 2025-11) Gullberg, Marianne; Cocq, Coppélie; Fridlund, Mats; Golub, Koraljka; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
  • Low-code web scraping and text analysis with Octoparse and KNIME: An example from the CICuW project
    (University of Tartu Library, 2025-11) Ihrmark, Daniel; Carlsson, Hanna; Hanell, Fredrik; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    Low-code tools play an important role in making data analysis and visualization accessible to researchers and students with limited experience, or interest, in programming. While low-code tools do introduce closed-box issues, they can still be considered important stepping stones toward computational approaches. This chapter draws on two such tools, Octoparse and KNIME (Konstanz Information Miner), to present a workflow from data collection from online sources, through text pre-processing, toward text classification in the context of the ongoing project Cultural Institutions and the Culture War (CICuW) that investigates the democratic implications of the pervasiveness of far-right digital discourse. This chapter will introduce web scraping, topic modeling, and sentiment analysis in an accessible way, while also showcasing state-of-the-art approaches to the analysis components through the use of BERT (Bidirectional Encoder Representations from Transformers) models and zero-shot classification. The chapter will take a critical perspective on the described methods by discussing how they contribute to creating methodological closed boxes and how quantitative techniques can be fruitfully combined with qualitative approaches.
  • Navigating Swedish Salafism: Large language model-augmented content detection and topic modeling using BERTopic with YouTube metadata
    (University of Tartu Library, 2025-11) Svensson, Jonas; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    The chapter suggests and provides an example of a Large Language Model (LLM)-augmented method for gaining a quick overview of large sets of YouTube videos using metadata collected through the YouTube API. The case chosen is the Swedish Salafist YouTube channel islam.nu that houses 1 680 videos. An LLM (GPT-4o mini) is given a prompt to guess the content of videos based on information given in their titles and descriptions. These guesses are then used in an LLM-augmented topic modeling process utilizing the Python library BERTopic and the HUMINFRA resource, the Swedish Royal Library’s sentence-transformers model “sentence-bert-swedish-cased”. The videos thus placed under topics are then again subjected to processing by an LLM, to produce easy-to-read representations of the topics. This method provides a convenient way to quickly understand the content of YouTube video sets and can serve as a first step in a purposive sampling procedure.
  • Unlocking Swedish historical material through OCR and HTR
    (University of Tartu Library, 2025-11) Dannélls, Dana; Kurtz, Robin; Lenas, Erik; Löfgren, Viktoria; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    This chapter presents some of the efforts made by three national institutions and the challenges each institution encountered while advancing Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) technologies for Swedish historical material. It introduces the resources, models, and tools that can be used to refine computational approaches for improving OCR and HTR processing, which, in turn, could enhance text- and data-driven research.
  • Tugg: A transcription tool for language documentation
    (University of Tartu Library, 2025-11) Ahltorp, Magnus; Berthelsen, Harald; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    Modern language documentation is less about field linguists treating the speakers of a language as rather passive informants, and more about actively involving them in the work as language consultants. This includes handling tools like recording equipment and being able to transcribe language data on their own. Catering to non-expert users of transcription software presents new challenges for a category of tools that traditionally was the exclusive domain of expert users. We will describe a new transcription tool that aims to be directly usable without any special knowledge of the tool, requiring only a modest amount of previous computer knowledge from the end user. The tool is geared towards producing transcriptions that fully comply with the Leipzig Glossing Rules with a minimum of effort. This chapter will describe the tool, how to set up the server part of the tool, and practical instructions on how to use it, including suggested workflows. The chapter ends with a brief description of a practical use case and future directions.
  • From text to insight: Uncovering linguistic patterns with SWEGRAM
    (University of Tartu Library, 2025-11) Megyesi, Beáta; Ruan, Rex; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    Empirical linguistic analysis provides valuable insights into textual data for researchers in the humanities and social sciences, enabling them to identify patterns and trends within large datasets. SWEGRAM is a freely available tool designed to annotate and analyze Swedish and English texts without requiring programming skills or a user account. Users can upload one or more texts for linguistic analysis, extracting morphological and syntactic features. The linguistically annotated texts can then be used for quantitative linguistic analysis, allowing researchers to systematically explore textual characteristics. Additionally, the tool visualizes syntactic relations between words in sentences and provides detailed insights into the distribution of syntactic functions and relations within the text. Users can also create their own linguistically annotated text collections and generate statistical summaries of the linguistic properties of their texts. The tool is available as both a web-based service, which requires no user login or account, and a downloadable version for local use when data privacy and security are a priority. This dual availability ensures accessibility and flexibility for diverse research needs.
  • Exploratory Swedish text analysis using notebooks – a smörgårdsbord of basic corpus linguistic insights
    (University of Tartu Library, 2025-11) Kokkinakis, Dimitrios; Bouma, Gerlof; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    The computational notebook has established itself as a significant tool for conducting exploratory data analysis, which aims at investigating characteristics of a dataset without preformulated expectations. Computational notebooks are a type of interactive document that supports mixing prose, executable code and its output, such as a calculated result, a table, or a graphic. Data, process, and narrative are effectively integrated into one environment, which makes notebooks ideal for documenting exploratory research. Notebooks also facilitate sharing research in a reproducible way for teaching, collaboration or dissemination. This chapter demonstrates basic exploratory techniques for Swedish text analysis implemented as Jupyter notebooks, a popular computational notebook implementation. Using a selection of documents from a Swedish corpus of COVID-19-related materials, we show some of the kinds of text analysis that can easily be performed using readily available software libraries. The examples in this chapter rely only on automatic annotation, requiring minimal manual processing.
  • Exploring parallel corpora with STUnD: A Search Tool for Universal Dependencies
    (University of Tartu Library, 2025-11) Masciolini, Arianna; Lange, Herbert; Tóth, Márton András; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    We introduce STUnD (Search Tool for Universal Dependencies), a corpus search tool designed to facilitate working with parallel data. STUnD employs a query language that allows describing syntactic structures and specifying divergence patterns, which in turn make it possible to look for systematic differences between texts. Furthermore, the tool can automatically detect the differences between two similar documents. To achieve all this, STUnD leverages Universal Dependencies (UD), a cross-lingually consistent standard for morphosyntactic annotation. Input can consist of preannotated UD treebanks or raw text, which the tool automatically processes through a third-party parser. As demonstrated in the case study included in the present chapter, STUnD is especially well-suited for comparing syntactic structures across languages, with applications in the context of typology and translation studies. Other use cases include retrieving grammatical errors from parallel learner corpora and comparing different analyses of the same text.
  • Empirisk ordforskning som grund för vidareutveckling av Svensk ordbok utgiven av Svenska Akademien
    (University of Tartu Library, 2025-11) Sköldberg, Emma; Blensenius, Kristian; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    Empirical word research is central to the lexicographic work of updating Svensk ordbok utgiven av Svenska Akademien (SO). This chapter describes how various corpora and tools, mainly from Språkbanken Text and Kungliga biblioteket (the National Library of Sweden), are used to select, analyse and revise headwords. The focus is on inflectional information and sense descriptions. We discuss methodological and practical challenges, including the impact of digitalisation on lexicographic work. Modern language technology tools have made the process more scientific and efficient, while the need for additional text material and method development remains.
  • A practical guide to the Swedish L2 lexical profile
    (University of Tartu Library, 2025-11) Lindström Tiedemann, Therese; Alfter, David; Volodina, Elena; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios; Volodina, Elena
    Vocabulary is a fundamental aspect of any language since without words you cannot communicate, nor learn other aspects of a language, such as grammar or pronunciation. The Swedish L2 profile (SweL2P) offers many ways in which researchers can explore the vocabulary which learners can produce and are expected to understand at different proficiency levels. It also provides a foundation for innovative ways of teaching Swedish, for instance, through Computer Assisted Language Learning (CALL) and Data Driven Learning (DDL). In this chapter we give a step-by-step overview of how the lexical part of SweL2P can be used to explore the vocabulary growth of language learners, both receptively and productively. Starting from a bird’s eye view of vocabulary in course books and learner essays, we show how to zoom in on some specific aspects of vocabulary, choosing adjectives as an example. We use SweL2P to show how adjectives occur in course books and how they appear in learners’ texts, comparing the lexis in both, but also showing the potential to explore the way learners acquire vocabulary more broadly. Finally, we present how results in SweL2P can be easily compared to other Swedish corpora.
  • Preface
    (University of Tartu Library, 2025-11) Volodina, Elena; Bouma, Gerlof; Dannélls, Dana; Kokkinakis, Dimitrios