Low-code web scraping and text analysis with Octoparse and KNIME: An example from the CICuW project

dc.contributor.author: Ihrmark, Daniel
dc.contributor.author: Carlsson, Hanna
dc.contributor.author: Hanell, Fredrik
dc.contributor.editor: Bouma, Gerlof
dc.contributor.editor: Dannélls, Dana
dc.contributor.editor: Kokkinakis, Dimitrios
dc.contributor.editor: Volodina, Elena
dc.date.accessioned: 2025-11-10T12:37:24Z
dc.date.available: 2025-11-10T12:37:24Z
dc.date.issued: 2025-11
dc.description.abstract: Low-code tools play an important role in making data analysis and visualization accessible to researchers and students with limited experience, or interest, in programming. While low-code tools do introduce closed-box issues, they can still be considered important stepping stones toward computational approaches. This chapter draws on two such tools, Octoparse and KNIME (Konstanz Information Miner), to present a workflow that runs from data collection from online sources, through text pre-processing, to text classification. The workflow is set in the context of the ongoing project Cultural Institutions and the Culture War (CICuW), which investigates the democratic implications of the pervasiveness of far-right digital discourse. This chapter will introduce web scraping, topic modeling, and sentiment analysis in an accessible way, while also showcasing state-of-the-art approaches to the analysis components through the use of BERT (Bidirectional Encoder Representations from Transformers) models and zero-shot classification. The chapter will take a critical perspective on the described methods by discussing how they contribute to creating methodological closed boxes and how quantitative techniques can be fruitfully combined with qualitative approaches.
dc.identifier.isbn: 9789908536125
dc.identifier.uri: https://hdl.handle.net/10062/117354
dc.identifier.uri: https://doi.org/10.58009/aere-perennius0184
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.relation.ispartof: Huminfra handbook: Empowering digital and experimental humanities
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Low-code web scraping and text analysis with Octoparse and KNIME: An example from the CICuW project
dc.type: Article

Files

Name: Huminfra_Handbook_Chapter15.pdf
Size: 2.01 MB
Format: Adobe Portable Document Format