Supporting Internet Search by Sharing Search Logs (Internetiotsingu toetamine otsingulogide jagamise meetodil)
Date
2012
Publisher
Tartu Ülikool
Abstract
This thesis is part of an ongoing collaborative research effort whose broader aim is to improve Internet search support so that complex, time-consuming and often exploratory search tasks can be completed faster and more efficiently. The main research problem is the development of a new type of framework for logging search tasks and sharing them on the Internet, as an alternative to existing methods based on browser plug-ins. This was a complex engineering task that required the author to handle a variety of programming, planning, component integration and configuration work. The stated goal was achieved.
The thesis proposes a proxy-based method for logging users' search behaviour that can easily be adapted to different web browsers and operating systems. The solution was compared with earlier similar systems. The method grew out of a practical need for a more maintainable and portable replacement for previously developed software, a plug-in for the Mozilla Firefox browser that had to be fixed after every new browser release.
The implementation consists of two major components. The first and technically more complex one, the system for compiling and sharing search task logs, runs in a VirtualBox virtual machine. The second is a WordPress-based search log repository which, in addition to publishing user-annotated logs, also supports simple searches over them. The systems have been tested thoroughly but have not yet been used in user studies on Internet search. The author is aware of interest in doing so both within Tartu Ülikool and at one foreign partner university.
The locally hosted system for compiling and sharing search tasks consists of three equally important sub-components: a search task logger implemented in Python; a web interface, built mainly with PHP and HTML, which among other things lets the user switch the logger on and off and manually edit and extend all data related to the search task; and a Privoxy web proxy server configured specifically for this task.
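As a rough illustration of the logger component only, the following minimal sketch shows how a search task logger that can be switched on and off might record requested URLs and pull out the search query. It is not the thesis implementation: the class name `SearchTaskLog`, the function `extract_query`, and the choice of `q` and `query` as search parameters are assumptions made for this example, and the sketch assumes the proxy simply hands each requested URL to the logger.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: search engines carry the query in one of these URL parameters.
SEARCH_PARAMS = ("q", "query")


def extract_query(url):
    """Return the search query found in the URL, or None if there is none."""
    params = parse_qs(urlsplit(url).query)
    for name in SEARCH_PARAMS:
        if name in params:
            return params[name][0]
    return None


class SearchTaskLog:
    """Hypothetical in-memory log that the user can switch on and off."""

    def __init__(self):
        self.enabled = False
        self.events = []

    def record(self, url):
        """Store the URL and its query, but only while logging is enabled."""
        if not self.enabled:
            return
        self.events.append({"url": url, "query": extract_query(url)})


log = SearchTaskLog()
log.record("https://www.example.com/search?q=ignored")  # dropped: logging is off
log.enabled = True
log.record("https://www.example.com/search?q=tartu+weather")
log.record("https://www.example.com/about")  # kept, but has no query
```

Keeping the on/off switch inside the log object mirrors the description above, where the web interface controls whether the logger is active; a real system would also need persistent storage and timestamps.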
The thesis gives a thorough overview of existing software, scientific publications and theoretical foundations related to the research problem. Compared with existing methods, the proposed proxy-based framework for logging and sharing search tasks stands out mainly for two reasons. First, the method is platform- and browser-independent while also being very stable. Second, the freedom given to users to define and annotate their own search tasks sets an important new benchmark.
The final chapter discusses future prospects and open problems related to the work. One of them is an architecture, modified from the proposed one, that would allow laboratory experiments to be run with less effort and time. The Internet search logging system can be developed further by adding support for more JavaScript events. The search log repository, still fairly rudimentary, offers plenty of room for future extensions.
The main research problem of my thesis was engineering a new type of search task logging and publishing framework that would provide a better alternative to existing browser plug-in based methods. From the start, the proxy-based search task reporting system was a complex engineering challenge involving code written in multiple programming languages, interactions planned across many software modules (some of which are large existing projects in their own right), and a Linux operating system configured to ease the set-up process for the user. These design decisions were made to ensure that the solution is reliable, extensible and maintainable in the future. My research goal was completed successfully.
In my thesis, I proposed a proxy-based method for logging user search behaviour across different browsers and operating systems. I also compared it with an existing plug-in based Search Logger for Mozilla Firefox and other similar solutions. The idea of developing a proxy-based search task logging and publishing solution came out of necessity, because the existing logging solution had significant problems with maintainability.
The logs created by my solution are subsequently annotated by the user and made publicly available on a dedicated Internet blog called the Search Task Repository. Users can search the already annotated and published Internet search logs. Ideally, this would reduce the complexity of search tasks for users, which in turn saves time. User studies to confirm this are still pending, but there is confirmed interest from Tartu researchers as well as from one foreign university in using my solution in their search experiments. The proposed solution comprises two large units: the search task repository and the search task logging and publishing unit.
The search task repository is a remote component, essentially a fairly simple WordPress blog, which enables search stories to be published automatically over the XML-RPC protocol, search queries to be served, and search task logs to be displayed to the searcher. My logging system is configured as a VirtualBox virtual machine. It is much more complex, consisting of three sub-components: the main Web interface, the search task logger, and the Privoxy Web proxy specially configured for my needs. Logging can be started and stopped at the user's will in the main Web interface. What is more, this sub-component gives users full control over what gets published online by providing editing and annotation functionality for all search task data, both implicitly and explicitly logged.
My thesis gave a comprehensive theoretical overview of the state of the art, explaining basic related concepts in Information Retrieval and recent developments in Exploratory Search and search task logging systems. In contrast with existing browser plug-in based search task logging methods, my proposed proxy-based approach ensures platform and browser independence while also being very stable. By giving searchers the opportunity to freely define and annotate their own search tasks, my search support solution sets a new standard.
In the final chapter, I conducted a thorough analysis of future work and presented my own vision of the future opportunities for this search support methodology. A modified architecture for more convenient laboratory experiments was outlined as an important task for the future. In conclusion, my proxy-based search task logging, editing and publishing framework can be extended further to log more JavaScript events. The search task repository is a large open area with many opportunities for future extensions.
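The abstract states that search stories are published to the WordPress repository over XML-RPC but does not say which remote method is called. As a hedged sketch, the example below assumes the standard `metaWeblog.newPost` method that WordPress exposes and only builds the request body, without contacting any server. The function name `build_publish_request`, the blog ID, the credentials and the post content are all placeholders invented for illustration.

```python
import xmlrpc.client


def build_publish_request(title, annotated_log_html, username, password, blog_id=1):
    """Serialize a metaWeblog.newPost call as an XML-RPC request body.

    Nothing is sent over the network; the caller decides where to POST it.
    """
    post = {
        "title": title,                     # title of the search story
        "description": annotated_log_html,  # post body: the annotated log
    }
    publish_now = True  # publish immediately instead of saving a draft
    params = (blog_id, username, password, post, publish_now)
    return xmlrpc.client.dumps(params, methodname="metaWeblog.newPost")


# Placeholder values, not taken from the thesis.
payload = build_publish_request(
    "Finding a cheap flight",
    "<p>Query: cheap flights from Tallinn</p>",
    "demo-user",
    "demo-password",
)

# Round-trip the payload to check that it is well-formed XML-RPC.
decoded_params, decoded_method = xmlrpc.client.loads(payload)
```

Building the payload separately from sending it keeps the example free of network access; in practice `xmlrpc.client.ServerProxy` pointed at the blog's XML-RPC endpoint would perform the same call directly.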