Design and Implementation of an Incremental ELT Pipeline for a Jira Data Warehouse using Data Vault 2.0 Methodology and HP Vertica

dc.contributor.advisor: Awaysheh, Feras M., supervisor
dc.contributor.author: Bobkov, Rasmus
dc.contributor.other: Tartu Ülikool. Loodus- ja täppisteaduste valdkond (University of Tartu. Faculty of Science and Technology) [et]
dc.contributor.other: Tartu Ülikool. Arvutiteaduse instituut (University of Tartu. Institute of Computer Science) [et]
dc.date.accessioned: 2023-10-24T11:22:04Z
dc.date.available: 2023-10-24T11:22:04Z
dc.date.issued: 2023
dc.description.abstract: This master’s thesis outlines the design and implementation of a containerized ELT pipeline for TEHIK, a company that requires an efficient way to analyze Jira Software data. The pipeline incrementally loads data into a Vertica data warehouse (DWH) built according to Data Vault 2.0 (DV 2.0) principles, and its containerized architecture enables easy deployment in production environments. Given the breadth of the subject, the thesis aims for an overarching understanding of Data Engineering (DE), DV 2.0, Agile methodologies, and implementation rather than an in-depth treatment of each area. The thesis begins by examining the current system and its limitations, then introduces the proposed solution and its advantages. The Background Knowledge and Related Work section covers the central concepts of DE, data warehousing, and the DV 2.0 methodology, along with deployment in production environments; key topics include ingestion, ELT vs. ETL architecture, DWH architectures, and the essence and benefits of DV 2.0. The practical application of Kubernetes, logging, monitoring, and orchestration with Airflow is not included in the thesis due to time restrictions, but these aspects remain crucial for a holistic understanding of the project. Hence, a conceptual overview of orchestration using Airflow and a theoretical implementation for logging and monitoring are provided. The implementation section walks through the project’s process: the specific steps and methodologies employed, the challenges faced, and their solutions. The subsequent ‘Results and Analysis’ section critically compares the proposed solution with the existing one, evaluating reporting capabilities, compliance with SLAs, and the pipeline’s performance, including its ability to handle large data volumes and its scalability. In conclusion, this thesis delivers a robust, scalable, and efficient solution comprising an ELT pipeline and a DV 2.0-based DWH tailored to TEHIK’s Jira Software data analysis needs. This integrated solution outperforms the existing system and provides a solid foundation for future enhancements and expansions. [et]
dc.identifier.uri: https://hdl.handle.net/10062/93708
dc.language.iso: eng [et]
dc.publisher: Tartu Ülikool [et]
dc.rights: openAccess [et]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: ELT pipeline [et]
dc.subject: Data Engineering [et]
dc.subject: DWH [et]
dc.subject: HP Vertica [et]
dc.subject: Jira [et]
dc.subject: TEHIK [et]
dc.subject: DV 2.0 [et]
dc.subject: Meltano [et]
dc.subject: shell scripting [et]
dc.subject: vsql [et]
dc.subject: software development [et]
dc.subject: Docker [et]
dc.subject: scalability [et]
dc.subject.other: magistritööd (master's theses) [et]
dc.subject.other: informaatika (informatics) [et]
dc.subject.other: infotehnoloogia (information technology) [et]
dc.subject.other: informatics [et]
dc.subject.other: infotechnology [et]
dc.title: Design and Implementation of an Incremental ELT Pipeline for a Jira Data Warehouse using Data Vault 2.0 Methodology and HP Vertica [et]
dc.type: Thesis [et]

Files

Original bundle

Name: Bobkov_DataScience_2023.pdf
Size: 7.14 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission