Towards Auto-Scaling of Serverless Data Pipelines

dc.contributor.advisor: Poojara, Shivananda R., supervisor
dc.contributor.author: Das, Rajan Raj
dc.contributor.other: Tartu Ülikool. Loodus- ja täppisteaduste valdkond
dc.contributor.other: Tartu Ülikool. Arvutiteaduse instituut
dc.date.accessioned: 2023-10-26T10:34:46Z
dc.date.available: 2023-10-26T10:34:46Z
dc.date.issued: 2023
dc.description.abstract: The ever-increasing number of IoT devices generates massive amounts of data, and collecting data from heterogeneous sources and processing it without bottlenecks is challenging. Data pipelines are widely used to automate data processing without manual intervention. Traditional data pipelines, such as Extract-Load-Transform, have their own challenges: they are difficult to scale, which reduces the timeliness of data processing. Serverless computing can address these challenges. It is a recent paradigm in cloud computing that offers more granular scaling, at the level of individual functions, than Virtual Machines (VMs). With the growth of smart and Internet of Things (IoT) devices, the use of data pipelines has increased sharply. However, stochastic IoT workloads and the need to assure Quality of Service metrics (latency, throughput, etc.) impose several challenges, including scaling of the underlying infrastructure. Serverless Data Pipelines (SDPs) can be designed to process high data volumes with efficient resource usage. SDPs comprise several components, such as serverless functions, message queues, and queue connectors, and scaling the entire pipeline without leaving any bottlenecks is challenging. In our study, we created a serverless data pipeline for an image-processing IoT application that uses serverless functions to execute the data operation tasks. We applied different reactive scaling mechanisms, such as resource-based scaling and workload-based scaling, to measure how well the serverless data pipeline scales. The reactive mechanisms consider a single metric, i.e., CPU usage or request rate, when enforcing the auto-scaling configuration. Therefore, we evaluated the use of multiple performance metrics of the serverless data pipeline to proactively predict the number of serverless functions in the data pipeline.
To experiment with this, we collected data by configuring the reactive auto-scalers, cleaned the data to remove outliers, and used it to train and test the proactive auto-scaler. In this work, we used multioutput regression models, and the results show that the ExtraTreeRegressor algorithm performs better at predicting the number of pods.
dc.identifier.uri: https://hdl.handle.net/10062/93778
dc.language.iso: eng
dc.publisher: Tartu Ülikool
dc.rights: openAccess
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Cloud Computing
dc.subject: Serverless Functions
dc.subject: Function as a Service (FaaS)
dc.subject: Data Pipelines
dc.subject.other: master's theses
dc.subject.other: informaatika
dc.subject.other: infotehnoloogia
dc.subject.other: informatics
dc.subject.other: infotechnology
dc.title: Towards Auto-Scaling of Serverless Data Pipelines
dc.type: Thesis

Files

Original bundle

Name: Das_MSc_software_engineering_2023.pdf
Size: 1.93 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission