Towards Auto-Scaling of Serverless Data Pipelines

Das, Rajan Raj

Files

Das_MSc_software_engineering_2023.pdf (1.93 MB)

Date

2023

Publisher

Tartu Ülikool

Abstract

The ever-increasing number of Internet of Things (IoT) devices generates massive amounts of data, and collecting data from heterogeneous sources and processing it without bottlenecks is challenging. Data pipelines are widely used to automate data processing without manual effort. Traditional data pipelines, such as Extract-Load-Transform (ELT), have their own challenges: they are difficult to scale and reduce the timeliness of data processing. These challenges can be addressed with serverless computing, a recent cloud computing paradigm that offers finer-grained, function-level scaling than virtual machines (VMs). With the growing number of smart and IoT devices, the use of data pipelines has increased exponentially. However, stochastic IoT workloads and the need to guarantee Quality of Service (QoS) metrics such as latency and throughput pose several challenges, including scaling the underlying infrastructure. Serverless data pipelines (SDPs) can be designed to process high data volumes with efficient resource usage. SDPs comprise several components, such as serverless functions, message queues, and queue connectors, and scaling the entire pipeline without leaving any bottleneck is challenging. In this study, we built a serverless data pipeline for an image-processing IoT application that uses serverless functions to execute its data-operation tasks. We also applied different reactive scaling mechanisms, such as resource-based scaling and workload-based scaling, to measure the scalability of the serverless data pipeline. These reactive mechanisms each rely on a single metric, e.g., CPU usage or request rate, to drive the auto-scaling configuration. Therefore, we evaluated the use of multiple performance metrics of the serverless data pipeline to proactively predict the number of serverless functions required in the data pipeline.
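The abstract does not state the exact rule the reactive auto-scalers apply. As a minimal sketch, assume a single-metric rule of the form documented for the Kubernetes Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with a tolerance band (the HPA default is 0.1) and min/max replica bounds. The same class covers resource-based scaling (metric = average CPU % per pod) and workload-based scaling (metric = requests per second per pod); the class name, parameter names, and default values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ReactiveScaler:
    """Hypothetical single-metric reactive scaler (HPA-style rule).

    target_per_pod is the desired average metric value per pod, e.g.
    70.0 (% CPU) for resource-based scaling or 50.0 (requests/s) for
    workload-based scaling.
    """

    target_per_pod: float
    min_pods: int = 1
    max_pods: int = 20
    tolerance: float = 0.1  # ignore deviations within ±10% of the target

    def desired_pods(self, current_pods: int, metric_per_pod: float) -> int:
        # Ratio > 1 means pods are overloaded; ratio < 1 means over-provisioned.
        ratio = metric_per_pod / self.target_per_pod
        if abs(ratio - 1.0) <= self.tolerance:
            desired = current_pods  # close enough to target: no change
        else:
            desired = math.ceil(current_pods * ratio)
        # Clamp to the configured replica range.
        return max(self.min_pods, min(self.max_pods, desired))


if __name__ == "__main__":
    cpu_scaler = ReactiveScaler(target_per_pod=70.0)  # resource-based (% CPU)
    rps_scaler = ReactiveScaler(target_per_pod=50.0)  # workload-based (req/s)
    print(cpu_scaler.desired_pods(current_pods=4, metric_per_pod=105.0))  # → 6
    print(rps_scaler.desired_pods(current_pods=3, metric_per_pod=20.0))   # → 2
```

Each scaler instance reacts to exactly one metric, which is the single-metric limitation described above.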
To evaluate the proactive approach, we collected data with the reactive auto-scalers configured, cleaned it to remove outliers, and used it to train and test the proactive auto-scaler. We used multi-output regression models, and the results show that the ExtraTreeRegressor algorithm predicts the number of pods with better efficiency than the other models evaluated.
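The abstract does not specify how outliers were detected or which metrics served as features. The sketch below assumes Tukey's interquartile-range fences (k = 1.5) applied per metric column, then splits the cleaned samples into a feature matrix and a multi-output target matrix with one pod-count column per pipeline function. The column names (`cpu`, `latency_ms`, `pods_a`, `pods_b`) and the helper functions are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple

Row = Dict[str, float]  # one monitoring sample: column name -> value


def iqr_bounds(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Tukey fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows: List[Row], columns: List[str], k: float = 1.5) -> List[Row]:
    """Drop any row whose value in one of `columns` lies outside that column's fences."""
    bounds = {c: iqr_bounds([r[c] for r in rows], k) for c in columns}
    return [
        r for r in rows
        if all(bounds[c][0] <= r[c] <= bounds[c][1] for c in columns)
    ]


def to_xy(rows: List[Row], features: List[str], targets: List[str]):
    """Split rows into a feature matrix X and a multi-output target matrix y."""
    X = [[r[c] for c in features] for r in rows]
    y = [[r[c] for c in targets] for r in rows]  # one column per function
    return X, y


if __name__ == "__main__":
    samples = [
        {"cpu": 40.0, "latency_ms": 100.0, "pods_a": 2, "pods_b": 3},
        {"cpu": 42.0, "latency_ms": 102.0, "pods_a": 2, "pods_b": 4},
        {"cpu": 38.0, "latency_ms": 98.0, "pods_a": 1, "pods_b": 3},
        {"cpu": 41.0, "latency_ms": 101.0, "pods_a": 2, "pods_b": 3},
        {"cpu": 39.0, "latency_ms": 99.0, "pods_a": 1, "pods_b": 3},
        {"cpu": 40.0, "latency_ms": 500.0, "pods_a": 5, "pods_b": 9},  # latency spike
    ]
    cleaned = remove_outliers(samples, ["cpu", "latency_ms"])
    print(len(cleaned))  # → 5
    X, y = to_xy(cleaned, ["cpu", "latency_ms"], ["pods_a", "pods_b"])
    print(y[0])  # → [2, 3]
```

The resulting `y` is two-dimensional, which is the target shape that scikit-learn's tree-based regressors, including `ExtraTreeRegressor`, accept natively for multi-output regression; other estimators can be wrapped in `sklearn.multioutput.MultiOutputRegressor`.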

Keywords

Cloud Computing, Serverless Functions, Function as a Service (FaaS), Data Pipelines

URI

https://hdl.handle.net/10062/93778

Collections

MTAT magistritööd – Master's theses
