Robotics and Computer Engineering - Master's theses
Permanent URI for this collection: https://hdl.handle.net/10062/42116
Browsing Robotics and Computer Engineering - Master's theses by Author "Anbarjafari, Gholamreza"
Now showing 1 - 7 of 7
Item: Dynamic Rate Allocation of Interactive Multi-View Video with View-Switch Prediction (Tartu Ülikool, 2016). Sarkar, Suman; Anbarjafari, Gholamreza; Ozcinar, Cagri; Tartu Ülikool. Loodus- ja täppisteaduste valdkond; Tartu Ülikool. Tehnoloogiainstituut.

In Interactive Multi-View Video (IMVV), the video is captured by a number of cameras positioned in an array, and the camera views are transmitted to users. The user can interact with the transmitted content by choosing viewpoints (views from different cameras in the array), with the expectation of minimal transmission delay when changing between views. View switching delay is one of the primary concerns addressed in this thesis; the contribution is to minimize the transmission delay of the new view-switch frame through a novel process of selecting the predicted view and compressing it with transmission efficiency in mind. The work mainly considers real-time IMVV streaming. The view switch is modelled as a discrete Markov chain whose transition probabilities are derived from a Zipf distribution, which provides the information needed for view-switch prediction. To eliminate the Round-Trip Time (RTT) transmission delay, Quantization Parameters (QP) are adaptively allocated to the remaining redundant transmitted frames, keeping the view switching time minimal at the cost of video quality during the RTT time span. The experimental results of the proposed method show superior PSNR and view switching delay, and therefore better viewing quality, compared with existing methods.

Item: English-Estonian Machine Translation: Evaluation Across Different Models and Architectures (Tartu Ülikool, 2020). Islam, Md Rezwanul; Anbarjafari, Gholamreza; Sait Arslan, Hasan; Tartu Ülikool. Loodus- ja täppisteaduste valdkond; Tartu Ülikool. Tehnoloogiainstituut.

This thesis has three main objectives. The first is the implementation of the RNMT+ architecture with the Relational-RNN model.
This combines the architecture with the RNN model. The second objective is to train three different translation models based on the RNMT+, Transformer, and sequence-to-sequence architectures; performance comparisons of RNMT+ against LSTM, Transformer, seq2seq, and similar models have been reported previously. The final objective is to evaluate the translation models with respect to their training data. When implementing RNMT+, the core idea was to use a newer type of Recurrent Neural Network (RNN) instead of the widely used LSTM or GRU. Beyond this, we evaluate the RNMT+ model against models based on the state-of-the-art Transformer and sequence-to-sequence-with-attention architectures. This BLEU-based evaluation shows that neural machine translation is domain-dependent: translation based on the Transformer model performs best of the three in the OpenSubtitle v2018 domain, while the RNMT+ model performs best in a cross-domain evaluation. Additionally, we compare all the above-mentioned architectures in terms of their encoder-decoder layers and attention mechanisms, and against other available neural and statistical machine translation architectures.

Item: Facial Expression Recognition using Neural Network for Dyadic Interaction (Tartu Ülikool, 2020). Sham, Abdallah Hussein; Ozcinar, Cagri; Tikka, Pia; Anbarjafari, Gholamreza; Tartu Ülikool. Loodus- ja täppisteaduste valdkond; Tartu Ülikool. Tehnoloogiainstituut.

Computers are machines that do not share emotions as humans do. With the help of Machine Learning (ML) and Artificial Intelligence (AI), social robots can become a reality. These robots can currently interact with people to a certain extent, but not exactly as a person would. To reach that level, they need to understand more about how people interact daily, and learning from the dyadic interaction of two people is a good option. Participants' facial expressions are the main features that can be retrieved from dyadic interaction, and this can be done using a trained Deep Neural Network (DNN) model. In this thesis, the DNN model known as Mini-Xception is trained on a pre-processed dataset and then tested on images. Using a face detector algorithm, the model detects a person's facial expression in the image. After successful image results, the model is tested on other media: first with a webcam, then on videos with more than one participant. Since people react to expressions, their reactions can also be caused by context; for example, sad news would be the reason for a sad emotion.
The test results are therefore used for an analysis in which a correlation can be constructed between facial expressions and context.

Item: Human Activity Recognition Based Path Planning For Autonomous Vehicles (Tartu Ülikool, 2020). Tammvee, Martin; Anbarjafari, Gholamreza; Tartu Ülikool. Loodus- ja täppisteaduste valdkond; Tartu Ülikool. Tehnoloogiainstituut.

Human activity recognition (HAR) is a wide research topic in computer science. Improving HAR can lead to major breakthroughs in humanoid robotics, robots used in medicine, and autonomous vehicles. A system able to recognise humans and their activities without errors or anomalies would lead to safer and more empathetic autonomous systems. In this thesis, multiple neural network models of different complexity are investigated. Each model is re-trained on a proposed unique data set gathered on an automated guided vehicle (AGV) equipped with the latest sensors commonly used on autonomous vehicles. The best model is selected based on its final action recognition accuracy, and its pipeline is fused with YOLOv3 to enhance human detection. In addition to this pipeline improvement, multiple action direction estimation methods are proposed, since estimating the direction of a human's action is a very important aspect of collision-free path planning for self-driving cars.

Item: Neural Networks Based Automatic Content Moderation on Social Media (Tartu Ülikool, 2020). Karabulut, Dogus; Ozcinar, Cagri; Anbarjafari, Gholamreza; Tartu Ülikool. Loodus- ja täppisteaduste valdkond; Tartu Ülikool. Tehnoloogiainstituut.

Millions of users produce and consume billions of pieces of content on social media, so human-reviewed content moderation is not achievable at such volume. Automating content moderation is a scalable solution for social media platforms. In this thesis, we propose a neural-network-based automatic content moderation pipeline.
Our solution consists of two main parts: the first classifies the content into granular content classes, and the second automatically obfuscates the part of the image that might be inappropriate for the target audience. The proposed solution is cost-efficient in terms of human labour. Our classification network is trained on automatically labelled data using noise-robust techniques. Our automatic obfuscation algorithm uses the information obtained from the classification network and requires no additional annotation or supplementary training; this obfuscation algorithm presents a novel use case relative to the state of the art.

Item: Real-Time Expression Analysis of Students in a Classroom Using Facial Emotion Recognition (Tartu Ülikool, 2020). Lominadze, Andro; Anbarjafari, Gholamreza; Tartu Ülikool. Loodus- ja täppisteaduste valdkond; Tartu Ülikool. Tehnoloogiainstituut.

Life relies more and more on computers. People create new machines and programs to make their lives easier. Devices are part of the daily routine, so it would be useful if they could understand human verbal or even emotional expressions. Nowadays, computers can learn almost anything and can sometimes help analyse the surrounding world better than human senses can. This study can be used while giving a presentation or a speech in front of a large audience: it lets the user be aware of the emotional state of the people attending. During a speech it is almost impossible to observe every face in the audience and guess how people feel; computer vision techniques can do this job for us. The framework consists of three main parts. In the first part, a pre-trained face detector model collects all the faces seen by the camera and assigns each a unique ID. Each face is tracked throughout the video stream using a simple object tracking algorithm, developed in this work, called the Centroid Tracker.
This tracker relies on the Euclidean distance between the locations of an object's centroid in the current and previous frames of the video. The second part of the thesis is Facial Expression Recognition (FER): a Convolutional Neural Network (CNN) is trained on the FER2013 data set, is fed the face images taken from the previous step, and successfully classifies seven different emotional states. The third part stores the emotion data for each person in a form that is easy for humans to understand. The provided information contains the number of people attending, their facial expressions, and the overall mood of the audience. From this information the user gets feedback about his or her speech, which may help improve presentation skills for the future or even prompt an immediate change of presenting style to increase the audience's interest.

Item: Super Resolution and Face Recognition Based People Activity Monitoring Enhancement Using Surveillance Camera (Tartu Ülikool, 2016). Uiboupin, Tõnis; Anbarjafari, Gholamreza; Rasti, Pejman; Tartu Ülikool. Loodus- ja täppisteaduste valdkond; Tartu Ülikool. Tehnoloogiainstituut.

Because of the importance of security in society, monitoring activities and recognizing specific people through surveillance video cameras plays an important role. One of the main issues in such activity arises from the fact that cameras do not meet the resolution requirements of many face recognition algorithms. To solve this issue, this work proposes a new system that super-resolves the image. First, we use sparse representation with a specific dictionary containing many natural and facial images to super-resolve images; as a second method, we use a deep convolutional network. Image super-resolution is followed by face recognition based on Hidden Markov Models and Singular Value Decomposition. The proposed system has been tested on many well-known face databases such as FERET, HeadPose, and the Essex University databases, as well as our recently introduced iCV Face Recognition database (iCV-F).
The experimental results show that the recognition rate increases considerably after applying super-resolution with the facial and natural image dictionary. In addition, we propose a system for analysing people's movement in surveillance video. People, including their faces, are detected using Histogram of Oriented Gradients features and the Viola-Jones algorithm. A multi-target tracking system based on discrete-continuous energy minimization is then used to track people, and the tracking data is in turn used to obtain information about visited and passed locations and face recognition results for the tracked people.
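The Centroid Tracker described in the Real-Time Expression Analysis abstract above lends itself to a short sketch. The following is a minimal illustration, not the thesis's actual code: it greedily matches each new detection to the nearest previous centroid by Euclidean distance and assigns a fresh ID when no previous centroid is close enough. The class name, the `max_distance` threshold, and the greedy matching strategy are assumptions of this sketch.

```python
import math
from itertools import count

class CentroidTracker:
    """Assigns stable IDs to detections by matching each new centroid
    to the nearest centroid from the previous frame (a hypothetical
    minimal variant of the tracker described in the abstract)."""

    def __init__(self, max_distance=50.0):
        self.next_id = count()        # generator of fresh object IDs
        self.objects = {}             # id -> (x, y) centroid from last frame
        self.max_distance = max_distance

    def update(self, centroids):
        """centroids: list of (x, y) detections for the current frame.
        Returns the id -> centroid mapping after matching."""
        unmatched = dict(self.objects)
        assigned = {}
        for c in centroids:
            # Greedy nearest-neighbour match by Euclidean distance.
            best_id, best_d = None, self.max_distance
            for oid, prev in unmatched.items():
                d = math.dist(c, prev)
                if d < best_d:
                    best_id, best_d = oid, d
            if best_id is None:
                best_id = next(self.next_id)   # no close match: new object
            else:
                del unmatched[best_id]         # each old centroid used once
            assigned[best_id] = c
        self.objects = assigned  # objects unseen this frame are dropped
        return assigned

tracker = CentroidTracker()
tracker.update([(10, 10), (100, 100)])  # two faces enter: IDs 0 and 1
tracker.update([(12, 11), (101, 99)])   # same faces, slightly moved: IDs kept
```

A production tracker would typically also keep "disappeared" objects alive for a few frames before dropping them; this sketch omits that for brevity.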