Motivating Reinforcement Learning Agents to Control their Environment

dc.contributor.advisor Corcoll, Oriol, supervisor
dc.contributor.advisor Vicente, Raul, supervisor
dc.contributor.author Mohamed, Youssef Sherif Mansour
dc.contributor.other Tartu Ülikool. Loodus- ja täppisteaduste valdkond
dc.contributor.other Tartu Ülikool. Arvutiteaduse instituut
dc.date.accessioned 2023-09-01T09:59:16Z
dc.date.available 2023-09-01T09:59:16Z
dc.date.issued 2021
dc.description.abstract Exploration lies at the heart of every reinforcement learning problem. Sparse environments rarely reward agents, making them extremely hard to explore. Behavioral biases attempt to solve this problem by intrinsically motivating the agent to exhibit certain behaviors. Understanding the controllable aspects of an environment is a popular behavioral bias implemented with intrinsic motivators, and it has helped many models achieve state-of-the-art results. However, current methods rely on inverse dynamics learning to identify controllable aspects, and inverse dynamics learning has drawbacks that limit the agent's ability to model controllable objects. We highlight some of these drawbacks and propose an alternative approach to learning the controllable aspects of the environment. This thesis introduces the Controlled Effects Network (CEN), a self-supervised method for learning controllable aspects of a reinforcement learning environment. CEN uses the causal concept of blame to identify controllable objects. We integrate CEN into an intrinsic motivation module that improves the exploration behavior of reinforcement learning agents. Agents using CEN outperform inverse dynamics agents in both learning efficiency and the maximum score achieved in sparse environments. The CEN-based motivator encourages the agent to interact more with the controllable objects in an environment; hence, the agent is more likely to reach events that trigger an extrinsic reward. We compare agents using CEN-based intrinsic motivators against agents using inverse dynamics-based motivators. To this end, we create multiple sparse environments to test the exploration behavior of both agents. In an empty grid, CEN agents exhibit uniform exploration, visiting numerous grid cells, while inverse dynamics agents tend to stick to corners and walls. In sparse Clusters, CEN agents achieve a maximum score of 5 while inverse dynamics agents manage only 1. Moreover, CEN agents learn to solve the Clusters environment more efficiently, requiring fewer environment steps. We open-source our implementation of CEN, the sparse environments, and the Never Give Up (NGU) reinforcement learning agent to ease future research on controllability and exploration.
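The abstract describes combining the environment's extrinsic reward with an intrinsic bonus derived from a learned controllability model. The Python sketch below illustrates only that general idea; every name in it (ControllabilityModel, controlled_effect, beta) is a hypothetical placeholder, not the thesis's released CEN or NGU code.

# Minimal sketch of an intrinsically motivated reward signal, assuming a
# hypothetical controllability model in the spirit of CEN. Not the
# thesis's actual implementation.

class ControllabilityModel:
    """Stand-in for a learned model (such as CEN) that estimates how much
    of an observed transition can be blamed on the agent's own action."""

    def controlled_effect(self, obs, action, next_obs):
        # A trained model would return a score in [0, 1]; this stub only
        # fixes the interface used in the sketch below.
        raise NotImplementedError

def shaped_reward(extrinsic, obs, action, next_obs, model, beta=0.5):
    # Add an intrinsic bonus proportional to how controllable the
    # transition was, encouraging interaction with controllable objects.
    bonus = model.controlled_effect(obs, action, next_obs)
    return extrinsic + beta * bonus

In such a scheme, beta would trade off exploration pressure against the environment's own sparse reward; per the abstract, the thesis integrates CEN into an intrinsic motivation module of an NGU-style agent.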
dc.identifier.uri https://hdl.handle.net/10062/91954
dc.language.iso eng
dc.publisher Tartu Ülikool
dc.rights openAccess
dc.rights Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject Reinforcement Learning
dc.subject Causality
dc.subject Exploration
dc.subject Deep Neural Networks
dc.subject.other master's theses
dc.subject.other informatics
dc.subject.other information technology
dc.title Motivating Reinforcement Learning Agents to Control their Environment
dc.type Thesis

Files

Original bundle

Name: Mohamed_Msc_CS.pdf
Size: 3.86 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission