Motivating Reinforcement Learning Agents to Control their Environment
Publisher
Tartu Ülikool
Abstract
Exploration lies at the heart of every Reinforcement Learning problem. Sparse
environments rarely reward agents, making them extremely hard to explore. Behavioral
biases attempt to solve the problem by intrinsically motivating the agent to exhibit
certain behaviors. Understanding the controllable aspects of an environment is a popular
behavioral bias implemented using intrinsic motivators. It has helped many models achieve
state-of-the-art results. However, current methods rely on inverse dynamics learning
to identify controllable aspects. Inverse dynamics learning has drawbacks that limit the
agent’s ability to model controllable objects. We highlight some of these drawbacks
and propose an alternative approach to learning the controllable aspects of the environment.
This thesis introduces the Controlled Effects Network (CEN), a self-supervised method
for learning controllable aspects of a Reinforcement Learning environment. CEN uses
causal concepts of blame to identify controllable objects. We integrate CEN into an
intrinsic motivation module that improves the exploration behavior of reinforcement
learning agents. Agents using CEN outperform inverse dynamics agents in both learning
efficiency and the maximum score achieved in sparse environments. The CEN-based motivator
encourages the agent to interact more with controllable objects in the environment.
Hence, the agent is more likely to reach events that trigger an extrinsic reward from the
environment.
We compare agents using CEN-based intrinsic motivators with agents using inverse
dynamics-based motivators. To this end, we create multiple sparse environments to test
the exploration behavior of both types of agents. In an empty grid, CEN agents exhibit
uniform exploration, visiting numerous grid cells, while inverse dynamics agents tend to
stick to corners and walls. In the sparse Clusters environment, CEN agents achieve a
maximum score of 5 while inverse dynamics agents reach only 1. Moreover, CEN agents
learn to solve the Clusters environment more efficiently, requiring fewer environment
steps. We open-source our implementation
of CEN, the sparse environments, and the Never Give Up (NGU) reinforcement learning
agent to ease future research on controllability and exploration.
Keywords
Reinforcement Learning, Causality, Exploration, Deep Neural Networks