Motivating Reinforcement Learning Agents to Control their Environment
Publisher
Tartu Ülikool
Abstract
Exploration lies at the heart of every Reinforcement Learning problem. Sparse
environments rarely reward agents, making them extremely hard to explore. Behavioral
biases attempt to solve the problem by intrinsically motivating the agent to exhibit
certain behaviors. Understanding the controllable aspects of an environment is a popular
behavioral bias implemented using intrinsic motivators. It has helped many models achieve
state-of-the-art results. However, current methods rely on inverse dynamics learning
to identify controllable aspects. Inverse dynamics learning has drawbacks that limit the
agent’s ability to model controllable objects. We highlight some of these drawbacks
and propose an alternative approach to learning the controllable aspects of the environment.
This thesis introduces the Controlled Effects Network (CEN), a self-supervised method
for learning controllable aspects of a Reinforcement Learning environment. CEN uses
causal concepts of blame to identify controllable objects. We integrate CEN into an
intrinsic motivation module that improves the exploration behavior of reinforcement
learning agents. Agents using CEN outperform inverse dynamics agents in both learning
efficiency and the maximum score achieved in sparse environments. The CEN-based motivator
encourages the agent to interact more with controllable objects in the environment.
Hence, the agent is more likely to reach events that trigger an extrinsic reward from the
environment.
We compare agents using CEN-based intrinsic motivators with agents using inverse
dynamics-based motivators. To this end, we create multiple sparse environments to test
the exploration behavior of both types of agents. In an empty grid, CEN agents exhibit
uniform exploration, visiting numerous grid cells, while inverse dynamics agents tend to
stick to corners and walls. In the sparse Clusters environment, CEN agents achieve a
maximum score of 5 while inverse dynamics agents reach only 1. Moreover, CEN agents
learn to solve the Clusters environment more efficiently, requiring fewer environment
steps. We open-source our implementation
of CEN, the sparse environments, and the Never Give Up (NGU) reinforcement learning
agent to ease future research on controllability and exploration.
Keywords
Reinforcement Learning, Causality, Exploration, Deep Neural Networks