An Alternative to Maximizing the Expected Value
Date
Authors
Publisher
Tartu Ülikool
Abstract
Reinforcement learning problems can generally be described as follows. The
user quantifies how good each state of some system would be according to their
preferences, and an agent, e.g. a robot, must choose actions that lead to the
states the user has defined as good. More formally, for each state and action
the user specifies a real-valued reward, and the goal of reinforcement learning
is to automatically find a strategy, called a policy, that leads to a high reward sum.
However, actions often do not determine states, but only make some states
likelier than others. In this case, the policy is usually chosen by maximizing
the expected reward. In this thesis, however, I prove that for every probability
p < 1 and every constant c > 0, there exists a reinforcement learning problem in
which the policy maximizing the expected reward obtains reward sum Z1, while
another policy obtains reward sum Z2, with P[Z2 > Z1 + c] > p. In other words,
compared to another policy, the policy maximizing the expected reward can obtain
an arbitrarily smaller reward sum with probability arbitrarily close to (though below) 1.
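To make this statement concrete, the following is a small simulation sketch (my own illustration with assumed parameters, not taken from the thesis). It uses a one-step problem with p = 0.9 and c = 10: a safe action pays c + 1 = 11 with certainty, while a risky action pays 1200 with probability 0.05 and 0 otherwise, so the risky action has the larger expectation (60) yet falls short of the safe one by more than c roughly 95% of the time.

    import random

    # Illustrative construction only; p, c and all payoffs are assumptions,
    # not values taken from the thesis.
    p, c = 0.9, 10.0
    SAFE_REWARD = c + 1      # Z2: the alternative policy's certain reward sum (11)
    WIN_PROB = 0.05          # chosen so that 1 - WIN_PROB = 0.95 > p
    WIN_REWARD = 1200.0      # makes E[risky] = 60 > 11, so expectation picks "risky"

    def risky_reward() -> float:
        """Z1: reward sum of the expectation-maximizing policy."""
        return WIN_REWARD if random.random() < WIN_PROB else 0.0

    # Estimate P[Z2 > Z1 + c]: whenever the risky action pays 0, we have
    # 11 > 0 + 10, so the estimate should come out near 0.95 > p.
    trials = 100_000
    hits = sum(SAFE_REWARD > risky_reward() + c for _ in range(trials))
    print(f"E[risky] = {WIN_PROB * WIN_REWARD:.1f}, safe reward = {SAFE_REWARD}")
    print(f"Estimated P[Z2 > Z1 + c] = {hits / trials:.3f} (target p = {p})")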
This might not be a desirable property for a policy to have. In this thesis, I define
the smoothened median of a random variable and prove that any policy that
maximizes the smoothened median of the reward sum (instead of the expectation)
does not have this property.
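On the illustration above, even the ordinary median already points the other way: the risky action's reward sum is 0 with probability 0.95, so its median is 0, while the safe action's median equals its certain payoff of 11, and a median-based criterion therefore prefers the safe action. The smoothened median defined in the thesis is not reproduced here; the plain median is used only to indicate why a median-style criterion avoids the failure mode above.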
Description
Keywords
Reinforcement learning, median, heavy-tailed distributions, Kelly betting system