An Alternative to Maximizing the Expected Value
Date
Authors
Publisher
Tartu Ülikool
Abstract
Reinforcement learning problems can generally be described as follows. The
user quantifies how good each state of some system would be according to their
preferences, and an agent, e.g. a robot, must choose actions that lead to the
states the user has defined as good. More formally, for each state and action
the user specifies a real-valued reward, and the goal of reinforcement learning
is to automatically find a strategy, called a policy, that leads to a high reward sum.
However, actions often do not determine states, but only make some states
likelier than others. In this case, the policy is usually chosen by maximizing
the expected reward. In this thesis, however, I prove that for every probability
p < 1 and every constant c > 0, there exists a reinforcement learning problem in
which the policy maximizing the expected reward obtains reward sum Z1, while
another policy obtains reward sum Z2, with P[Z2 > Z1 + c] > p. In other words,
compared to another policy, the policy maximizing the expected reward can obtain
an arbitrarily smaller reward sum with probability arbitrarily close to (though below) 1.
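To make this statement concrete, the following is a small simulation sketch (my own illustration with assumed parameters, not taken from the thesis). It uses a one-step problem with p = 0.9 and c = 10: a safe action pays c + 1 = 11 with certainty, while a risky action pays 1200 with probability 0.05 and 0 otherwise, so the risky action has the larger expectation (60) yet falls short of the safe one by more than c roughly 95% of the time.

    import random

    # Illustrative construction only; p, c and all payoffs are assumptions,
    # not values taken from the thesis.
    p, c = 0.9, 10.0
    SAFE_REWARD = c + 1      # Z2: the alternative policy's certain reward sum (11)
    WIN_PROB = 0.05          # chosen so that 1 - WIN_PROB = 0.95 > p
    WIN_REWARD = 1200.0      # makes E[risky] = 60 > 11, so expectation picks "risky"

    def risky_reward() -> float:
        """Z1: reward sum of the expectation-maximizing policy."""
        return WIN_REWARD if random.random() < WIN_PROB else 0.0

    # Estimate P[Z2 > Z1 + c]: whenever the risky action pays 0, we have
    # 11 > 0 + 10, so the estimate should come out near 0.95 > p.
    trials = 100_000
    hits = sum(SAFE_REWARD > risky_reward() + c for _ in range(trials))
    print(f"E[risky] = {WIN_PROB * WIN_REWARD:.1f}, safe reward = {SAFE_REWARD}")
    print(f"Estimated P[Z2 > Z1 + c] = {hits / trials:.3f} (target p = {p})")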
This might not be a desirable property for a policy to have. In this thesis, I define
the smoothened median of a random variable and prove that any policy that
maximizes the smoothened median of the reward sum (instead of the expectation)
does not have this property.
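On the illustration above, even the ordinary median already points the other way: the risky action's reward sum is 0 with probability 0.95, so its median is 0, while the safe action's median equals its certain payoff of 11, and a median-based criterion therefore prefers the safe action. The smoothened median defined in the thesis is not reproduced here; the plain median is used only to indicate why a median-style criterion avoids the failure mode above.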
Description
Keywords
Reinforcement learning, median, heavy-tailed distributions, Kelly betting system