Alternatiiv keskväärtuse maksimeerimisele (An Alternative to Expected Value Maximization)

Field | Value | Language
dc.contributor.advisor | Vicente, Raul, supervisor |
dc.contributor.author | Litvin, Andre |
dc.contributor.other | Tartu Ülikool. Loodus- ja täppisteaduste valdkond | et
dc.contributor.other | Tartu Ülikool. Arvutiteaduse instituut | et
dc.date.accessioned | 2023-10-19T09:01:07Z |
dc.date.available | 2023-10-19T09:01:07Z |
dc.date.issued | 2023 |
dc.description.abstract | Reinforcement learning problems can generally be described as follows. The user quantifies how good each state of some system would be according to their preferences, and an agent, e.g. a robot, must choose actions that lead to the states the user has defined as good. More formally, for each state and action the user picks a real-valued reward, and the goal of reinforcement learning is to automatically find a strategy, called a policy, that leads to a high reward sum. However, actions often do not determine states; they only make some states likelier than others. In this case, the policy is usually chosen by maximizing the expected reward. In this thesis, however, I prove that for every probability $p < 1$ and constant $c > 0$ there exists a reinforcement learning problem in which the policy maximizing expected reward gives reward sum $Z_1$, but another policy gives reward sum $Z_2$ with $P[Z_2 > Z_1 + c] > p$. In other words, compared to another policy, the policy maximizing expected reward can obtain a reward sum that is smaller by an arbitrarily large margin, with probability arbitrarily close to (but below) 1. This might not be a desirable property for a policy to have. I define the smoothened median of a random variable and prove that any policy that maximizes the smoothened median of the reward sum (instead of its expectation) does not have this property. | et
dc.identifier.uri | https://hdl.handle.net/10062/93602 |
dc.language.iso | est | et
dc.publisher | Tartu Ülikool | et
dc.rights | openAccess | et
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | *
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | *
dc.subject | Reinforcement learning | et
dc.subject | median | et
dc.subject | heavy-tailed distributions | et
dc.subject | Kelly betting system | et
dc.subject.other | bakalaureusetööd (bachelor's theses) | et
dc.subject.other | informaatika | et
dc.subject.other | infotehnoloogia | et
dc.subject.other | informatics | et
dc.subject.other | infotechnology | et
dc.title | Alternatiiv keskväärtuse maksimeerimisele | et
dc.type | Thesis | et
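The abstract's central claim can be illustrated with a deliberately tiny example. The sketch below is not the construction used in the thesis. It assumes a hypothetical one-step problem with two policies, RISKY and SAFE, and the helper names (`make_counterexample`, `expectation`, `median`, `prob_exceeds`) are illustrative only. For any given $p < 1$ and $c > 0$, RISKY has the higher expected reward sum, yet SAFE's reward sum $Z_2$ exceeds RISKY's reward sum $Z_1$ by more than $c$ with probability greater than $p$. The ordinary median is used here, not the smoothened median defined in the thesis, because that definition is not reproduced in this record.

```python
from fractions import Fraction

# A distribution over a policy's reward sum is a dict {reward_sum: probability}.
# Exact Fractions avoid floating-point noise in the comparisons below.


def expectation(dist):
    """Expected value of a finite discrete distribution."""
    return sum(v * p for v, p in dist.items())


def median(dist):
    """Ordinary lower median: the smallest m with P[Z <= m] >= 1/2.

    This is NOT the thesis's smoothened median; it is only a stand-in.
    """
    cumulative = Fraction(0)
    for v in sorted(dist):
        cumulative += dist[v]
        if cumulative >= Fraction(1, 2):
            return v
    raise ValueError("probabilities must sum to 1")


def make_counterexample(p, c):
    """Hypothetical one-step problem for a given 0 <= p < 1 and c > 0.

    RISKY pays `big` with probability q and 0 otherwise; SAFE always pays c + 1.
    q is small enough that P[RISKY pays 0] = 1 - q > p, and `big` is large
    enough that E[RISKY] = q * big = 2 * (c + 1) > E[SAFE].
    """
    p, c = Fraction(p), Fraction(c)
    safe_reward = c + 1            # beats RISKY's 0 outcome by more than c
    q = (1 - p) / 2                # then 1 - q = (1 + p) / 2 > p, since p < 1
    big = 2 * safe_reward / q      # q * big == 2 * safe_reward
    risky = {big: q, Fraction(0): 1 - q}
    safe = {safe_reward: Fraction(1)}
    return risky, safe


def prob_exceeds(dist_2, dist_1, c):
    """P[Z_2 > Z_1 + c], assuming Z_1 and Z_2 are independent.

    When one of the two policies is deterministic (as SAFE is), the
    independence assumption does not change the result.
    """
    return sum(p2 * p1
               for v2, p2 in dist_2.items()
               for v1, p1 in dist_1.items()
               if v2 > v1 + c)


if __name__ == "__main__":
    risky, safe = make_counterexample(Fraction(99, 100), 10)
    print(expectation(risky), expectation(safe))  # 22 11
    print(prob_exceeds(safe, risky, 10))          # 199/200
    print(median(risky), median(safe))            # 0 11
```

With $p = 99/100$ and $c = 10$, RISKY pays 4400 with probability 1/200, so its expectation (22) is twice SAFE's (11). Even so, SAFE wins by more than 10 with probability 199/200 and also has the larger median. An expectation-maximizing choice picks RISKY, while a median-based choice picks SAFE.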

Files

Original bundle

Name: litvin_informaatika_2023.pdf
Size: 509.98 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission