Reinforcement learning

Pages

Introduction and Markov decision processes

Reinforcement learning deals with sequential decision-making problems in which an agent observes the state of the environment and acts on it by performing some action. The environment evolves in a…

Value function and Q-function

The total reward is typically defined as the expected discounted cumulative reward. We can define the value function $V^\pi (s)$ of policy $\pi$ in state $s$: $$V^\pi (s) =…

We apply a policy to explore the environment and collect information, while keeping a progressively updated estimate of the optimal Q-function by applying a sample-based version of the Bellman…
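
The idea of exploring with one policy while incrementally updating an estimate of the optimal Q-function can be illustrated with a minimal tabular sketch. Everything here is an assumption made for illustration and is not taken from the pages above: the toy chain environment (`step`), the uniformly random exploration policy, the function name `q_learning`, and the values chosen for the learning rate `alpha` and discount factor `gamma`. The update used is the standard tabular Q-learning rule, which is one common sample-based form of the Bellman optimality update.

```python
import random


def step(state, action, goal):
    """Hypothetical deterministic chain: action 1 moves right, action 0 moves left."""
    next_state = min(state + 1, goal) if action == 1 else max(state - 1, 0)
    done = next_state == goal
    reward = 1.0 if done else 0.0  # reward only when the goal is reached
    return next_state, reward, done


def q_learning(n_states=5, n_actions=2, episodes=2000,
               alpha=0.1, gamma=0.9, max_steps=1000, seed=0):
    """Tabular Q-learning with a uniformly random exploration policy (assumed)."""
    rng = random.Random(seed)
    goal = n_states - 1
    # Q[s][a] is the running estimate of the optimal Q-function.
    Q = [[0.0] * n_actions for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Exploration: sample an action uniformly at random.
            action = rng.randrange(n_actions)
            next_state, reward, done = step(state, action, goal)

            # Sample-based Bellman target: no bootstrap term at terminal states.
            target = reward if done else reward + gamma * max(Q[next_state])
            # Move the current estimate a fraction alpha toward the target.
            Q[state][action] += alpha * (target - Q[state][action])

            state = next_state
            if done:
                break
    return Q


if __name__ == "__main__":
    for s, row in enumerate(q_learning()):
        print(s, [round(q, 3) for q in row])
```

Because the update bootstraps from the maximum over next-state actions rather than from the action the exploration policy actually takes, the estimate targets the optimal Q-function even though the data are collected by a random policy. In practice an epsilon-greedy policy is often used instead of purely random exploration.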