Reinforcement learning
Pages
Introduction and Markov decision processes
Reinforcement learning deals with sequential decision-making problems in which an agent observes the state of the environment and acts on it by performing some action; the environment then evolves in a…
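The observe-act-evolve loop described above can be sketched in Python. The 5-state chain environment, the `step` and `run_episode` functions, and the always-move-right policy are hypothetical illustrations chosen for this sketch; real environments are usually stochastic and richer.

```python
# Hypothetical toy environment (illustration only): a 5-state chain.
# Action 0 moves left, action 1 moves right; reaching state 4 ends the
# episode with reward 1, and every other transition gives reward 0.
N_STATES = 5
TERMINAL = N_STATES - 1


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(TERMINAL, state + 1)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward, next_state == TERMINAL


def run_episode(policy, start=0, max_steps=100):
    """Agent-environment loop: observe the state, act, let the environment evolve."""
    state, trajectory = start, []
    for _ in range(max_steps):
        action = policy(state)                          # agent acts on the observed state
        next_state, reward, done = step(state, action)  # environment evolves
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def always_right(state):
    """Hypothetical deterministic policy: always choose action 1 (move right)."""
    return 1


print(run_episode(always_right))
# [(0, 1, 0.0), (1, 1, 0.0), (2, 1, 0.0), (3, 1, 1.0)]
```

Each tuple records the observed state, the chosen action and the reward returned by the environment for that transition.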
Value function and Q-function
The total reward is typically defined as the expected discounted cumulative reward. We can define the value function $V^\pi(s)$ of policy $\pi$ in state $s$: $$V^\pi (s) =…
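As a minimal sketch, assuming the common textbook definition $V^\pi(s) = \mathbb{E}_\pi\left[\sum_{t=0}^{\infty} \gamma^t r_{t+1} \mid s_0 = s\right]$ with discount factor $0 \le \gamma < 1$, the discounted return of one trajectory and the value function of a fixed deterministic policy can be computed as below. Policy evaluation repeatedly applies the Bellman expectation backup $V(s) \leftarrow \sum_{s'} P(s' \mid s, \pi(s))\,[r + \gamma V(s')]$ until the values stop changing. The transition-table format and the function names are assumptions made for this illustration.

```python
def discounted_return(rewards, gamma):
    """G = r_1 + gamma * r_2 + gamma^2 * r_3 + ... for one finite trajectory."""
    g = 0.0
    for r in reversed(rewards):  # accumulate backwards: G_t = r_{t+1} + gamma * G_{t+1}
        g = r + gamma * g
    return g


def evaluate_policy(transitions, policy, gamma, tol=1e-10):
    """Iterative policy evaluation with in-place (Gauss-Seidel) sweeps.

    transitions[s][a] is a list of (prob, next_state, reward) tuples
    (a hypothetical format for this sketch); policy[s] is the action taken in s.
    """
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s in transitions:
            # Bellman expectation backup for the action chosen by the policy
            v_new = sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[s][policy[s]])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:  # stop once a full sweep changes no value by more than tol
            return V


# Hypothetical two-state MDP: A always moves to B (reward 0), B loops on itself (reward 1).
transitions = {"A": {0: [(1.0, "B", 0.0)]}, "B": {0: [(1.0, "B", 1.0)]}}
policy = {"A": 0, "B": 0}
print(discounted_return([0.0, 0.0, 1.0], gamma=0.9))
print(evaluate_policy(transitions, policy, gamma=0.9))
```

For $\gamma = 0.9$ the values approach $V(B) = 1/(1-\gamma) = 10$ and $V(A) = \gamma \cdot 10 = 9$, which matches summing the discounted rewards by hand.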
Q-learning
We apply a policy to explore the environment and collect information, while keeping a progressively updated estimate of the optimal Q-function by applying a sample-based version of the Bellman…
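A minimal tabular Q-learning sketch follows, assuming the standard update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$, which is the sample-based form of the Bellman optimality backup, together with an ε-greedy exploration policy. The chain environment and the values of `alpha`, `gamma`, `epsilon` and `episodes` are hypothetical choices for illustration.

```python
import random

# Hypothetical 5-state chain (illustration only): action 0 = left, 1 = right,
# reward 1 on reaching the terminal state 4, reward 0 otherwise.
N_STATES, ACTIONS = 5, (0, 1)
TERMINAL = N_STATES - 1


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(TERMINAL, state + 1)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward, next_state == TERMINAL


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Exploration: random action with probability epsilon,
            # otherwise greedy with respect to Q (ties broken at random).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(Q[state])
                action = rng.choice([a for a in ACTIONS if Q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Sample-based Bellman optimality target; no bootstrapping from terminal states.
            target = reward if done else reward + gamma * max(Q[next_state])
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q


Q = q_learning()
greedy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(TERMINAL)]
print(greedy)
```

With these settings the greedy policy read from `Q` should move right in every non-terminal state, and `Q[s][1]` should approach $\gamma^{3-s}$. Q-learning is off-policy: the estimate targets the optimal Q-function even though actions are chosen ε-greedily, and ε-greedy is only one of several possible exploration policies.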