Reinforcement learning introduction
Reinforcement learning deals with sequential decision-making problems in which an agent repeatedly:
- Observes the state of the environment
- Acts on the environment by performing some action
- Receives a reward as the environment evolves to a new state; the reward is feedback the agent uses to judge whether a specific action is good in a given state
The goal of the agent is to learn a policy (a prescription of which action to take in every state) that maximizes the total reward.
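The observe/act/reward loop described above can be sketched in a few lines of Python. This is only an illustration: `ToyEnvironment`, its two states, its reward rule, and the helper names `run_episode` and `matching_policy` are all invented here and are not part of the original notes.

```python
class ToyEnvironment:
    """Hypothetical two-state environment (states 0 and 1), for illustration only."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # Assumed reward rule: 1.0 if the action matches the current state, else 0.0.
        reward = 1.0 if action == self.state else 0.0
        # The environment evolves to a new state (here it simply flips).
        self.state = 1 - self.state
        return self.state, reward


def run_episode(env, choose_action, num_steps):
    """Run the agent-environment loop and return the total reward collected."""
    total_reward = 0.0
    state = env.state                     # the agent observes the state
    for _ in range(num_steps):
        action = choose_action(state)     # the agent acts
        state, reward = env.step(action)  # the environment evolves and rewards
        total_reward += reward
    return total_reward


def matching_policy(state):
    """Hypothetical behaviour: always pick the action equal to the state."""
    return state


total = run_episode(ToyEnvironment(), matching_policy, num_steps=10)
```

Under the assumed reward rule, `matching_policy` collects the reward at every step, so it is the kind of behaviour the agent would ideally learn in this toy setting.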
Agent behaviour (policies)
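The agent's behaviour is modelled by its policy. A policy $\pi$ is a function that takes a state and an action and returns a probability, $\pi : \mathcal{S} \times \mathcal{A} \to [0, 1]$, where $\mathcal{S}$ is the set of states and $\mathcal{A}$ is the set of actions, such that $\pi(a|s)$ is the probability of performing action $a$ in state $s$. For every state $s$, these probabilities sum to one over the actions: $\sum_{a} \pi(a|s) = 1$.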
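For a small, finite problem, a policy of this kind can be stored as a lookup table. The sketch below assumes a made-up problem with states `"s0"`, `"s1"` and actions `"left"`, `"right"`; the table values and the helper names `action_probability`, `is_valid_policy`, and `sample_action` are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical policy table: policy[state][action] holds pi(action | state).
policy = {
    "s0": {"left": 0.8, "right": 0.2},
    "s1": {"left": 0.5, "right": 0.5},
}


def action_probability(policy, state, action):
    """Return pi(action | state) by table lookup."""
    return policy[state][action]


def is_valid_policy(policy, tol=1e-9):
    """Check each entry lies in [0, 1] and each state's entries sum to 1."""
    for action_probs in policy.values():
        if any(p < 0.0 or p > 1.0 for p in action_probs.values()):
            return False
        if abs(sum(action_probs.values()) - 1.0) > tol:
            return False
    return True


def sample_action(policy, state, rng=random):
    """Draw one action in the given state according to pi(. | state)."""
    actions = list(policy[state].keys())
    weights = list(policy[state].values())
    return rng.choices(actions, weights=weights, k=1)[0]


chosen = sample_action(policy, "s0")
```

A deterministic policy is the special case in which one action per state has probability 1 and all others have probability 0.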