Reinforcement learning introduction

Reinforcement learning deals with sequential decision-making problems in which an agent interacts with an environment. At each step:

  1. The agent observes the state of the environment
  2. The agent acts on the environment by performing an action
  3. The environment evolves to a new state and gives the agent a reward, a feedback signal that the agent uses to judge how good that action was in that state
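
The observe/act/reward cycle above can be sketched as a short loop. This is a minimal illustration rather than a reference implementation: `ChainEnv`, `run_episode`, and the two example agents are hypothetical names introduced here, and the environment (a line of five positions with a reward for reaching the last one) is an assumed toy problem.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: positions 0..4 on a line, goal at the last one."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        # Start every episode at the leftmost position.
        self.state = 0
        return self.state

    def step(self, action):
        # action is 0 (move left) or 1 (move right); the walls clip the move.
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, choose_action, max_steps=100):
    """Run one episode and return the total reward collected."""
    state = env.reset()                          # 1. observe the state
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(state)            # 2. act on the environment
        state, reward, done = env.step(action)   # 3. new state and reward
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)
random_agent = lambda state: rng.choice([0, 1])  # ignores the state entirely
always_right = lambda state: 1                   # heads straight for the goal

print(run_episode(ChainEnv(), always_right))
print(run_episode(ChainEnv(), random_agent))
```

Here each agent is just a function from state to action. A learning agent would additionally use the observed rewards to change how it chooses actions over time.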

The goal of the agent is to learn a policy (a prescription of which action to take in each state) that maximizes the total reward.

Agent behaviour (policies)

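The agent's behaviour is modelled by its policy. A policy $\pi$ is a function that takes a state and an action and returns a probability, $\pi : S \times A \to [0, 1]$, where $S$ is the set of states and $A$ is the set of actions, such that $\pi(a|s)$ is the probability of performing action $a$ in state $s$. For every state $s$, these probabilities sum to 1 over all actions.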
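
For a small, discrete problem, a policy of this kind can be stored as a table of probabilities. The sketch below assumes two made-up states (`"s0"`, `"s1"`) and two made-up actions (`"left"`, `"right"`). The names `policy`, `pi`, `sample_action`, and `is_valid_policy` are illustrative and do not come from any particular library.

```python
import random

# Hypothetical tabular policy: policy[state][action] = pi(action | state).
policy = {
    "s0": {"left": 0.2, "right": 0.8},
    "s1": {"left": 0.5, "right": 0.5},
}


def pi(action, state):
    """Probability of performing `action` in `state`."""
    return policy[state].get(action, 0.0)


def sample_action(state, rng=random):
    """Draw an action at random according to pi(. | state)."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def is_valid_policy(table, tol=1e-9):
    """Check that every row is a probability distribution over actions."""
    return all(
        all(0.0 <= p <= 1.0 for p in dist.values())
        and abs(sum(dist.values()) - 1.0) <= tol
        for dist in table.values()
    )


print(pi("right", "s0"))
print(is_valid_policy(policy))
print(sample_action("s0", random.Random(0)))
```

The validity check encodes the one constraint a policy must satisfy: in each state, the action probabilities lie in $[0, 1]$ and sum to 1. A deterministic policy is the special case where one action in each state has probability 1.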