
Introduction and Markov decision processes

Reinforcement learning deals with sequential decision making problems where an agent:

  1. Observes the state of the environment
  2. Acts on the environment by performing some action
  3. Receives a reward as the environment evolves into a new state; this feedback lets the agent judge whether a specific action is good in a given state

The goal of the agent is to learn a policy (a prescription of which action is best in every state) that maximizes the total reward.
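This interaction loop can be sketched in a few lines of Python. The `env` object and its Gym-style `reset()`/`step()` interface are assumptions made here for illustration, not something defined in these notes:

```python
def run_episode(env, policy, max_steps=100):
    """Run one episode of agent-environment interaction.

    Assumes a hypothetical Gym-style environment: reset() returns the
    initial state, step(action) returns (next_state, reward, done).
    `policy` is a function mapping a state to an action.
    """
    total_reward = 0.0
    state = env.reset()                          # 1. observe the state
    for _ in range(max_steps):
        action = policy(state)                   # 2. act on the environment
        state, reward, done = env.step(action)   # 3. new state and reward
        total_reward += reward
        if done:
            break
    return total_reward
```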

Markov decision processes

  • S state space: the possible states
  • A action space: the possible actions
  • P transition model: gives the next state reached when a given action is taken in a given state. We allow the transition model to be a stochastic function: $P(s'|s,a) \in [0,1]$ is the probability of moving to state $s'$ after taking action $a$ in state $s$.
  • R reward function: $R: S \times A \to \mathbb{R}$ gives the reward received for taking action $a$ in state $s$.
  • $\gamma$ discount factor: how much the agent cares about future rewards compared to immediate rewards. $\gamma \in [0, 1]$
  • $\mu_0$ initial state distribution: $\mu_0(s) \in [0,1]$ is the probability that the interaction starts in state $s$.

In this model we are in a single-agent setting and the transition model satisfies the Markov property: the distribution of the next state depends only on the current state and action, and is independent of the history of states and actions observed so far.

P and R are unknown: in order to act well in the environment, the agent must interact with it and learn.
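To make these ingredients concrete, here is a toy two-state MDP written out as plain Python dictionaries. The states, actions, probabilities, and rewards are invented for illustration and do not come from the notes:

```python
import random

S = ["s0", "s1"]            # state space
A = ["stay", "move"]        # action space

# Transition model: P[(s, a)] is the distribution over next states P(s'|s, a).
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.1, "s1": 0.9},
    ("s1", "move"): {"s0": 0.8, "s1": 0.2},
}

# Reward function R(s, a).
R = {
    ("s0", "stay"): 0.0, ("s0", "move"): 1.0,
    ("s1", "stay"): 2.0, ("s1", "move"): 0.0,
}

gamma = 0.95                          # discount factor
mu0 = {"s0": 1.0, "s1": 0.0}          # initial state distribution

def sample_next_state(s, a):
    """Sample s' ~ P(.|s, a) from the tabular transition model."""
    dist = P[(s, a)]
    return random.choices(list(dist), weights=list(dist.values()))[0]
```

A learning agent would not have access to the tables `P` and `R`; it would only observe the sampled next states and rewards produced by them.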

Agent behaviour (policies)

The agent's behaviour is modelled by a policy. A policy $\pi$ is a function taking a state and an action and returning a probability, $\pi: S \times A \to [0, 1]$, such that $\pi(a|s)$ is the probability of performing action $a$ in state $s$.
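For the toy MDP above, a stochastic tabular policy can be stored as a nested dictionary of probabilities $\pi(a|s)$ and sampled from whenever the agent has to act (the numbers are arbitrary, chosen only to illustrate the shape):

```python
import random

# pi[s][a] = probability of taking action a in state s (each row sums to 1).
pi = {
    "s0": {"stay": 0.3, "move": 0.7},
    "s1": {"stay": 0.8, "move": 0.2},
}

def sample_action(s):
    """Sample a ~ pi(.|s)."""
    dist = pi[s]
    return random.choices(list(dist), weights=list(dist.values()))[0]
```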