# Markov Process

**reinforcement learning**:

- Markov Process
- Markov Reward Process
- Bellman Expectation Equation
- Bellman Optimality Equation
- RL Overview
- General Policy Iteration (GPI)
- Monte Carlo Method Of Finding Values
- TD Method Of Finding Values
- Markov Decision Process

You can think of a **Markov process** as a directed graph. Each node represents a “state”, and each edge represents a possible transition from one state to another.

Each edge has a probability, which tells you how likely you are to take that route (i.e. to transition from the state at the start of the edge to the state at the end of the edge). The probabilities on the edges leaving any one state add up to 1.

```
        p = 0.5
     +---------------------+
     |                     |
     v                     |
+---------+           +---------+           +---------+
|         |  p = 0.3  |         |  p = 0.5  |         |
| state 1 | --------> | state 2 | --------> | state 4 |
|         |           |         |           |         |
+---------+           +---------+           +---------+
     |                +---------+                ^
     |    p = 0.7     |         |    p = 0.8     |
     +--------------> | state 3 | ---------------+
                      |         | ----+
                      +---------+     |
                           ^          | p = 0.2
                           |          |
                           +----------+
```
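One way to make the graph concrete is to store it as a table of transition probabilities. The sketch below is only an illustration: the names `TRANSITIONS` and `is_valid` are made up for this example, and it assumes that the `p = 0.5` edge drawn across the top of the diagram runs from state 2 back to state 1 (which is what makes state 2's outgoing probabilities add up to 1).

```python
# Transition probabilities read off the diagram above.
# Assumption: the p = 0.5 edge across the top runs from state 2 back to state 1.
TRANSITIONS = {
    1: {2: 0.3, 3: 0.7},
    2: {1: 0.5, 4: 0.5},
    3: {3: 0.2, 4: 0.8},  # state 3 has a self-loop
    4: {},                # no outgoing edges: an ending state
}


def is_valid(transitions, tol=1e-9):
    """Check that the outgoing probabilities of every non-ending state sum to 1."""
    return all(
        abs(sum(edges.values()) - 1.0) < tol
        for edges in transitions.values()
        if edges  # skip ending states, which have no outgoing edges
    )


print(is_valid(TRANSITIONS))
```

A dictionary of dictionaries keeps the example close to the picture: the outer key is the state at the start of an edge, the inner key is the state at the end of it.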

A **sample** is what you get when you start in one state and record what happens afterwards, i.e. the “path” you take to an ending state (if there is an ending state).

Samples are often called **episodes**.
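As a rough sketch of what drawing an episode could look like, the code below walks the graph from the diagram until it reaches a state with no outgoing edges. The names `TRANSITIONS` and `sample_episode` and the `max_steps` safety cap are assumptions made for this example, as is the reading that the top `p = 0.5` edge goes from state 2 to state 1.

```python
import random

# Transition probabilities read off the diagram.
# Assumption: the p = 0.5 edge across the top runs from state 2 back to state 1.
TRANSITIONS = {
    1: {2: 0.3, 3: 0.7},
    2: {1: 0.5, 4: 0.5},
    3: {3: 0.2, 4: 0.8},
    4: {},  # ending state
}


def sample_episode(transitions, start, rng, max_steps=1000):
    """Follow random transitions from `start` and return the list of visited states."""
    path = [start]
    state = start
    # Stop at an ending state, or after max_steps transitions as a safety cap.
    while transitions[state] and len(path) <= max_steps:
        next_states = list(transitions[state])
        weights = list(transitions[state].values())
        state = rng.choices(next_states, weights=weights, k=1)[0]
        path.append(state)
    return path


rng = random.Random(0)  # seeded so repeated runs give the same episodes
for _ in range(3):
    print(sample_episode(TRANSITIONS, start=1, rng=rng))
```

Each printed list is one episode: it starts in state 1 and, because state 4 is the only ending state here, finishes in state 4.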

You can use basic probability to calculate how likely any path is to occur: it is the product of the probabilities of all the edges in the path. Remember: to `and` events, multiply their probabilities; to `or` mutually exclusive events, add them.
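The multiply and add rules can be checked with a few lines of code. This is a minimal sketch using the diagram's edge probabilities; `TRANSITIONS` and `path_probability` are names invented for the example, and the top `p = 0.5` edge is assumed to run from state 2 to state 1.

```python
import math

# Transition probabilities read off the diagram.
# Assumption: the p = 0.5 edge across the top runs from state 2 back to state 1.
TRANSITIONS = {
    1: {2: 0.3, 3: 0.7},
    2: {1: 0.5, 4: 0.5},
    3: {3: 0.2, 4: 0.8},
    4: {},  # ending state
}


def path_probability(transitions, path):
    """Multiply the probabilities of the edges along `path` (0 if an edge is missing)."""
    return math.prod(
        transitions[a].get(b, 0.0) for a, b in zip(path, path[1:])
    )


# "and": every edge in the path must be taken, so multiply.
p_via_2 = path_probability(TRANSITIONS, [1, 2, 4])  # 0.3 * 0.5
p_via_3 = path_probability(TRANSITIONS, [1, 3, 4])  # 0.7 * 0.8

# "or": the two paths cannot both happen in one episode, so add.
p_either = p_via_2 + p_via_3

print(round(p_via_2, 4), round(p_via_3, 4), round(p_either, 4))
```

Here `p_either` is the chance that an episode starting in state 1 is exactly `1 -> 2 -> 4` or exactly `1 -> 3 -> 4`. It is not the chance of reaching state 4 at all, since longer paths (through the loops) get there too.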