Markov Process
- > Markov Process
- Markov Reward Process
- Bellman Expectation Equation
- Bellman Optimality Equation
- RL Overview
- Generalized Policy Iteration (GPI)
- Monte Carlo Method Of Finding Values
- TD Method Of Finding Values
- Markov Decision Process
You can think of a Markov process as a directed graph. Each node represents a “state”, and the edges represent transitions between states.
Each edge has a probability, which tells you how likely you are to take that route (i.e. to transition from the state at the start of the edge to the state at the end of the edge). An example is shown in the diagram below, followed by a small code sketch of the same graph.
              p = 0.5
     +-----------------------+
     |                       |
     v                       |
+---------+             +---------+             +---------+
|         |   p = 0.3   |         |   p = 0.5   |         |
| state 1 | ----------> | state 2 | ----------> | state 4 |
|         |             |         |             |         |
+---------+             +---------+             +---------+
     |                                               ^
     |  p = 0.7         +---------+                  |
     |                  |         |     p = 0.8      |
     +----------------> | state 3 | -----------------+
                        |         | ----+
                        +---------+     |
                            ^           |  p = 0.2
                            +-----------+
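Here is one possible way to represent the graph above in code (a minimal Python sketch, not the only option): each state maps to a dictionary of its outgoing edges and their probabilities, using the states and probabilities from the diagram.

```python
# One possible representation of the Markov process in the diagram above:
# each state maps to its outgoing edges, i.e. {next_state: probability}.
# "state 4" has no outgoing edges, so it acts as the ending (terminal) state.
transitions = {
    "state 1": {"state 2": 0.3, "state 3": 0.7},
    "state 2": {"state 1": 0.5, "state 4": 0.5},
    "state 3": {"state 3": 0.2, "state 4": 0.8},
    "state 4": {},
}

# Sanity check: the outgoing probabilities of every non-terminal state sum to 1.
for state, edges in transitions.items():
    assert not edges or abs(sum(edges.values()) - 1.0) < 1e-9, state
```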
A sample is when you start in one state and then record what happens afterwards, i.e. what “path” you take to the ending state (if there is an ending state).
Samples are often called episodes.
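To make this concrete, here is a small sketch that draws one episode from the `transitions` dictionary above (the function name `sample_episode` and the `max_steps` cap are my own additions, just to keep the example self-contained):

```python
import random

def sample_episode(transitions, start, max_steps=100):
    """Start in `start`, follow the edge probabilities at each step, and
    record the path taken until an ending state (or the step cap) is hit."""
    path = [start]
    state = start
    for _ in range(max_steps):
        edges = transitions[state]
        if not edges:  # no outgoing edges: this is an ending state
            break
        states, probs = zip(*edges.items())
        state = random.choices(states, weights=probs, k=1)[0]
        path.append(state)
    return path

print(sample_episode(transitions, "state 1"))
# e.g. ['state 1', 'state 3', 'state 3', 'state 4']
```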
You can use basic statistics to calculate the probability of any path occurring: it is the product of the probabilities of all the edges in the path (remember: to “and” events together, multiply their probabilities; to “or” them, add).
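For example, the path state 1 -> state 3 -> state 3 -> state 4 has probability 0.7 * 0.2 * 0.8 = 0.112. A small sketch of this calculation, again using the `transitions` dictionary from above (the helper name `path_probability` is my own):

```python
from math import prod

def path_probability(transitions, path):
    """Probability of one specific path: "and" every edge along the way
    together by multiplying their probabilities."""
    return prod(transitions[a][b] for a, b in zip(path, path[1:]))

# "and": one specific path through the graph
print(path_probability(transitions, ["state 1", "state 3", "state 3", "state 4"]))  # ≈ 0.112

# "or": several mutually exclusive paths are summed, e.g. the probability of
# reaching state 4 from state 1 in exactly two steps:
two_step = (path_probability(transitions, ["state 1", "state 2", "state 4"])
            + path_probability(transitions, ["state 1", "state 3", "state 4"]))
print(two_step)  # 0.3*0.5 + 0.7*0.8 ≈ 0.71
```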