A tic tac toe playing AI simply needs to know what cell of the board to occupy next: given a particular board state, it must decide its next move.
Thus we need to find a function that maps a tic tac toe board state to a next move (the cell of the board it should occupy next).
Every time it is the AI’s turn to make a move, it will pass the current board state into this function, and the function will tell it what move to take next.
Now, let’s dive a little deeper: how will we implement this function? The function will first find all the possible moves that can be taken from the passed in board state (that is, all the empty cells it could occupy), and then choose the best one.
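Here is a minimal sketch of the first half of that function in Python. The board representation is an assumption for illustration: a list of 9 cells, each holding "F" (friendly), "E" (enemy), or None (empty).

```python
# Assumed board representation: a list of 9 cells, each "F" (friendly),
# "E" (enemy), or None (empty). These names are illustrative, not fixed.

def possible_moves(board):
    """Return the index of every empty cell the AI could occupy."""
    return [i for i, cell in enumerate(board) if cell is None]

board = ["F", None, "E",
         None, "F", None,
         None, None, "E"]
print(possible_moves(board))  # [1, 3, 5, 6, 7]
```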
Now, what criterion should we use to judge the “best move”?
We could say that the best move is the one that most often, eventually, leads to a victory. This is a good criterion because it accounts for the fact that a seemingly bad intermediate choice may actually be better in the long run, but it forces us to deal with credit assignment. Credit assignment is figuring out how much credit or blame to assign to each intermediate move for the final outcome, and it is a very difficult problem to tackle (difficult, not impossible).
Let’s see if we can find an easier solution. If we can, good; if not, we will tackle the credit assignment problem.
What if we give each board state a value? The best board state would be the one with the highest value. Before making a move, the AI can look at each possible cell that it can occupy, calculate the value of the resulting board state if it chose to occupy that cell, and then pick the move that results in the highest valued board state!
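As a sketch, again assuming the board is a list of 9 cells ("F" friendly, "E" enemy, None empty), value-based move selection might look like this. The value function itself is a placeholder we still have to design; the toy one below just prefers the center.

```python
def choose_move(board, value):
    """Try every empty cell and pick the one whose resulting
    board state gets the highest value."""
    best_move, best_value = None, float("-inf")
    for i in range(9):
        if board[i] is not None:
            continue                 # cell already occupied
        next_board = board.copy()
        next_board[i] = "F"          # resulting board if we occupy cell i
        v = value(next_board)
        if v > best_value:
            best_move, best_value = i, v
    return best_move

# Toy value function just to exercise the loop: prefer the center cell.
toy_value = lambda b: 1.0 if b[4] == "F" else 0.0
print(choose_move([None] * 9, toy_value))  # 4
```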
Now, we need to find a way to map a board state to a value.
What features of the board state do we want its value to be based on?
How many empty cells? How many enemy cells? How many friendly (your own) cells? This doesn’t seem like a good idea for a tic tac toe board, because the counts of empty, friendly, and enemy cells aren’t what matters; their positions on the board are!
What if we make the value of a board depend on what’s in cell 1, cell 2, cell 3, and so on? This seems like a good idea, but now we have to fill in the details: what exactly do we mean by “depend”, quantitatively? What if we make it a linear function, with three terms for each cell (9 cells, thus 9 * 3 = 27 terms)? The first term for a cell would be 1 if there is an enemy there, 0 if there isn’t. The second term would be 1 if there is a friendly piece there, 0 if there isn’t. I was initially thinking that the third term should be 1 if the cell is empty, 0 if it isn’t, but the previous two terms already take care of this case (if both of them are 0, the cell is empty), so we only need 2 terms per cell (9 cells, thus 9 * 2 = 18 terms).
Here is how we will refer to the cells of a tic tac toe board: TODO put image here
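A quick sketch of this 18-feature encoding, using the same list-of-9-cells board representation as before ("F" friendly, "E" enemy, None empty); the feature ordering (enemy term first, then friendly term, for each cell in turn) is just one reasonable convention:

```python
def features(board):
    """Encode a board as 18 features: for each of the 9 cells, one
    enemy indicator and one friendly indicator. An empty cell is the
    implicit (0, 0) case."""
    feats = []
    for cell in board:
        feats.append(1 if cell == "E" else 0)  # enemy term
        feats.append(1 if cell == "F" else 0)  # friendly term
    return feats

board = ["F", None, "E"] + [None] * 6
print(features(board)[:6])  # [0, 1, 0, 0, 1, 0]
```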
Our value linear function will be something like:
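Concretely (the exact subscript convention here is just one reasonable choice), writing $x_{2i-1}$ and $x_{2i}$ for the enemy and friendly indicators of cell $i$:

$$V(b) = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_{18} x_{18}$$

where $x_{2i-1}$ is 1 if an enemy occupies cell $i$ (0 otherwise), $x_{2i}$ is 1 if a friendly piece occupies cell $i$ (0 otherwise), and $w_0$ through $w_{18}$ are the weights we need to learn.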
Now we have a generic hypothesis (a linear function with respect to some features) and we need to find weights for the features. This is where we provide training data and then find the specific hypothesis (function) that best fits it.
But before we do that, let’s think about our chosen generic hypothesis for a second. The value of a tic tac toe board does not lie in how many empty, friendly, or enemy cells it has. It lies in how the cells relate to one another: what cells surround your friendly cells? In other words, the features interact with one another, so we can’t use a linear function to represent this.
A linear function represents a system where the inputs do not interact with one another.
One feature may positively affect the output and another may negatively affect it, but they don’t interact. In other words, the value of one feature does not change how much, or in what direction, another feature affects the output.
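We can see this concretely with our 18-feature encoding. With any fixed weights (the ones below are random, purely for illustration), the value gained by occupying cell 1 is exactly the same constant no matter what the other eight cells hold, so a linear value function cannot prefer completing a row over playing in a corner of an empty board:

```python
import random

# Hypothetical weights: w[0] is the intercept, w[1..18] multiply the
# 18 enemy/friendly cell features. Random values, for illustration only.
random.seed(0)
w = [random.uniform(-1, 1) for _ in range(19)]

def features(board):
    feats = []
    for cell in board:
        feats.append(1 if cell == "E" else 0)  # enemy term
        feats.append(1 if cell == "F" else 0)  # friendly term
    return feats

def value(board):
    x = features(board)
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

# Occupying cell 1 always changes the value by exactly w[2] (the weight
# on "friendly in cell 1"), regardless of the other 8 cells:
for _ in range(5):
    rest = [random.choice(["F", "E", None]) for _ in range(8)]
    delta = value(["F"] + rest) - value([None] + rest)
    assert abs(delta - w[2]) < 1e-9
```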
After all that work, we find out that a linear function is not the right way to model this problem. That’s ok; we learned and reinforced a lot by working through the reasoning.
This is More General Than You Think
Here we made a tic tac toe AI that could take a look at the board state and then make a smart move.
But we can use similar reasoning to create AI for a lot of other things. For example, we could create a chess playing AI. The chess playing AI would take a look at a chess board state, and then make a chess move. Chess board states and moves are more complicated, but you still have the concepts of a “board state” and a “next move”.
This is even more generic than board games. If you think of your AI living in some world, it can take a look at the world state, and then make a next move (i.e. take an action) in the world.
The reasoning (why you do what you do) and tools that you learned in this post are much, much more broadly applicable than just games with board states and moves.
The reasoning (not the conclusion or the end product) is the most important part of any concept.