- A Summary Of Multilayer Neural Networks
- Abdullah's Machine Learning Notes
A single neuron basically represents a boolean valued function: it takes its inputs, does a linear combination of them (using some weights), and then feeds the result of the linear combination to an activation function, which spits out either true or false. To make our math simpler, we usually represent true as +1 and false as -1. We can visualize a single neuron like this:
A single neuron can only represent linear decision surfaces, because all it does is a linear combination of its inputs, which it then thresholds. Thresholding the output of a linear combination still gives you a linear (hyperplane) decision boundary! Thus if we choose a single neuron as our model, we will only be able to find linear decision boundaries that best fit our training examples. We would not consider any non linear ones at all.
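A single neuron can be sketched in a few lines of Python. The weights, bias, and inputs below are made-up illustrative values, not anything from a real dataset:

```python
# Minimal single-neuron sketch: a linear combination of the inputs,
# fed to a threshold activation that spits out +1 or -1.
# The weights, bias, and inputs are made-up illustrative values.

def neuron(inputs, weights, bias):
    s = sum(w * x for w, x in zip(weights, inputs)) + bias  # linear combination
    return 1 if s > 0 else -1                               # threshold activation

print(neuron([1.0, -2.0, 0.5], [0.3, 0.1, 0.4], 0.0))  # prints 1
```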
This is where multilayer neural networks come in (hint: they are able to represent non linear functions).
Multilayer Neural Network
If we create a bunch of single neurons, and then feed the output of these neurons to another single neuron, we have what we call a multilayer neural network. We can visualize it like so:
Notice that we created 3 single neurons, and then connected the output of these single neurons to yet another single neuron. Together we call this whole thing a multilayer neural network.
Also, quick side note, you may have noticed I removed the “…” from the picture. The “…” just meant that your data can have as many dimensions as you want, but to be concrete, we will only consider 3 dimensions (x1, x2, and x3) here. So you are seeing what a multilayer neural network for 3 dimensional data would look like.
Collectively, we refer to the leftmost neurons as the input layer, the middle neurons as the hidden layer, and the rightmost neurons as the output layer.
The way I’ve been drawing multilayer networks is wasteful of drawing space, so people usually draw them a different way. They draw the input neurons just once, not once for every neuron in the hidden layer. Lemme show you:
That’s a much more compact drawing! And now when you see these compact drawings, you’ll know how to interpret them.
So you know that each neuron in the hidden layer does a linear combination of its inputs, then feeds that to an activation function and spits out either +1 or -1. What does the neuron in the output layer do? Exactly the same thing. It does a linear combination of its inputs, feeds that to an activation function, and spits out either +1 or -1. What you gain by structuring single neurons in this manner is the ability to represent non linear functions.
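The whole forward pass can be sketched like so. Each hidden neuron thresholds its own linear combination of the inputs, and the output neuron does exactly the same thing with the hidden outputs. All the weights here are made-up illustrative values:

```python
# Sketch of a forward pass through a 3-input, 3-hidden-neuron, 1-output network.
# All weights and inputs are made-up illustrative values.

def threshold(s):
    return 1 if s > 0 else -1

def forward(x, hidden_weights, output_weights):
    # each hidden neuron: linear combination of the inputs, then threshold
    hidden_out = [threshold(sum(w * xi for w, xi in zip(ws, x)))
                  for ws in hidden_weights]
    # the output neuron treats the hidden outputs as its inputs
    return threshold(sum(w * h for w, h in zip(output_weights, hidden_out)))

x = [1.0, 0.5, -1.0]
hidden_weights = [[0.2, -0.1, 0.4],   # weights into hidden neuron 1
                  [0.5, 0.5, 0.5],    # weights into hidden neuron 2
                  [-0.3, 0.8, 0.2]]   # weights into hidden neuron 3
output_weights = [0.6, -0.2, 0.7]     # weights into the output neuron
print(forward(x, hidden_weights, output_weights))  # prints -1
```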
A single neuron can only represent linear functions. When you need to find a non linear function, use a multilayer neural network.
So we know that this kind of structure (a “multilayer neural network”), is able to represent non linear functions. All we have to do is find weights for this structure that “best fits” our training examples. We do this via an algorithm called backpropagation.
- initialize weights randomly (small values around 0.1, don’t get too excited there buddy)
- continuously iterate through training examples
- feed the training example to the neural network
- look at what the neural network outputted (i.e. the output of the neuron in the output layer)
- what should it have outputted for this training example?
- based on the above 2 facts, calculate an error value for the neuron in the output layer
- for each neuron in the hidden layer
- calculate an error value for the neuron (based on the weight coming out of this neuron, and the error of the neuron that the weight points to)
- update all weights in the network
- the update a particular weight gets is based on
- a learning rate (often called η)
- the error for the neuron that the weight points to
- the output of the neuron that the weight originates from
You will continue to do this until the network performs acceptably on your test data. To recap, for every training example, you:
- calculate an error for the output layer neuron
- calculate an error for the hidden layer neurons
- update all weights
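The steps above can be sketched as a small training loop. This version uses sigmoid units (so the errors are differentiable), and the XOR dataset, layer sizes, learning rate, and epoch count are my own illustrative choices, not anything from the notes:

```python
import math
import random

# A sketch of the backpropagation loop for one hidden layer of sigmoid units.
# XOR data, layer sizes, learning rate, and epochs are illustrative choices.

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train(examples, n_hidden=3, eta=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in = len(examples[0][0])
    # initialize weights to small random values around 0.1 (last one is a bias)
    w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)]

    for _ in range(epochs):
        for x, target in examples:
            xb = list(x) + [1.0]  # inputs plus a constant bias input
            # forward pass: hidden layer, then the output neuron
            h = [sigmoid(sum(w * xi for w, xi in zip(ws, xb))) for ws in w_hidden]
            hb = h + [1.0]
            o = sigmoid(sum(w * hi for w, hi in zip(w_out, hb)))
            # error for the output neuron (what it said vs. what it should have said)
            delta_o = o * (1.0 - o) * (target - o)
            # error for each hidden neuron: its outgoing weight times delta_o
            delta_h = [h[j] * (1.0 - h[j]) * w_out[j] * delta_o
                       for j in range(n_hidden)]
            # update = eta * (error at destination) * (output at origin)
            for j in range(n_hidden + 1):
                w_out[j] += eta * delta_o * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hidden[j][i] += eta * delta_h[j] * xb[i]
    return w_hidden, w_out

def predict(w_hidden, w_out, x):
    xb = list(x) + [1.0]
    hb = [sigmoid(sum(w * xi for w, xi in zip(ws, xb))) for ws in w_hidden] + [1.0]
    return sigmoid(sum(w * hi for w, hi in zip(w_out, hb)))

# XOR is a non linear function a single neuron cannot represent.
xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
w_hidden, w_out = train(xor)
for x, target in xor:
    print(x, round(predict(w_hidden, w_out, x), 2))
```

With enough epochs this usually drives the outputs toward the XOR targets, though whether (and how fast) it converges depends on the random initialization.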
Some Things To Note
If you want your neural network to output a continuous number (and not just a +1 or -1), just don’t use a thresholding function at your output layer. In other words, your output layer neuron should have a linear output.
If you want your neural network to be able to do non-binary classification (e.g. what color is it?), then in your output layer you need 1 neuron for each possible class. Additionally, your activation function has to be the sigmoid function (or a similar smooth function), not a thresholding function.
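One way to sketch that output layer (the hidden outputs, weights, and class names below are all made up for illustration):

```python
import math

# Sketch of non-binary classification with one sigmoid neuron per class:
# the predicted class is whichever output neuron fires the strongest.
# Hidden-layer outputs, weights, and class names are made-up values.

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def classify(hidden_out, class_weights, classes):
    scores = [sigmoid(sum(w * h for w, h in zip(ws, hidden_out)))
              for ws in class_weights]
    return classes[scores.index(max(scores))]

hidden_out = [0.9, 0.2, 0.7]            # outputs coming from the hidden layer
class_weights = [[1.0, -1.0, 0.5],      # "red" output neuron
                 [-0.5, 1.0, 0.2],      # "green" output neuron
                 [0.1, 0.1, 0.1]]       # "blue" output neuron
print(classify(hidden_out, class_weights, ["red", "green", "blue"]))  # prints red
```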
The multilayer neural network I chose above has only 1 hidden layer. A neural network with only 1 hidden layer can represent any boolean function. It can also approximate any bounded continuous function to arbitrary accuracy. The number of neurons needed in the hidden layer depends on the specific boolean/continuous function you are seeking.
A neural network with 2 hidden layers can approximate any function at all. But sadly, there’s no known formula for the number of neurons you need in each hidden layer. You just have to experiment.