# A Summary Of Multilayer Neural Networks


# Single Neuron

A single neuron basically represents a boolean-valued function that takes its inputs, does a linear combination of them (using some weights) and then feeds the result of the linear combination to an activation function, which spits out either `true` or `false`. To make our math simpler, we usually represent `true` as `+1` and `false` as `-1`. We can visualize a single neuron like this:

A single neuron can only represent linear functions. It does a linear combination of its inputs and feeds that to an activation function, and the activation function merely thresholds the result, so the neuron can only separate its inputs with a straight line (a hyperplane, in higher dimensions). Thus if we choose a single neuron as our model, we will only be able to find *linear* functions that best fit our training examples. We would not consider any non-linear functions at all.
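The single-neuron computation described above can be sketched in a few lines of Python. The function name `neuron` and the particular weights below are hypothetical choices for illustration; these weights happen to implement logical AND over `+1`/`-1` inputs:

```python
def neuron(x, w, b):
    """A single neuron: linear combination of the inputs, then a threshold activation."""
    # linear combination of inputs and weights, plus a bias term
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # threshold activation: spit out either +1 (true) or -1 (false)
    return 1 if z >= 0 else -1

# hypothetical weights that make this neuron compute AND over +1/-1 inputs
w, b = (1.0, 1.0), -1.5
print(neuron((1, 1), w, b))   # both inputs true  -> +1
print(neuron((1, -1), w, b))  # one input false   -> -1
```

Because the output only depends on whether the linear combination crosses the threshold, this neuron can only carve its input space with a single hyperplane.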

This is where multilayer neural networks come in (hint: they are able to represent non linear functions).

# Multilayer Neural Network

If we create a bunch of single neurons, and then feed the output of these neurons to *another* single neuron, we have what we call a **multilayer neural network**. We can visualize it like so:

Notice that we created 3 single neurons, and then connected the output of these single neurons to yet *another* single neuron. Together we call this whole thing a multilayer neural network.

Also, quick side note, you may have noticed I removed the “…” from the picture. The “…” just meant that your data can have as many dimensions as you want, but to be concrete, we will only consider 3 dimensions (x1, x2, and x3) here. So you are seeing what a multilayer neural network for 3 dimensional data would look like.

Collectively we refer to the leftmost neurons as the **input layer**, the middle neurons as the **hidden layer**, and the right most neurons as the **output layer**.

The way I’ve been drawing multilayer networks is wasteful of drawing space so people usually draw it a different way. They draw the input neurons just once, not once for every neuron in the hidden layer. Lemme show you:

That’s a much more compact drawing! But now when you see these compact drawings, you’ll know how to interpret them.

So you know that each neuron in the hidden layer does a linear combination of its inputs, then feeds that to an activation function and spits out either `+1` or `-1`. What does the neuron in the output layer do? Exactly the same thing. It does a linear combination of *its* inputs, feeds that to an activation function, and spits out either `+1` or `-1`. What you gain by structuring single neurons in this manner is the ability to represent non-linear functions.

A single neuron can only represent linear functions. When you need to find a non linear function, use a multilayer neural network.

So we know that this kind of structure (a “multilayer neural network”), is able to represent non linear functions. All we have to do is find weights for this structure that “best fits” our training examples. We do this via an algorithm called **backpropagation**.
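To make the non-linearity concrete, here is a tiny multilayer network sketched in Python. It computes XOR, a classic non-linear function that no single neuron can represent. The names `neuron` and `xor_network` and the hand-picked weights are hypothetical, chosen so the two hidden neurons behave like OR and NAND:

```python
def neuron(x, w, b):
    # linear combination of inputs, then a +1/-1 threshold activation
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z >= 0 else -1

def xor_network(x1, x2):
    # hidden layer: two neurons, each seeing the raw inputs
    h1 = neuron((x1, x2), (1.0, 1.0), 1.5)    # behaves like OR
    h2 = neuron((x1, x2), (-1.0, -1.0), 1.5)  # behaves like NAND
    # output layer: one neuron combining the hidden outputs (behaves like AND)
    return neuron((h1, h2), (1.0, 1.0), -1.5)

print(xor_network(1, -1))  # inputs differ -> +1
print(xor_network(1, 1))   # inputs agree  -> -1
```

Each individual neuron is still just a linear separator, but stacking them lets the network as a whole represent a function no single line could.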

# Backpropagation

- initialize weights arbitrarily (small values around 0.1, don’t get too excited there buddy)
- continuously iterate through training examples
    - feed the training example to the neural network
    - look at what the neural network outputted (i.e. the output of the neuron in the output layer)
    - compare that to what it *should* have outputted for this training example
    - based on the above 2 facts, calculate an error value for the neuron in the output layer
    - for each neuron in the hidden layer
        - calculate an error value for the neuron (based on the weight coming out of this neuron, and the error of the neuron that the weight points to)
    - update all weights in the network
        - the update a particular weight gets is based on
            - a learning rate (often called η)
            - the error for the neuron where the weight points to
            - the output that comes from the neuron where the weight originates
You will continue to do this until you perform acceptably on your test data.

Notice that in each iteration you:

- calculate an error for the output layer neuron
- calculate an error for the hidden layer neurons
- update all weights
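The steps above can be sketched as code. This is a minimal sketch, not a definitive implementation; it assumes sigmoid activations (backpropagation needs a differentiable activation, so the `+1`/`-1` threshold is swapped for a sigmoid here), one hidden layer, and the function names `forward`, `backprop_step`, and `total_error` are all hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)  # hidden layer outputs
    o = sigmoid(W2 @ h + b2)  # output layer output
    return h, o

def backprop_step(x, t, W1, b1, W2, b2, lr=0.5):
    h, o = forward(x, W1, b1, W2, b2)
    # error for the output neuron: (target - output), scaled by the sigmoid's slope
    delta_o = o * (1 - o) * (t - o)
    # error for each hidden neuron: its outgoing weight times the output error
    delta_h = h * (1 - h) * (W2.T @ delta_o)
    # every update = learning rate * error where the weight points * output where it originates
    W2 += lr * np.outer(delta_o, h)
    b2 += lr * delta_o
    W1 += lr * np.outer(delta_h, x)
    b1 += lr * delta_h

# try it on XOR -- a non-linear function a single neuron can't represent
examples = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]

def total_error(W1, b1, W2, b2):
    return sum((forward(np.array(x, float), W1, b1, W2, b2)[1][0] - t[0]) ** 2
               for x, t in examples)

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.1, 0.1, (4, 2)); b1 = np.zeros(4)  # 4 hidden neurons
W2 = rng.uniform(-0.1, 0.1, (1, 4)); b2 = np.zeros(1)  # 1 output neuron

err_before = total_error(W1, b1, W2, b2)
for _ in range(5000):  # continuously iterate through training examples
    for x, t in examples:
        backprop_step(np.array(x, float), np.array(t, float), W1, b1, W2, b2)
err_after = total_error(W1, b1, W2, b2)
```

After training, `err_after` should be noticeably smaller than `err_before`: the repeated small weight updates have pushed the network's outputs toward the targets.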

# Some Things To Note

If you want your neural network to output a continuous number (and not just a +1 or -1), just don’t use a thresholding function at your output layer. In other words, your output layer neuron should have a linear output.
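In code, "don't use a thresholding function at your output layer" is just a one-line change: the output neuron returns its linear combination directly. A minimal sketch, with a hypothetical function name and hand-picked weights chosen so the output is easy to verify by hand:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_regression(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)  # hidden layer still squashes its outputs
    return W2 @ h + b2        # output layer: linear combination only, no threshold

# with zero hidden weights, each hidden output is sigmoid(0) = 0.5,
# so the output is 2*0.5 + 2*0.5 + 1 = 3.0
y = forward_regression(np.array([1.0, 2.0]),
                       np.zeros((2, 2)), np.zeros(2),
                       np.array([[2.0, 2.0]]), np.array([1.0]))
print(y)  # [3.]
```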

If you want your neural network to be able to do non-binary classification (e.g. what color is it?), then in your output layer, you need 1 neuron for each possible classification. Additionally, your activation function has to be the sigmoid function (or a similar function) and not a thresholding function.
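A sketch of the non-binary case: one sigmoid output neuron per class, and the prediction is whichever class's neuron fires strongest. The class names, the `classify` function, and the rigged weights below (chosen so the first class clearly wins) are all hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical 3-way color classifier: one output neuron per class
CLASSES = ["red", "green", "blue"]

def classify(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)
    scores = sigmoid(W2 @ h + b2)           # one score per class, each in (0, 1)
    return CLASSES[int(np.argmax(scores))]  # class whose neuron is most confident

# rigged weights: output pre-activations are [4, 0, -4], so "red" wins
label = classify(np.array([0.0, 0.0]),
                 np.zeros((2, 2)), np.zeros(2),
                 np.array([[4.0, 4.0], [0.0, 0.0], [-4.0, -4.0]]), np.zeros(3))
print(label)  # red
```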

The multilayer neural network I chose above has only 1 hidden layer. A neural network with only 1 hidden layer can represent any **boolean** function. It can also represent any **continuous** function. The number of neurons needed in the hidden layer depends on the specific boolean/continuous function you are seeking.

A neural network with 2 hidden layers can represent any function at all. But sadly, there’s no known formula for how many neurons each hidden layer needs; you just have to experiment.