The Perceptron Training Rule is an algorithm for finding the weights of a single unit (“neuron”).

What is a Unit (Neuron)?

In machine learning, a “unit” (a “single neuron”) is a thing that takes some inputs, applies a weight to each input, and then, based on the weighted sum of the inputs and an activation function, spits out either on or off. To keep our math simple, we say that if it outputs +1 it is on, and if it outputs -1 it is off.

Inputs  Weights  Activation Fcn           Output
======  =======  ==============           ======

          w1
  x1 +------->  +--------------+
                |              |
          w2    |  activation  |          +1 ("on")
  x2 +------->  |   function   |  +---->
                |              |          -1 ("off")
          w3    |              |
  x3 +------->  +--------------+

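The inputs are often called features. In vector form, the features are $\vec{x}$ and the weights are $\vec{w}$ . The weighted sum of the inputs is then the dot product $\vec{w} \cdot \vec{x}$ , and that is the value fed to the activation function.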

The activation function is simply a threshold function. A threshold function outputs +1 if the input is greater than some threshold (often 0 is chosen), else it outputs -1.

$\textit{thresholdFunction}(value) = \left\{ \begin{array}{ll} +1 & \quad value \gt 0 \\ -1 & \quad value \leq 0 \end{array} \right.$
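
To make the unit concrete, here is a minimal Python sketch of it. The names `threshold` and `unit_output`, and the sample weights and features, are made up for illustration; the threshold is assumed to be 0, as in the formula above.

```python
def threshold(value, cutoff=0.0):
    """Return +1 if value is strictly greater than cutoff, else -1."""
    return 1 if value > cutoff else -1


def unit_output(weights, features):
    """Compute the weighted sum of the features and pass it through the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, features))
    return threshold(weighted_sum)


# Illustrative values only: the weighted sum is 0.5 - 2.0 - 0.2, which is
# negative, so the unit is "off".
print(unit_output([0.5, -1.0, 0.2], [1.0, 2.0, -1.0]))
```

Note that a weighted sum of exactly 0 lands on the "off" side, because the comparison is strict.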

Perceptron Training Rule Algorithm

Look at that beauuuutiful, marvelous, ASCII diagram above. We just need to find the weights (i.e. $\vec{w}$ ) that linearly separate our training data (i.e. the weights that cause our unit to output +1 for our positive training examples and -1 for our negative training examples).

Here is ze algooooreeeiithm:

arbitrarily initialize $\vec{w}$
keep iterating through our training examples
- if with our current weights, we classified this training example correctly, don’t change the weights in response to this example
- if we classified the training example as negative when it should have been positive, 1) increase the weights on the features whose values are positive and 2) decrease the weights on the features whose values are negative. In other words, make $\vec{w} \cdot \vec{x}$ produce a greater value, so that when it goes into our activation function, it is more likely to produce +1.
- if we classified the training example as positive when it should have been negative, then we need to make $\vec{w} \cdot \vec{x}$ produce a smaller value, thus we need to 1) decrease the weights on the features whose values are positive and 2) increase the weights on the features whose values are negative

Those last 3 bullets, all those words, can be represented by the following mathematical expression:

$\vec{w} = \vec{w} + n(t-o)\vec{x}$

$n$ is known as the “learning rate”, and you can just make this a small positive value (like 0.1). This will determine the size of the chunks by which you update $\vec{w}$ after considering each training example.

The $(t-o)\vec{x}$ portion of the expression is really where all of the words from the last 3 bullets went. $t$ is the target output for the training example (the correct label, +1 or -1), $o$ is what our unit outputs for the training example using our current weights, and $\vec{x}$ is the features vector.
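
Putting the pieces together, the whole procedure can be sketched in a few lines of Python. This is one possible reading of the steps above, not a definitive implementation. The assumptions: weights start at zero (any starting point is allowed), there is no bias term (the diagram above has none), and a `max_epochs` cap plus an early exit are added so the loop stops instead of iterating forever. The function names and the toy data are made up for illustration.

```python
def predict(weights, features):
    """Threshold unit: +1 if the weighted sum is greater than 0, else -1."""
    weighted_sum = sum(w * x for w, x in zip(weights, features))
    return 1 if weighted_sum > 0 else -1


def train_perceptron(examples, learning_rate=0.1, max_epochs=100):
    """examples is a list of (features, target) pairs, with target +1 or -1."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features  # assumption: start from all zeros
    for _ in range(max_epochs):   # assumption: cap the number of passes
        mistakes = 0
        for features, target in examples:
            output = predict(weights, features)
            if output != target:
                mistakes += 1
            # w = w + n(t - o)x; when t == o this adds zero, so nothing changes
            weights = [w + learning_rate * (target - output) * x
                       for w, x in zip(weights, features)]
        if mistakes == 0:         # a full pass with no errors: we are done
            break
    return weights


# Toy data (illustrative only), separable by a line through the origin.
examples = [([2.0, 1.0], 1), ([1.0, 3.0], 1),
            ([-1.0, -2.0], -1), ([-2.0, -0.5], -1)]
weights = train_perceptron(examples)
print(weights)
print([predict(weights, x) for x, _ in examples])
```

If the training data cannot be separated this way, the loop simply gives up after `max_epochs` passes and returns whatever weights it has at that point.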

Let’s consider how our first wordy bullet matches with the expression. If we classified the training example correctly, then $t = o$ , thus $t - o = 0$ , thus we don’t update $\vec{w}$ at all.

Let’s consider how our second wordy bullet matches with the expression. If we classified the training example as negative, but it should be positive, then $t - o$ is $(+1) - (-1) = 2$ . Thus we add $2n\vec{x}$ to $\vec{w}$ , which increases the weights for all features with positive values and decreases the weights for all features with negative values. This has the overall effect of increasing $\vec{w} \cdot \vec{x}$ , thus increasing the chance that the activation function will subsequently output +1 for this example (which is what it should output for this example).
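
Here is that second case worked through with numbers, as a small Python sketch. The weights, features, and learning rate are made-up values chosen only to illustrate a single update.

```python
def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


learning_rate = 0.1
weights = [0.5, -1.0, 0.2]
features = [1.0, 2.0, -1.0]
target = 1  # this example should be classified as positive

before = dot(weights, features)   # about -1.7, so the unit outputs -1
output = 1 if before > 0 else -1

# One application of w = w + n(t - o)x, with t - o = 2
weights = [w + learning_rate * (target - output) * x
           for w, x in zip(weights, features)]

after = dot(weights, features)    # about -0.5: larger, but still below 0
print(before, after)
```

The weighted sum goes up by $n(t-o)(\vec{x} \cdot \vec{x})$ , here 0.1 times 2 times 6. One update does not flip the output for this example, which is why we keep iterating.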

I will leave it to you to work out how the third wordy bullet matches with the expression.

Summary

The Perceptron Training Rule is a machine learning algorithm for learning the weights of a single neuron (a single unit). It does so by repeatedly iterating through your training examples and updating the weights in response to each training example.

The update expression it uses is $\vec{w} = \vec{w} + n(t-o)\vec{x}$ .