**Abstract:** This is a note of 'Neural Network Design 2ed'^{1}. This post is an introduction to the perceptron learning rule. Perceptron is a strong model that is still used in many neural networks.

## Learning Rules

We have built some neural network models in the post 'An Introduction to Neural Networks' and as I have mentioned architectures and learning rules are two main aspects in designing a network to solve a special problem. And our already owned architectures have not come into play. What we are going to do is to investigate both architecture and learning rules.

**The learning rule is a procedure to modify weights, bias and other parameters in the model to perform a certain task.** There are generally 3 types of learning rules:

- Supervised Learning

- There is a training set of the model who contains a number of input data as well as correct output value, like \(\{\boldsymbol{p}_1,t_1\},\{\boldsymbol{p}_2,t_2\},\cdots,\{\boldsymbol{p}_Q,t_Q\},\) where \(\boldsymbol{p}_i\) (for \(i<Q\)) is the input to the model and \(t_i\)(for \(i<Q\)) is the correct corresponding target(output or label)
- the whole process is: where target and current output are used to modify the neuron network to make it produce an output more close to the target.

- Unsupervised Learning

- Unlike supervised learning, unsupervised learning doesn't know the correct output at all, in other words, what the neuron network knows is only the inputs and how to modify the parameters in the model is all by the inputs. This kind of learning is always used to perform clustering operation or vector quantization.

- Reinforcement learning

- it is another learning rule which is more like supervised learning and works more like a biological brain.
- there is no correct target as well but there is a grade to measure the performance of the network over some sequence of input.

## Perceptron Architecture

Perceptron is an easy model of a neuron, and what we interested in how to design a learning rule or a certain algorithm to make a perceptron possible to classify input signals. Although perceptron is the easiest neuron model, it is a good basis for more complex networks, and it is still working nowadays.

In a perception, it has weights, biases, and transfer functions. These are basic components of a neuron model. However, the transfer function here is specialized as a hard limit function

\[ f(x) = \begin{cases} 0& \text{ if } x<0\\ 1& \text{ if } x\geq 0 \end{cases}\tag{1} \]

and abbreviated notation with a layer of \(S\) neuron network is:

where \(W\) are the weight matrix of \(S\) perceptron, like . And each row of the matrix contains all the weight of a perceptron.

Two types of a perceptron are introduced here, the first one has a single neuron and the second one has several. what we are going to investigate is the first one for it is more easy to discuss.

### Single-Neuron Perceptron

In some simple models, we can visualize a line or a plane named 'decision boundary' which is determined by the input vectors for which the transfer function's input \(n=0\). For instance, in a 2 dimensions linear classification task, we are asked to find a line that can separate the input points into two different classes. This line here acts as a decision boundary and all the points on the line would product \(0\) output through the linear model. However, when we can not see the decision boundary by eyes, it becomes difficult to find one in a higher dimension than 3. For most neuron networks they have many more dimensions than 2, so we should develop some kinds of abilities and tools to analysis and proof our algorithms in that condition.

Let's start to study with one perceptron.

It's decision is the line:

\[ n=w_{1,1}p_1+w_{1,2}p_2+b=0\tag{2} \]

For example, when \(w_{1,1}=1\), \(w_{1,2}=1\) and \(b=-1\), we get the decision boundary:

\[ p_1+p_2-1=0\tag{3} \]

from the figure, we can conclude the weight vector has: 1. \(W\) always points to the purple region where \[n=w_{1,1}p_1+w_{1,2}p_2+b>0\] 2. The relation of \(W\) and the direction of the decision boundary is orthogonality.

Can this simple example perform some task? Of course. If we test the input \(\boldsymbol{p}_1=\begin{bmatrix}0\\\\0\end{bmatrix},\boldsymbol{p}_1=\begin{bmatrix}0\\\\1\end{bmatrix},\boldsymbol{p}_1=\begin{bmatrix}1\\\\0\end{bmatrix},\boldsymbol{p}_1=\begin{bmatrix}1\\\\1\end{bmatrix}\) respectively, we could get \(a_1=0,a_2=0,a_3=0,a_4=1\). This is the 'and' operation in logical calculation.

input | output | |
---|---|---|

\(\begin{bmatrix}0\\\\0\end{bmatrix}\) | \(0\) | |

\(\begin{bmatrix}0\\\\1\end{bmatrix}\) | \(0\) | |

\(\begin{bmatrix}1\\\\0\end{bmatrix}\) | \(0\) | |

\(\begin{bmatrix}1\\\\1\end{bmatrix}\) | \(1\) |

### Multiple-Neuron Perceptron

Multiple neuron perceptron is just a combination of some perceptron, whose weight matrix \(W\) has multiple rows. And the transpose of row \(i\) of the \(W\) is notated as \(_iW\)(column form of the \(i\)th row in \(W\)) then the \(i\)th neuron has the decision boundary: \[ _iW^T\boldsymbol{p}+b_i=0 \]

For the property of hard limit function that the output could just be one of \(\{0,1\}\). And for \(S\) neurons, there is at most \(2^S\) categories, that is to say to a \(S\) neuron perceptron in one layer it is impossible to perform the task which needs it to classify more than \(2^S\) classes.

## Perceptron Learning Rule

Perceptron learning rule here is a kind of supervised learning rule. Recall some results above: 1. supervised learning has both input and corresponding correct target. 2. target and output produced by the current model and input give the information to how to modify the model 3. \(W\) always points to the region where \(a=1\)

### Constructing Learning Rules

With above results we try to construct the rule, and assuming training datas are: \[ \{\begin{bmatrix}1\\2\end{bmatrix},1\},\{\begin{bmatrix}-1\\2\end{bmatrix},0\},\{\begin{bmatrix}0\\-1\end{bmatrix},0\} \]

#### Assigning Some Initial Values

To modify the model, we need the output which is produced by the weights and input, and for this is supervised learning algorithm we have both input and correct outputs (targets), what we need to do is just assign some values to weights(here we omit the bias). Like: \[ _1W=\begin{bmatrix}1.0\\-0.8\end{bmatrix} \]

#### Calculating output

The output is easy to be calculated: \[ a=f(_1W^T\boldsymbol{p}_1) \]

when \(\boldsymbol{p}_1=\begin{bmatrix}1\\ 2\end{bmatrix}\), we have: \[ n=\begin{bmatrix}1.0 &-0.8\end{bmatrix}\begin{bmatrix}1\\2\end{bmatrix}=-0.6\\ a=f(-0.6)=0 \] however the target is \(1\) then this is a wrong prediction of current model. What we would do is modify the model.

As we mentioned above, \(\boldsymbol{_1W}\) points to the purple region

where output is \(1\). So we should modify the direction of \(\boldsymbol{_1W}\) close to the direction of \(p_1\).

The intuitive strategy is setting \(\boldsymbol{_1W}=p_i\) when the output is \(0\) while the target is \(1\) and setting \(\boldsymbol{_1W}=-p_i\) when the output is \(1\) while the target is \(0\)(where \(i\) less than the size of the training set). however, there are only three training data in this example, so there are only three possible decision boundaries, it can not guarantee that there must have a line \(\boldsymbol{_1W}=p_i\) that separates the input correctly.

Although this strategy does not work, it has a good inspiration: 1. if \(t=1\) and \(a=0\), modify \(_1W\) close to \(p_i\) 2. if \(t=0\) and \(a=1\), modify \(_1W\) away from \(p_i\) or modify \(_1W\) close to \(-p_i\) 3. if \(t=a\) do nothing

Then we find to make a vector close to another, we can add up the two vectors and the result is close to either of them than the other. Our algorithm then become:

- if \(t=1\) and \(a=0\), \(_1W^{\text{new}}= _1W^{\text{old}}+p_i\)
- if \(t=0\) and \(a=1\), \(_1W^{\text{new}}= _1W^{\text{old}}-p_i\)
- if \(t=a\) do nothing

### Unified Learning Rule

The target and output product information together to modify the model, here we introduce the most simple but useful information, error: \[ e_i=t_i-a_i \] so, our algorithm is

- if \(t=1\) and \(a=0\) where \(e=1-0=1\): \(_1W^{\text{new}}= _1W^{\text{old}}+p_i\)
- if \(t=0\) and \(a=1\) where \(e=0-1=-1\): \(_1W^{\text{new}}= _1W^{\text{old}}-p_i\)
- if \(t=a\) where \(e=t-a=0\): do nothing

and it's not hard to notice that \(e\) has the same sign with \(p_i\). Then the algorithm can be simplified as: \[ _1W^{\text{new}}=_1W^{\text{old}}+e\cdot p_i=_1W^{\text{old}}+(t_i-a_i))\cdot p_i \]

This can be easily extend to bias: \[ b^{\text{new}}=b^{\text{old}}+e\cdot 1=b^{\text{old}}+(t_i-a_i) \] and to the multiple neurons perceptron networks the algorithm in matrix form is: \[ W^{\text{new}}=W^{\text{old}}+\boldsymbol{e}\cdot \boldsymbol{p_i}\\ \boldsymbol{b}^{\text{new}}=\boldsymbol{b}^{\text{old}}+\boldsymbol{e}_i\cdot \boldsymbol{1} \]

## Conclusion

- Perceptron is still working today
- The learning rule of the perceptron is a good example of having a close look at the learning rules of neuron networks
- Perceptron has some connection with other machine learning algorithms like linear classification
- A single perceptron has a lot of limits but multiple layers perceptrons are powerful

## References

Demuth H B, Beale M H, De Jess O, et al. Neural network design[M]. Martin Hagan, 2014.↩︎