Keywords: Hebbian Learning, Supervised Hebbian Learning

An overview of Hebbian learning theory:

## Hebb Rule [1]

The Hebb rule is one of the earliest neural network learning laws. It was published in 1949 by Donald O. Hebb, a Canadian psychologist, in his book ‘The Organization of Behavior’. In this book, he proposed a possible mechanism for synaptic modification in the brain, and the rule was later used to train artificial neural networks for pattern recognition.

This post makes heavy use of linear algebra.

## ‘The Organization of Behavior’

The main premise of the book is that behavior can be explained by the action of neurons. This was a relatively novel idea at a time when the dominant concept among psychologists was the correlation between stimulus and response. It can also be viewed as a contest between a ‘top-down’ and a ‘bottom-up’ philosophy.

“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased”

This is a physical mechanism for learning at the cellular level. Hebb thought that if two nerve cells were close enough and appeared related, that is, both fired simultaneously at high frequency, then the connection between them would be strengthened. At the time, Hebb could not give firm evidence for his theory, but research in the subsequent years did confirm the existence of this strengthening.

Hebb’s postulate was not completely new; similar ideas had been proposed before. However, Hebb stated the postulate more systematically.

## Linear Associator

The first use of Hebb’s learning rule in artificial neural networks was the linear associator. It is the simplest network that illustrates Hebb’s postulate; a more complex architecture would drag us into details and obscure the learning rule itself.

The linear associator was proposed independently by James Anderson and Teuvo Kohonen in 1972.

This architecture consists of a single layer of $S$ neurons, each of which receives $R$ inputs and has a linear transfer function. The output of the network is therefore:

$$a_i=\sum_{j=1}^{R}w_{ij}p_j\tag{1}$$

where the $i$th component of the output vector is the sum of all inputs weighted by the connections between the inputs and the $i$th neuron. It can also be written in matrix form:

$$\boldsymbol{a}=\boldsymbol{W}\boldsymbol{p}\tag{2}$$
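As a concrete sketch of equation (2), the forward pass of a linear associator is a single matrix-vector product. The post gives no code, so the use of NumPy and the example numbers below are assumptions:

```python
import numpy as np

# A linear associator with S = 2 neurons and R = 3 inputs.
# Each output a_i is the weighted sum of the inputs: a = W p (equation (2)).
W = np.array([[1.0, 0.5, -1.0],   # weights into neuron 1
              [0.0, 2.0,  1.0]])  # weights into neuron 2
p = np.array([1.0, -1.0, 2.0])    # input vector

a = W @ p                         # a = [-1.5, 0.0]
```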

The task of associative memory is to learn $Q$ pairs of prototype input/output vectors:

$$\{\boldsymbol{p}_1,\boldsymbol{t}_1\},\{\boldsymbol{p}_2,\boldsymbol{t}_2\},\cdots,\{\boldsymbol{p}_Q,\boldsymbol{t}_Q\}\tag{3}$$

then the associator should output $\boldsymbol{a}=\boldsymbol{t}_i$ when given the corresponding input $\boldsymbol{p}=\boldsymbol{p}_i$ for $i=1,2,\cdots,Q$. And when the input changes slightly (i.e. $\boldsymbol{p}=\boldsymbol{p}_i+\delta$), the output should also change only slightly (i.e. $\boldsymbol{a}=\boldsymbol{t}_i+\varepsilon$).

Recall the Hebb rule: “if neurons on both sides of a synapse are activated simultaneously, the strength of the synapse will increase”. Considering $a_i=\sum_{j=1}^{R}w_{ij}p_j$, where $w_{ij}$ is the weight between input $p_j$ and output $a_i$, the mathematical form of the Hebb rule is:

$$w_{ij}^{\text{new}} = w_{ij}^{\text{old}} + \alpha f_i(a_{iq}) g_j(p_{jq})\tag{4}$$

where:

• $q$: the index of the training pair
• $\alpha$: the learning rate

This mathematical model uses two functions $f$ and $g$ to map the raw output and input into suitable values, then multiplies them to form an increment to the connection weight. The actual forms of these functions are not known for sure, so taking them to be linear (identity) functions is also reasonable. This gives the simplified form of the Hebb rule:

$$w_{ij}^{\text{new}} = w_{ij}^{\text{old}} + \alpha a_{iq} p_{jq} \tag{5}$$

The learning rate $\alpha$ is necessary because it controls the size of each weight update.

Equation (5) not only represents Hebb’s rule, that the connection strengthens when both sides of the synapse are active, but also increments the connection when both sides of the synapse are negative. This is an extension of Hebb’s rule which may have no biological evidence to support it.
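One update of equation (5) can be sketched as follows (a NumPy illustration with assumed example values, not code from the original post). Note that the last weight update is positive even though both activities involved are negative, which is the extension mentioned above:

```python
import numpy as np

# One Hebbian update step (equation (5)):
# each weight grows in proportion to the product of its input and output.
alpha = 0.1                             # learning rate (illustrative value)
W = np.array([[1.0, 0.0,  0.5],
              [0.0, 1.0, -0.5]])
p = np.array([1.0, -1.0, 2.0])          # input vector
a = W @ p                               # current output: [2.0, -2.0]

W_new = W + alpha * np.outer(a, p)      # w_ij += alpha * a_i * p_j
# w_22 increases because a_2 and p_2 are both negative.
```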

In this post, we discuss only the supervised form of Hebb’s rule. There is also an unsupervised version, which will be investigated in another post.

Recall the training set:

$$\{\boldsymbol{p}_1,\boldsymbol{t}_1\},\{\boldsymbol{p}_2,\boldsymbol{t}_2\},\cdots,\{\boldsymbol{p}_Q,\boldsymbol{t}_Q\}\tag{6}$$

Hebb’s postulate states a relation between the outputs and the inputs. However, the model’s outputs are not always the correct responses to the inputs. In a supervised learning task, the correct outputs, also called targets, are given. Replacing the model output $a_{iq}$ in equation (5) with the known target $t_{iq}$ gives the supervised form of Hebb’s rule:

$$w_{ij}^{\text{new}} = w_{ij}^{\text{old}} + \alpha t_{iq} p_{jq} \tag{7}$$

where:

• $t_{iq}$ is the $i$th element of $q$th target $\boldsymbol{t}_q$
• $p_{jq}$ is the $j$th element of $q$th input $\boldsymbol{p}_q$

Of course, it also has a matrix form:

$$\boldsymbol{W}^{\text{new}}=\boldsymbol{W}^{\text{old}}+\alpha\boldsymbol{t}_q\boldsymbol{p}_q^T\tag{8}$$
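A single supervised step of equation (8) can be sketched in NumPy as follows (the values are assumed for illustration):

```python
import numpy as np

# One supervised Hebbian step (equation (8)): the model output is
# replaced by the known target, so W += alpha * t_q p_q^T.
alpha = 1.0
W = np.zeros((2, 3))
p_q = np.array([1.0, 0.0, -1.0])    # one training input
t_q = np.array([1.0, -1.0])         # its target

W = W + alpha * np.outer(t_q, p_q)  # W is now [[1, 0, -1], [-1, 0, 1]]
```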

If we initialize $\boldsymbol{W}=\boldsymbol{0}$ and apply equation (8) once for each training pair with $\alpha=1$, we get the final weight matrix for the training set:

$$\boldsymbol{W}=\boldsymbol{t}_1\boldsymbol{p}_1^T+\boldsymbol{t}_2\boldsymbol{p}_2^T+\cdots+\boldsymbol{t}_Q\boldsymbol{p}_Q^T=\sum_{i=1}^{Q}\boldsymbol{t}_i\boldsymbol{p}_i^T\tag{9}$$

or in a matrix form:

$$\boldsymbol{W}=\begin{bmatrix} \boldsymbol{t}_1,\boldsymbol{t}_2,\cdots,\boldsymbol{t}_Q \end{bmatrix}\begin{bmatrix} \boldsymbol{p}_1^T\\ \boldsymbol{p}_2^T\\ \vdots\\ \boldsymbol{p}_Q^T \end{bmatrix}=\boldsymbol{T}\boldsymbol{P}^T\tag{10}$$
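The batch form in equation (10) can be checked numerically. A known property (stated in [1]) is that when the prototype inputs are orthonormal, the associator recalls each target exactly, since $\boldsymbol{W}\boldsymbol{p}_q = \sum_i \boldsymbol{t}_i(\boldsymbol{p}_i^T\boldsymbol{p}_q) = \boldsymbol{t}_q$. A minimal NumPy sketch with assumed example vectors:

```python
import numpy as np

# Batch Hebbian learning (equation (10)): W = T P^T.
P = np.array([[1.0, 0.0],          # columns are orthonormal inputs p_1, p_2
              [0.0, 1.0],
              [0.0, 0.0]])
T = np.array([[1.0, -1.0],         # columns are targets t_1, t_2
              [2.0,  0.0]])

W = T @ P.T                        # W = T P^T

# Because the inputs are orthonormal, recall is exact: W p_q = t_q.
recalled = W @ P[:, 0]             # equals t_1 = [1, 2]
```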

## References

1. Demuth, H.B., Beale, M.H., De Jess, O. and Hagan, M.T., 2014. Neural Network Design. Martin Hagan.