Keywords: Hebbian Learning, Supervised Hebbian Learning

The whole Hebbian learning theory:

## Application of Hebb Learning [1]

An application is proposed here. We have three inputs, which also serve as the outputs: they are $5\times 6$ images (5 pixels wide, 6 pixels tall) that contain only white and black pixels. We read each image and convert its white and black pixels into values in $\{1,-1\}$, so the ‘zero’ image becomes the matrix:

\begin{aligned} \begin{bmatrix} -1 & 1 & 1 & 1 & -1\\ 1 & -1 & -1 & -1 & 1\\ 1 & -1 & -1 & -1 & 1\\ 1 & -1 & -1 & -1 & 1\\ 1 & -1 & -1 & -1 & 1\\ -1 & 1 & 1 & 1 & -1 \end{bmatrix} \end{aligned}

We use the inputs as the targets, so the network learns to reproduce each image it is shown; the transfer function is the hard limit. Following the algorithm summarized above, we implemented it in code.

The whole project can be found at: https://github.com/Tony-Tan/NeuronNetworks/tree/master/supervised_Hebb_learning

The algorithm gives the following result (left: input; right: output):

It looks like the network has an associative memory.
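The following is a minimal NumPy sketch of the idea, not the code from the repository above. It assumes a symmetric hard limit (outputs in $\{1,-1\}$) as the transfer function, stores only the ‘zero’ pattern given in the text (the other two images are not reproduced here), and uses a hypothetical corrupted input in which the lower half of the image is blanked out. The names `hardlims`, `train_hebb` and `recall` are illustrative.

```python
import numpy as np

# The 'zero' pattern from the text, stored row by row (6 rows x 5 columns).
ZERO = np.array([
    [-1,  1,  1,  1, -1],
    [ 1, -1, -1, -1,  1],
    [ 1, -1, -1, -1,  1],
    [ 1, -1, -1, -1,  1],
    [ 1, -1, -1, -1,  1],
    [-1,  1,  1,  1, -1],
])


def hardlims(n):
    """Symmetric hard limit (assumed here): +1 where n >= 0, otherwise -1."""
    return np.where(n >= 0, 1, -1)


def train_hebb(prototypes):
    """Supervised Hebb rule with targets equal to inputs: W = sum_q p_q p_q^T."""
    dim = prototypes[0].size
    W = np.zeros((dim, dim), dtype=int)
    for p in prototypes:
        v = p.reshape(-1)       # flatten the image into a vector
        W += np.outer(v, v)     # Hebb update with t_q = p_q
    return W


def recall(W, image):
    """Pass an image through the network and reshape the output."""
    return hardlims(W @ image.reshape(-1)).reshape(image.shape)


W = train_hebb([ZERO])

# Hypothetical corrupted input: the lower three rows are set to -1.
occluded = ZERO.copy()
occluded[3:, :] = -1

restored = recall(W, occluded)
print(np.array_equal(restored, ZERO))  # prints True
```

With a single stored prototype the recall is easy; with several prototypes the result depends on how strongly the patterns overlap.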

## Some Variations of Hebb Learning

Several rules derived from Hebb learning have been developed. They overcome shortcomings of the basic Hebb learning algorithm, such as:

The elements of $\boldsymbol{W}$ grow larger as more prototypes are provided.
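A small experiment can make this growth visible. The sketch below assumes random $\{1,-1\}$ prototypes of length 30 used as their own targets (both the random data and the fixed seed are illustrative choices, not from the original project) and records the largest absolute weight after each prototype is added.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
dim = 30                        # e.g. a flattened 5x6 image

W = np.zeros((dim, dim), dtype=int)
largest = []
for q in range(1, 11):
    p = rng.choice([-1, 1], size=dim)      # hypothetical random prototype
    W += np.outer(p, p)                    # Hebb update with t_q = p_q
    largest.append(int(np.abs(W).max()))   # largest weight magnitude so far

print(largest)  # prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Because every entry of a prototype squares to 1, each diagonal element of $\boldsymbol{W}$ increases by exactly 1 per prototype, so the largest weight equals the number of prototypes seen.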

Several ideas have been proposed to overcome this problem:

1. A learning rate $\alpha$ can be used to slow down this growth.
2. Adding a decay term, so that the learning rule becomes a smoothing filter: $\boldsymbol{W}^{\text{new}}=\boldsymbol{W}^{\text{old}}+\alpha\boldsymbol{t}_q\boldsymbol{p}_q^T-\gamma\boldsymbol{W}^{\text{old}}$, which can also be written as $\boldsymbol{W}^{\text{new}}=(1-\gamma)\boldsymbol{W}^{\text{old}}+\alpha\boldsymbol{t}_q\boldsymbol{p}_q^T$, where $0<\gamma<1$.
3. Using the residual between the target and the output, multiplied by the input, as the weight increment: $\boldsymbol{W}^{\text{new}}=\boldsymbol{W}^{\text{old}}+\alpha(\boldsymbol{t}_q-\boldsymbol{a}_q)\boldsymbol{p}_q^T$
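
To see why rule 2 acts as a filter, the recursion can be unrolled. This is a sketch under assumed notation that is not in the original: $\boldsymbol{W}_0$ denotes the initial weights and $\boldsymbol{W}_k$ the weights after the pairs $q=1,\dots,k$ have been presented in order.

\begin{aligned} \boldsymbol{W}_k &= (1-\gamma)\boldsymbol{W}_{k-1}+\alpha\boldsymbol{t}_k\boldsymbol{p}_k^T\\ &= (1-\gamma)^k\boldsymbol{W}_0+\alpha\sum_{q=1}^{k}(1-\gamma)^{k-q}\boldsymbol{t}_q\boldsymbol{p}_q^T \end{aligned}

Each earlier pair is weighted by a power of $(1-\gamma)$, so older prototypes contribute exponentially less. If the same pair $(\boldsymbol{p},\boldsymbol{t})$ is presented repeatedly, the geometric series sums to $1/\gamma$ and the weights approach $\frac{\alpha}{\gamma}\boldsymbol{t}\boldsymbol{p}^T$ instead of growing without bound.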

With the second idea, as $\gamma\to 1$ the algorithm quickly forgets the old weights, whereas as $\gamma\to 0$ it reduces to the standard rule. This filtering idea is used widely in the algorithms that follow.

The third method, also known as the Widrow-Hoff algorithm, adjusts the weights to minimize the mean square error, which also minimizes the sum of squared errors. It has another advantage: the weights are updated step by step, each time a prototype is presented. It can therefore adapt quickly to a changing environment, a feature that some other algorithms lack.
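
The step-by-step character of the third rule can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the output is taken to be linear, $\boldsymbol{a}_q=\boldsymbol{W}\boldsymbol{p}_q$, and the two orthogonal prototypes, the learning rate $\alpha=0.1$ and the function name `delta_rule_step` are hypothetical choices made for the example.

```python
import numpy as np


def delta_rule_step(W, p, t, alpha):
    """One update of rule 3: W_new = W_old + alpha * (t - a) p^T."""
    a = W @ p                              # linear output (assumption)
    return W + alpha * np.outer(t - a, p)  # move W along the residual


# Two hypothetical, mutually orthogonal prototypes used as their own targets.
prototypes = [
    np.array([1.0, -1.0, 1.0, -1.0]),
    np.array([1.0, 1.0, -1.0, -1.0]),
]

alpha = 0.1
W = np.zeros((4, 4))
for epoch in range(50):
    for p in prototypes:
        # The weights change as soon as each prototype is presented.
        W = delta_rule_step(W, p, p, alpha)

# Residual norm per prototype after training, and the largest weight magnitude.
errors = [float(np.linalg.norm(p - W @ p)) for p in prototypes]
print(errors)
print(float(np.abs(W).max()))
```

In this setup the residuals shrink toward zero while the weights stay bounded, in contrast to the plain Hebb rule, whose weights keep growing as prototypes are repeated.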

## References

[1] Demuth, H.B., Beale, M.H., De Jesús, O. and Hagan, M.T., 2014. Neural network design. Martin Hagan.