Keywords: Hebbian Learning, Supervised Hebbian Learning


## Performance Analysis 1

Now let's look inside the linear associator from a mathematical point of view. A mathematical analysis gives us confidence in the implementation of Hebb's rule that follows.

### $\boldsymbol{p}_q$ are Orthonormal

First, consider the simplest special case, in which all inputs $\boldsymbol{p}_q$ are orthonormal, that is, mutually orthogonal and of unit length. Using equation (10), the output corresponding to the input $\boldsymbol{p}_k$ can be computed:

$$\boldsymbol{a}=\boldsymbol{W}\boldsymbol{p}_k=(\sum^{Q}_{q=1}\boldsymbol{t}_q\boldsymbol{p}_q^T)\boldsymbol{p}_k=\sum^{Q}_{q=1}\boldsymbol{t}_q(\boldsymbol{p}_q^T\boldsymbol{p}_k)\tag{11}$$

Because we have assumed that the $\boldsymbol{p}_q$ are orthonormal, we have:

$$\boldsymbol{p}_q^T\boldsymbol{p}_k=\begin{cases} 1&\text{ if }q=k\\ 0&\text{ if }q\neq k \end{cases}\tag{12}$$

From equations (11) and (12) we get $\boldsymbol{a}=\boldsymbol{t}_k$. This confirms that the weight matrix $\boldsymbol{W}$ built through Hebb's postulate gives the right outputs when the inputs are orthonormal and were used in training (generalization to unseen inputs is not considered here).

The conclusion is that if the input prototype vectors are orthonormal, Hebb's rule reproduces every target exactly.
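Whether a set of prototype vectors satisfies equation (12) can be checked directly by computing every inner product. The snippet below is a minimal sketch in plain Python; the helper names `dot` and `is_orthonormal`, the tolerance, and the two sample vector sets are illustrative assumptions, not part of the original derivation.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def is_orthonormal(vectors, tol=1e-9):
    """Return True if p_q^T p_k is 1 for q == k and 0 otherwise (equation 12)."""
    for q, p_q in enumerate(vectors):
        for k, p_k in enumerate(vectors):
            expected = 1.0 if q == k else 0.0
            if abs(dot(p_q, p_k) - expected) > tol:
                return False
    return True


# Illustrative vectors (assumed, not from the text):
# two standard basis vectors, and a pair with unit length that is not orthogonal.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
skewed = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]

print(is_orthonormal(basis))   # True
print(is_orthonormal(skewed))  # False
```

If the check returns `True`, the argument above guarantees exact recall of the training targets.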

### $\boldsymbol{p}_q$ are Normalized but not Orthogonal

The more general case is that the $\boldsymbol{p}_q$ are not orthogonal. Before feeding them into the algorithm, we can scale every prototype vector to unit length without changing its direction. Then we have:

$$\boldsymbol{a}=\boldsymbol{W}\boldsymbol{p}_k=(\sum^{Q}_{q=1}\boldsymbol{t}_q\boldsymbol{p}_q^T)\boldsymbol{p}_k=\sum^{Q}_{q=1}\boldsymbol{t}_q(\boldsymbol{p}_q^T\boldsymbol{p}_k)=\boldsymbol{t}_k+\sum_{q\neq k}\boldsymbol{t}_q(\boldsymbol{p}_q^T\boldsymbol{p}_k)\tag{13}$$

Because the vectors are normalized but not orthogonal, we have:

$$\boldsymbol{t}_q\boldsymbol{p}_q^T\boldsymbol{p}_k=\begin{cases} \boldsymbol{t}_q & \text{ when } q = k\\ \boldsymbol{t}_q\boldsymbol{p}_q^T\boldsymbol{p}_k & \text{ when } q \neq k \end{cases}\tag{14}$$

Equation (13) can therefore also be written as:

$$\boldsymbol{a}=\boldsymbol{t}_k+\sum_{q\neq k}\boldsymbol{t}_q(\boldsymbol{p}_q^T\boldsymbol{p}_k)\tag{15}$$

If we want the outputs of the linear associator to be as close as possible to the targets, the error term $\sum_{q\neq k}\boldsymbol{t}_q(\boldsymbol{p}_q^T\boldsymbol{p}_k)$ should be as small as possible.
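The error term in equation (15) can be evaluated numerically for any stored pattern. The following is a rough sketch in plain Python; the function name `crosstalk` and the two sample patterns are assumptions chosen only to illustrate the formula.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def crosstalk(k, inputs, targets):
    """Sum over q != k of t_q * (p_q^T p_k), the error term in equation (15)."""
    error = [0.0] * len(targets[0])
    for q, (p_q, t_q) in enumerate(zip(inputs, targets)):
        if q == k:
            continue  # the q == k term contributes t_k itself, not error
        overlap = dot(p_q, inputs[k])
        error = [e + t * overlap for e, t in zip(error, t_q)]
    return error


# Illustrative unit-length inputs that are not orthogonal (assumed values).
inputs = [[1.0, 0.0], [0.6, 0.8]]
targets = [[1.0], [-1.0]]

print(crosstalk(0, inputs, targets))  # [-0.6]
print(crosstalk(1, inputs, targets))  # [0.6]
```

The larger the overlap $\boldsymbol{p}_q^T\boldsymbol{p}_k$ between different prototypes, the further the output drifts from $\boldsymbol{t}_k$; for orthogonal inputs every overlap is zero and the term vanishes.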

As an example, take the following training set, whose two inputs happen to be orthonormal:

$$\{\boldsymbol{p}_1=\begin{bmatrix}0.5\\-0.5\\0.5\\-0.5\end{bmatrix},\boldsymbol{t}_1=\begin{bmatrix}1\\-1\end{bmatrix}\}, \{\boldsymbol{p}_2=\begin{bmatrix}0.5\\0.5\\-0.5\\-0.5\end{bmatrix},\boldsymbol{t}_2=\begin{bmatrix}1\\1\end{bmatrix}\}$$

The weight matrix can then be calculated:

$$\boldsymbol{W}=\boldsymbol{T}\boldsymbol{P}^T=\begin{bmatrix} \boldsymbol{t}_1&\boldsymbol{t}_2 \end{bmatrix}\begin{bmatrix} \boldsymbol{p}_1^T\\\boldsymbol{p}_2^T \end{bmatrix} = \begin{bmatrix} 1&1\\ -1&1 \end{bmatrix}\begin{bmatrix} 0.5&-0.5&0.5&-0.5\\ 0.5&0.5&-0.5&-0.5 \end{bmatrix}=\begin{bmatrix} 1&0&0&-1\\ 0&1&-1&0 \end{bmatrix}$$

We can now test the two inputs:

1. $\boldsymbol{a}_1=\boldsymbol{W}\boldsymbol{p}_1=\begin{bmatrix}1&0&0&-1\\0&1&-1&0\end{bmatrix}\begin{bmatrix}0.5\\-0.5\\0.5\\-0.5\end{bmatrix}=\begin{bmatrix}1\\-1\end{bmatrix}=\boldsymbol{t}_1$ Correct!
2. $\boldsymbol{a}_2=\boldsymbol{W}\boldsymbol{p}_2=\begin{bmatrix}1&0&0&-1\\0&1&-1&0\end{bmatrix}\begin{bmatrix}0.5\\0.5\\-0.5\\-0.5\end{bmatrix}=\begin{bmatrix}1\\1\end{bmatrix}=\boldsymbol{t}_2$ Correct!
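The calculation above can be reproduced with a few lines of plain Python. This is a minimal sketch that builds $\boldsymbol{W}$ as the sum of outer products $\boldsymbol{t}_q\boldsymbol{p}_q^T$ (equivalent to $\boldsymbol{T}\boldsymbol{P}^T$); the function names `hebb_weights` and `recall` are assumptions made for this illustration.

```python
def hebb_weights(inputs, targets):
    """W = sum_q t_q p_q^T, stored as a list of rows."""
    rows, cols = len(targets[0]), len(inputs[0])
    W = [[0.0] * cols for _ in range(rows)]
    for p, t in zip(inputs, targets):
        for i in range(rows):
            for j in range(cols):
                W[i][j] += t[i] * p[j]
    return W


def recall(W, p):
    """Linear associator output a = W p."""
    return [sum(w * x for w, x in zip(row, p)) for row in W]


# Training set from the example above.
inputs = [[0.5, -0.5, 0.5, -0.5], [0.5, 0.5, -0.5, -0.5]]
targets = [[1.0, -1.0], [1.0, 1.0]]

W = hebb_weights(inputs, targets)
print(W)                     # [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, -1.0, 0.0]]
print(recall(W, inputs[0]))  # [1.0, -1.0]
print(recall(W, inputs[1]))  # [1.0, 1.0]
```

Both recalled outputs equal their targets exactly, as expected for orthonormal inputs.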

As another example, take the training set:

$$\{\boldsymbol{p}_1=\begin{bmatrix}1\\-1\\-1\end{bmatrix},\text{orange}\}, \{\boldsymbol{p}_2=\begin{bmatrix}1\\1\\-1\end{bmatrix},\text{apple}\}$$

First, we convert the targets 'orange' and 'apple' into numbers:

- orange $\boldsymbol{t}_1=\begin{bmatrix}-1\end{bmatrix}$
- apple $\boldsymbol{t}_2=\begin{bmatrix}1\end{bmatrix}$

Second, we normalize the input vectors so that each has unit length:

$$\{\boldsymbol{p}_1=\begin{bmatrix}0.5774\\-0.5774\\-0.5774\end{bmatrix},\boldsymbol{t}_1=\begin{bmatrix}-1\end{bmatrix}\}, \{\boldsymbol{p}_2=\begin{bmatrix}0.5774\\0.5774\\-0.5774\end{bmatrix},\boldsymbol{t}_2=\begin{bmatrix}1\end{bmatrix}\}$$

The weight matrix can then be calculated:

$$\boldsymbol{W}=\boldsymbol{T}\boldsymbol{P}^T=\begin{bmatrix} \boldsymbol{t}_1&\boldsymbol{t}_2 \end{bmatrix}\begin{bmatrix} \boldsymbol{p}_1^T\\\boldsymbol{p}_2^T \end{bmatrix} = \begin{bmatrix} -1&1 \end{bmatrix}\begin{bmatrix} 0.5774&-0.5774&-0.5774\\ 0.5774&0.5774&-0.5774 \end{bmatrix}=\begin{bmatrix} 0&1.1548&0 \end{bmatrix}$$

We can now test the two inputs:

1. $\boldsymbol{a}_1=\boldsymbol{W}\boldsymbol{p}_1=\begin{bmatrix} 0&1.1548&0\end{bmatrix}\begin{bmatrix}0.5774\\-0.5774\\-0.5774\end{bmatrix}=\begin{bmatrix}-0.6668\end{bmatrix}$, which is close to, but not equal to, $\boldsymbol{t}_1=\begin{bmatrix}-1\end{bmatrix}$
2. $\boldsymbol{a}_2=\boldsymbol{W}\boldsymbol{p}_2=\begin{bmatrix} 0&1.1548&0\end{bmatrix}\begin{bmatrix}0.5774\\0.5774\\-0.5774\end{bmatrix}=\begin{bmatrix}0.6668\end{bmatrix}$, which is close to, but not equal to, $\boldsymbol{t}_2=\begin{bmatrix}1\end{bmatrix}$

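$\boldsymbol{a}_1$ is closer to $[-1]$ than to $[1]$, so it is assigned to $\boldsymbol{t}_1$, and $\boldsymbol{a}_2$ is closer to $[1]$ than to $[-1]$, so it is assigned to $\boldsymbol{t}_2$. The algorithm therefore gives outputs that are close to the correct targets but not equal to them, because the two normalized inputs are not orthogonal ($\boldsymbol{p}_1^T\boldsymbol{p}_2=1/3$).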
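The orange/apple example, including the normalization step and the nearest-target decision, can be sketched as follows. The helper names `normalize` and `nearest_label` and the dictionary layout are assumptions for illustration. Because the code keeps full floating-point precision, it prints 0.6667 in magnitude rather than the 0.6668 obtained above from inputs rounded to four decimals.

```python
import math


def normalize(p):
    """Scale p to unit length without changing its direction."""
    norm = math.sqrt(sum(x * x for x in p))
    return [x / norm for x in p]


def nearest_label(a, labelled_targets):
    """Pick the label whose scalar target is closest to the output a."""
    return min(labelled_targets, key=lambda label: abs(a - labelled_targets[label]))


labelled_targets = {"orange": -1.0, "apple": 1.0}
raw_inputs = {"orange": [1.0, -1.0, -1.0], "apple": [1.0, 1.0, -1.0]}
unit_inputs = {label: normalize(p) for label, p in raw_inputs.items()}

# Single-output Hebb rule: w = sum_q t_q * p_q (the only row of W).
w = [0.0, 0.0, 0.0]
for label, p in unit_inputs.items():
    w = [wi + labelled_targets[label] * pi for wi, pi in zip(w, p)]

for label, p in unit_inputs.items():
    a = sum(wi * pi for wi, pi in zip(w, p))
    print(label, round(a, 4), nearest_label(a, labelled_targets))
# orange -0.6667 orange
# apple 0.6667 apple
```

Each output equals its target plus the crosstalk term from the other pattern, which here has magnitude 1/3.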

Other algorithms can solve this task exactly rather than approximately. For instance, the pseudoinverse rule yields a different weight matrix $\boldsymbol{W}^{\star}$ that reproduces the correct targets for the example above.
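As a rough illustration of the pseudoinverse rule, the sketch below assumes the prototype vectors are linearly independent, so that $\boldsymbol{P}^{+}=(\boldsymbol{P}^T\boldsymbol{P})^{-1}\boldsymbol{P}^T$ and $\boldsymbol{W}^{\star}=\boldsymbol{T}\boldsymbol{P}^{+}$. To stay dependency-free it is restricted to two patterns with scalar targets, where $\boldsymbol{P}^T\boldsymbol{P}$ is a 2×2 matrix that can be inverted by hand; the function name and this restriction are assumptions of the example, not a general implementation.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pseudoinverse_weights_two_patterns(p1, p2, t1, t2):
    """W* = T P^+ with P^+ = (P^T P)^-1 P^T, for two scalar-target patterns."""
    # Gram matrix P^T P = [[g11, g12], [g12, g22]].
    g11, g12, g22 = dot(p1, p1), dot(p1, p2), dot(p2, p2)
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("inputs are linearly dependent")
    # T (P^T P)^-1 yields one coefficient per prototype vector.
    c1 = (t1 * g22 - t2 * g12) / det
    c2 = (t2 * g11 - t1 * g12) / det
    # W* = c1 * p1^T + c2 * p2^T (a single row, since targets are scalars).
    return [c1 * a + c2 * b for a, b in zip(p1, p2)]


s = 1.0 / math.sqrt(3.0)
p_orange = [s, -s, -s]   # normalized orange prototype
p_apple = [s, s, -s]     # normalized apple prototype

w_star = pseudoinverse_weights_two_patterns(p_orange, p_apple, -1.0, 1.0)
a_orange = dot(w_star, p_orange)
a_apple = dot(w_star, p_apple)
print(round(a_orange, 4), round(a_apple, 4))  # -1.0 1.0
```

Unlike the Hebb weights, this weight row recalls both targets exactly even though the two inputs overlap.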

## Conclusion

Hebb's algorithm can be summarized as follows:

1. Convert the targets into vectors whose elements are drawn from $\{-1,+1\}$
2. Normalize the input vectors
3. Calculate the weight matrix
4. Test the algorithm with the input prototype vectors

## References

1. Demuth, H.B., Beale, M.H., De Jesús, O. and Hagan, M.T., 2014. Neural network design. Martin Hagan.