Keywords: performance surfaces, optimum points

## Necessary Conditions for Optimality [1]

The main objective of performance learning is to minimize the performance index, so the conditions for the existence of a minimum of the performance index should be investigated:

1. A strong minimum $\boldsymbol{x}$: in any direction $\boldsymbol{p}_{\text{unit}}$ around this point, for a sufficiently small distance $\delta>0$ (i.e. as $\delta \to 0$), we always have $F(\boldsymbol{x})<F(\boldsymbol{x}+\delta \boldsymbol{p}_{\text{unit}})$
2. A weak minimum $\boldsymbol{x}$: in any direction $\boldsymbol{p}_{\text{unit}}$ around this point, for a sufficiently small distance $\delta>0$ (i.e. as $\delta \to 0$), we always have $F(\boldsymbol{x})\leq F(\boldsymbol{x}+\delta \boldsymbol{p}_{\text{unit}})$
3. The global minimum is the point with the lowest function value among all the weak and strong minima.

For instance,

$$F(x)=3x^4-7x^2-\frac{1}{2}x+6\tag{13}$$

• two local minima, one near $x_1=-1.1$ and one near $x_2=1.1$
• the minimum near $x_2=1.1$ is the global minimum
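The two minima can be located numerically. A minimal sketch with NumPy (solving $F'(x)=0$ and classifying the roots with the sign of $F''$; variable names are illustrative):

```python
import numpy as np

# F(x) = 3x^4 - 7x^2 - x/2 + 6, equation (13)
F   = lambda x: 3*x**4 - 7*x**2 - 0.5*x + 6
d2F = lambda x: 36*x**2 - 14          # second derivative F''(x)

# Critical points: real roots of F'(x) = 12x^3 - 14x - 1/2 = 0
roots = np.roots([12, 0, -14, -0.5])
crit  = np.sort(roots[np.abs(roots.imag) < 1e-9].real)

# F''(x) > 0 -> local minimum; F''(x) < 0 -> local maximum
minima     = [x for x in crit if d2F(x) > 0]
global_min = min(minima, key=F)
print(minima)      # two local minima, near -1.1 and 1.1
print(global_min)  # the one near 1.1 has the lower value
```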

If the function's variable is a vector:

$$F(\boldsymbol{x})=(x_2-x_1)^4+8x_1x_2-x_1+x_2+3\tag{14}$$

[Figure: the 3-D surface plot of this function, and its contour plot. Three points are marked on the contour plot: the two pink points are the local minima, and the black point is called a saddle point.]
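The three stationary points of (14) can be found analytically: setting $\nabla F=\boldsymbol{0}$ gives $\partial F/\partial x_1+\partial F/\partial x_2 = 8(x_1+x_2)=0$, so $x_2=-x_1$, and substituting back yields $32x_1^3-8x_1-1=0$. A small NumPy check (a sketch; variable names are illustrative):

```python
import numpy as np

# F(x) = (x2 - x1)^4 + 8*x1*x2 - x1 + x2 + 3, equation (14)
F = lambda x1, x2: (x2 - x1)**4 + 8*x1*x2 - x1 + x2 + 3

# Setting the gradient to zero gives x2 = -x1 and 32*x1^3 - 8*x1 - 1 = 0
roots = np.roots([32, 0, -8, -1])
x1s = np.sort(roots[np.abs(roots.imag) < 1e-9].real)
for x1 in x1s:
    print(f"stationary point ({x1:+.3f}, {-x1:+.3f}), F = {F(x1, -x1):.3f}")
# three stationary points, near (-0.42, 0.42), (-0.13, 0.13), (0.55, -0.55)
```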

These two examples illustrate minimum points and saddle points. But what conditions are needed to confirm that a point is actually a minimum?

### First-order Conditions

Go back to the Taylor series with $\Delta x=x-x^{\star}\neq 0$, keeping only the first-order term of the series:

$$\begin{aligned} F(x)&=F(x^{\star})+\frac{d}{dx}F(x)\Big|_{x=x^{\star}}(x-x^{\star})\\ F(x^{\star}+\Delta x)&=F(x^{\star})+\nabla F(x^{\star})\Delta x \end{aligned}\tag{15}$$

when $x^{\star}$ is a candidate minimum point, we want:

$$F(x^{\star}+\Delta x)\geq F(x^{\star})\tag{16}$$

so, from equation (15),

$$\nabla F(x^{\star})\Delta x\geq 0 \tag{17}$$

is needed. Considering the opposite direction, if $x^{\star}$ is a minimum point, it must also satisfy:

$$F(x^{\star}-\Delta x)=F(x^{\star})-\nabla F(x^{\star})\Delta x\geq F(x^{\star})\tag{18}$$

then

$$\nabla F(x^{\star})\Delta x\leq 0 \tag{19}$$

Equations (17) and (19) are satisfied simultaneously if and only if

$$\nabla F(x^{\star})\Delta x=0\tag{20}$$

Because $\Delta x\neq 0$ and can point in any direction, we must have $\nabla F(x^{\star})=0$ when $x^{\star}$ is a minimum point. $\nabla F(x^{\star})=0$ is a necessary but not sufficient condition. This concludes the first-order condition.
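As an illustration that the first-order condition is necessary but not sufficient, consider $F(x)=x^3$: the derivative vanishes at $x=0$, yet that point is not a minimum. A quick numerical sketch using a central finite difference:

```python
# ∇F = 0 is necessary but not sufficient: F(x) = x^3 is stationary at
# x = 0, but x = 0 is neither a minimum nor a maximum (an inflection point).
F = lambda x: x**3

h = 1e-6
dF0 = (F(h) - F(-h)) / (2 * h)    # central finite-difference derivative at 0
print(dF0)                        # ≈ 0: the first-order condition holds
print(F(-0.1) < F(0))             # True: F still decreases to the left of 0
```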

### Second-order Conditions

The first-order condition is not sufficient, so let's consider the second-order condition. With $\Delta\boldsymbol{x}=\boldsymbol{x}-\boldsymbol{x}^{\star}\neq \boldsymbol{0}$ and the gradient equal to $\boldsymbol{0}$, substitute them into equation (5) (in Part I). The second-order Taylor series is then:

$$F(\boldsymbol{x}^{\star}+\Delta\boldsymbol{x})=F(\boldsymbol{x}^{\star})+\frac{1}{2}\Delta\boldsymbol{x}^T\nabla^2 F(\boldsymbol{x})\Big|_{\boldsymbol{x}=\boldsymbol{x}^{\star}}\Delta\boldsymbol{x}\tag{21}$$

When $||\Delta\boldsymbol{x}||$ is small, the zero-order, first-order and second-order terms of the Taylor series approximate the original function accurately. When $\boldsymbol{x}^{\star}$ is a strong minimum, the second-order term should satisfy:

$$\frac{1}{2}\Delta\boldsymbol{x}^T\nabla^2 F(x)|_{\boldsymbol{x}=\boldsymbol{x}^{\star}}\Delta\boldsymbol{x}>0$$
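The accuracy of the truncated series for small $||\Delta\boldsymbol{x}||$ can be checked on the scalar example (13): near its global minimum $x^{\star}$ the first-order term vanishes, so $F(x^{\star}+\Delta x)-F(x^{\star})\approx \frac{1}{2}F''(x^{\star})\Delta x^2$. A quick sketch:

```python
import numpy as np

# Example (13): F(x) = 3x^4 - 7x^2 - x/2 + 6
F   = lambda x: 3*x**4 - 7*x**2 - 0.5*x + 6
d2F = lambda x: 36*x**2 - 14

# Global minimum: the largest real root of F'(x) = 12x^3 - 14x - 1/2 = 0
x_star = max(np.roots([12, 0, -14, -0.5]).real)

dx = 0.01
actual    = F(x_star + dx) - F(x_star)
quadratic = 0.5 * d2F(x_star) * dx**2   # second-order Taylor term
print(actual, quadratic)                # nearly equal for small dx
```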

Recalling some linear algebra: a positive definite matrix $A$ satisfies

$$\boldsymbol{z}^T A \boldsymbol{z}>0 \text{ for any } \boldsymbol{z}\neq\boldsymbol{0}$$

and a positive semidefinite matrix $A$ satisfies

$$\boldsymbol{z}^T A \boldsymbol{z}\geq0 \text{ for any } \boldsymbol{z}$$

So when the gradient is $\boldsymbol{0}$ and the Hessian matrix (the matrix of second-order derivatives) is positive definite, the point is a strong minimum; a positive semidefinite Hessian is the corresponding necessary condition for a (weak or strong) minimum. A positive definite Hessian is a sufficient condition for a strong minimum, but it is not a necessary one: when the second-order term vanishes in some direction (for example, when the Hessian is $\boldsymbol{0}$), the third- and higher-order terms have to be examined.
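These conditions can be applied to example (14). At each stationary point ($x_2=-x_1$, with $x_1$ a real root of $32x_1^3-8x_1-1=0$) the Hessian is $\nabla^2F=\begin{bmatrix}12(x_2-x_1)^2 & 8-12(x_2-x_1)^2\\ 8-12(x_2-x_1)^2 & 12(x_2-x_1)^2\end{bmatrix}$, and its eigenvalues separate the two minima from the saddle point. A sketch with NumPy:

```python
import numpy as np

# Hessian of F(x) = (x2 - x1)^4 + 8*x1*x2 - x1 + x2 + 3, equation (14)
def hessian(x1, x2):
    d = 12 * (x2 - x1)**2
    return np.array([[d, 8 - d],
                     [8 - d, d]])

# Stationary points: x2 = -x1, with x1 a real root of 32x^3 - 8x - 1 = 0
roots = np.roots([32, 0, -8, -1])
x1s = np.sort(roots[np.abs(roots.imag) < 1e-9].real)

kinds = []
for x1 in x1s:
    eig = np.linalg.eigvalsh(hessian(x1, -x1))
    kind = "strong minimum" if np.all(eig > 0) else "saddle point"
    kinds.append(kind)
    print(f"({x1:+.3f}, {-x1:+.3f}) -> {kind}")
```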

## References

[1] Demuth, H.B., Beale, M.H., De Jess, O. and Hagan, M.T., 2014. *Neural Network Design*. Martin Hagan.