**Keywords:** performance surfaces, optimum points

Part I can be found at: https://brainbomb.org/Artificial-Intelligence/Neural-Networks/Neural-Network-Design-Performance-Surfaces-and-Optimum-Points/

## Necessary Conditions for Optimality

The main objective of performance learning is to minimize the performance index, so the conditions for the existence of a minimum of the performance index should be investigated:

- A strong minimum $\boldsymbol{x}$ is a point such that, for any direction $\boldsymbol{p}_{\text{unit}}$ and any sufficiently small distance $\delta>0$, $F(\boldsymbol{x})<F(\boldsymbol{x}+\delta \boldsymbol{p}_{\text{unit}})$
- A weak minimum $\boldsymbol{x}$ is a point such that, for any direction $\boldsymbol{p}_{\text{unit}}$ and any sufficiently small distance $\delta>0$, $F(\boldsymbol{x})\leq F(\boldsymbol{x}+\delta \boldsymbol{p}_{\text{unit}})$
- The global minimum is the smallest of all the weak and strong minima.

For instance,

$$
F(x)=3x^4-7x^2-\frac{1}{2}x+6\tag{13}
$$

- there are two local minima, one near $x_1=-1.1$ and another near $x_2=1.1$
- the minimum near $x_2=1.1$ is the global minimum
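As a quick numerical check (a sketch using NumPy; applying `np.roots` to the derivative's coefficients is just one way to locate the stationary points):

```python
import numpy as np

# F(x) = 3x^4 - 7x^2 - x/2 + 6, equation (13)
def F(x):
    return 3 * x**4 - 7 * x**2 - 0.5 * x + 6

# Stationary points solve F'(x) = 12x^3 - 14x - 1/2 = 0.
roots = np.roots([12, 0, -14, -0.5])
real_roots = np.sort(roots[np.isreal(roots)].real)

for r in real_roots:
    print(f"x = {r:+.3f}, F(x) = {F(r):.3f}")

# The two outer roots are the local minima (near -1.1 and +1.1);
# the middle root is a local maximum. The global minimum is the
# local minimum with the smaller value of F.
minima = [real_roots[0], real_roots[2]]
global_min = min(minima, key=F)
print("global minimum near x =", round(global_min, 2))
```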

If the variable of the function is a vector:

$$
F(\boldsymbol{x})=(x_2-x_1)^4+8x_1x_2-x_1+x_2+3\tag{14}
$$

The 3-D surface and the contour plot of this function are shown in the accompanying figures. Three points are marked on the contour plot: the two pink points are local minima, and the black point is a saddle point.
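The stationary points of equation (14) can be located and classified numerically. The sketch below (assuming NumPy) uses the condition obtained by setting the gradient of equation (14) to zero, which reduces to $x_2=-x_1$ with $32x_1^3-8x_1-1=0$, and classifies each point by the eigenvalues of the Hessian:

```python
import numpy as np

# F(x) = (x2 - x1)^4 + 8*x1*x2 - x1 + x2 + 3, equation (14)
def F(x1, x2):
    return (x2 - x1)**4 + 8 * x1 * x2 - x1 + x2 + 3

# Setting the gradient to zero gives x2 = -x1 and 32*x1^3 - 8*x1 - 1 = 0.
x1_roots = np.roots([32, 0, -8, -1])
x1_roots = x1_roots[np.isreal(x1_roots)].real

# Hessian entries: d2F/dx1^2 = d2F/dx2^2 = 12(x2-x1)^2,
#                  d2F/dx1dx2 = -12(x2-x1)^2 + 8
def hessian(x1, x2):
    d2 = 12 * (x2 - x1)**2
    return np.array([[d2, -d2 + 8],
                     [-d2 + 8, d2]])

kinds = {}
for x1 in x1_roots:
    eig = np.linalg.eigvalsh(hessian(x1, -x1))
    kind = "minimum" if eig.min() > 0 else "saddle"
    kinds[round(x1, 2)] = kind
    print(f"({x1:+.2f}, {-x1:+.2f}): eigenvalues {np.round(eig, 2)} -> {kind}")
```

Running this recovers the picture described above: two stationary points with positive definite Hessians (the local minima) and one with mixed-sign eigenvalues (the saddle point).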

These two examples illustrate minimum points and saddle points. But what conditions confirm that a point is a minimum?

### First-order Conditions

Go back to the Taylor series with $\Delta x=x-x^{\star}\neq 0$ and keep only the first-order term of the series:

$$
\begin{aligned}
F(x)&=F(x^{\star})+\frac{d}{dx}F(x)|_{x=x^{\star}}(x-x^{\star})\\
F(x^{\star}+\Delta x)&=F(x^{\star})+\nabla F(x)\Delta x
\end{aligned}\tag{15}
$$

When $x^{\star}$ is a candidate minimum point, we want:

$$
F(x^{\star}+\Delta x)\geq F(x^{\star})\tag{16}
$$

so, in equation (15),

$$
\nabla F(x)\Delta x\geq 0 \tag{17}
$$

is required. Considering the opposite direction: if $x^{\star}$ is a minimum point, it must also satisfy:

$$
F(x^{\star}-\Delta x)=F(x^{\star})-\nabla F(x)\Delta x\geq F(x^{\star})\tag{18}
$$

then

$$
\nabla F(x)\Delta x\leq 0 \tag{19}
$$

Equations (17) and (19) are both satisfied if and only if

$$
\nabla F(x)\Delta x=0\tag{20}
$$

Because $\Delta x\neq 0$, we must have $\nabla F(x)=0$ when $x$ is a minimum point. $\nabla F(x)=0$ is a necessary but not sufficient condition. This is the first-order condition.
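A small numerical illustration of the first-order condition (a sketch; the starting point and step size are arbitrary choices): running plain gradient descent on equation (13) drives the derivative toward zero at the minimizer.

```python
# First-order condition check for F(x) = 3x^4 - 7x^2 - x/2 + 6:
# at a minimizer the derivative must vanish (although a vanishing
# derivative alone does not certify a minimum).
def dF(x):
    return 12 * x**3 - 14 * x - 0.5

# Simple gradient descent from x = 2 (step size 0.01 chosen arbitrarily).
x = 2.0
for _ in range(2000):
    x -= 0.01 * dF(x)

print(f"x = {x:.4f}, F'(x) = {dF(x):.2e}")
```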

### Second-order Conditions

The first-order condition is not sufficient. Now let's consider the second-order condition. With $\Delta\boldsymbol{x}=\boldsymbol{x}-\boldsymbol{x}^{\star}\neq \boldsymbol{0}$ and the gradient equal to $\boldsymbol{0}$, substitute them into equation (5) (in Part I). The second-order Taylor series is then:

$$
F(\boldsymbol{x}^{\star}+\Delta\boldsymbol{x})=F(\boldsymbol{x}^{\star})+\frac{1}{2}\Delta\boldsymbol{x}^T\nabla^2 F(x)|_{\boldsymbol{x}=\boldsymbol{x}^{\star}}\Delta\boldsymbol{x}\tag{21}
$$

When $||\Delta\boldsymbol{x}||$ is small, the zeroth-, first-, and second-order terms of the Taylor series approximate the original function accurately. When $\boldsymbol{x}^{\star}$ is a strong minimum, the second-order term must satisfy:

$$
\frac{1}{2}\Delta\boldsymbol{x}^T\nabla^2 F(x)|_{\boldsymbol{x}=\boldsymbol{x}^{\star}}\Delta\boldsymbol{x}>0
$$
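This can be checked numerically for equation (14): at a stationary point the first-order term vanishes, so the change in $F$ for a small $\Delta\boldsymbol{x}$ is dominated by the quadratic term of equation (21). A sketch, assuming NumPy (the perturbation scale and random seed are arbitrary choices):

```python
import numpy as np

def F(x):
    x1, x2 = x
    return (x2 - x1)**4 + 8 * x1 * x2 - x1 + x2 + 3

# One stationary point of equation (14): x2 = -x1 with 32*x1^3 - 8*x1 - 1 = 0.
x1 = max(np.roots([32, 0, -8, -1]).real)   # root near 0.55 (a local minimum)
xs = np.array([x1, -x1])

# Hessian of equation (14) evaluated at xs.
d2 = 12 * (xs[1] - xs[0])**2
H = np.array([[d2, -d2 + 8], [-d2 + 8, d2]])

rng = np.random.default_rng(0)
dx = 1e-3 * rng.standard_normal(2)          # small perturbation

lhs = F(xs + dx) - F(xs)                    # true change in F
rhs = 0.5 * dx @ H @ dx                     # quadratic term from equation (21)
print(lhs, rhs)
```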

Recalling linear algebra, a positive definite matrix $A$ satisfies:

$$
\boldsymbol{z}^T A \boldsymbol{z}>0 \text{ for any } \boldsymbol{z}\neq\boldsymbol{0}
$$

and a positive semidefinite matrix $A$ satisfies:

$$
\boldsymbol{z}^T A \boldsymbol{z}\geq0 \text{ for any } \boldsymbol{z}
$$

So when the gradient is $\boldsymbol{0}$ and the Hessian matrix is positive definite (positive semidefinite), the point is a strong (weak) minimum. A positive definite Hessian is a sufficient condition for a strong minimum, but not a necessary one: when the Hessian matrix is $\boldsymbol{0}$, the third-order term has to be examined.
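In practice the definiteness test is usually done through the Hessian's eigenvalues. A minimal sketch (the tolerance `tol` is an arbitrary choice to absorb round-off):

```python
import numpy as np

# A symmetric matrix is positive definite iff all its eigenvalues are
# positive, and positive semidefinite iff none are negative.
def classify(H, tol=1e-10):
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > tol):
        return "positive definite"      # strong minimum (with zero gradient)
    if np.all(eig >= -tol):
        return "positive semidefinite"  # weak minimum candidate
    return "indefinite"                 # saddle point

print(classify(np.array([[2.0, 0.0], [0.0, 3.0]])))   # positive definite
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]])))   # positive semidefinite
print(classify(np.array([[0.0, 1.0], [1.0, 0.0]])))   # indefinite
```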