Keywords: performance surfaces, optimum points


Part I can be found at https://brainbomb.org/Artificial-Intelligence/Neural-Networks/Neural-Network-Design-Performance-Surfaces-and-Optimum-Points/

Necessary Conditions for Optimality[1]

The main objective of performance learning is to minimize the performance index, so the conditions for the existence of a minimum of the performance index should be investigated. Three kinds of minimum points are distinguished:

  1. A strong minimum $\boldsymbol{x}$ is a point such that, for any direction $\boldsymbol{p}_{\text{unit}}$ and any sufficiently small distance $\delta>0$, $F(\boldsymbol{x})<F(\boldsymbol{x}+\delta \boldsymbol{p}_{\text{unit}})$
  2. A weak minimum $\boldsymbol{x}$ is a point that is not a strong minimum and such that, for any direction $\boldsymbol{p}_{\text{unit}}$ and any sufficiently small distance $\delta>0$, $F(\boldsymbol{x})\leq F(\boldsymbol{x}+\delta \boldsymbol{p}_{\text{unit}})$
  3. The global minimum is the one with the smallest value of $F$ among all the strong and weak minima.
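To see the distinction: $F(x)=x^2$ has a strong (and global) minimum at $x=0$, whereas a function that is flat at its bottom, such as $F(x)=\big(\max(|x|-1,\,0)\big)^2$, has only weak minima on $[-1,1]$, since moving within the flat region leaves $F$ unchanged.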

For instance,

$$
F(x)=3x^4-7x^2-\frac{1}{2}x+6\tag{13}
$$

  • two local minima, one near $x_1=-1.1$ and one near $x_2=1.1$ (both located numerically in the sketch below)
  • the minimum near $x_2=1.1$ is the global minimum
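These claims can be checked numerically. The following is a minimal sketch (not from the original text) using NumPy's `poly1d`: the stationary points are the real roots of $F'(x)=12x^3-14x-\frac{1}{2}=0$, and the sign of $F''(x)$ classifies each one:

```python
import numpy as np

# F(x) = 3x^4 - 7x^2 - 0.5x + 6, equation (13)
F = np.poly1d([3, 0, -7, -0.5, 6])
dF = F.deriv()    # F'(x)  = 12x^3 - 14x - 0.5
d2F = dF.deriv()  # F''(x) = 36x^2 - 14

# Stationary points are the real roots of F'(x) = 0.
for x in sorted(dF.roots[np.isreal(dF.roots)].real):
    kind = "local minimum" if d2F(x) > 0 else "local maximum"
    print(f"x = {x:+.3f}, F(x) = {F(x):.3f}  ({kind})")
```

Running this confirms two local minima near $x\approx-1.06$ and $x\approx1.10$ (plus a local maximum between them), with the minimum near $1.1$ attaining the smaller value of $F$.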

If the variable of the function is a vector, consider:

$$
F(\boldsymbol{x})=(x_2-x_1)^4+8x_1x_2-x_1+x_2+3\tag{14}
$$

and in 3-D space it looks like:

[Figure: 3-D surface plot of equation (14)]

The contour plot of this function is:

[Figure: contour plot of equation (14), with three stationary points marked]

In this contour figure, the two pink points are local minima, and the black point is a saddle point. All three are located numerically in the sketch below.
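Below is a minimal sketch, assuming a hand-derived gradient of equation (14) and starting guesses read off the contour plot; it solves $\nabla F(\boldsymbol{x})=\boldsymbol{0}$ with SciPy's `fsolve`:

```python
import numpy as np
from scipy.optimize import fsolve

# Gradient of F(x) = (x2 - x1)^4 + 8*x1*x2 - x1 + x2 + 3, equation (14),
# derived by hand.
def grad(x):
    d = x[1] - x[0]
    return np.array([-4 * d**3 + 8 * x[1] - 1,
                      4 * d**3 + 8 * x[0] + 1])

# Starting guesses (assumptions, chosen by eye from the contour plot).
for x0 in [(-1.0, 1.0), (1.0, -1.0), (-0.1, 0.1)]:
    x = fsolve(grad, x0)
    print(f"stationary point near {np.round(x, 3)}, |grad| = {np.linalg.norm(grad(x)):.2e}")
```

This converges to roughly $(-0.42, 0.42)$, $(0.55, -0.55)$, and $(-0.13, 0.13)$; which of these are minima and which is the saddle is settled by the second-order test later in this post.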

These two examples illustrate minimum points and saddle points. But what conditions must a point satisfy to be confirmed as a minimum?

First-Order Conditions

Go back to the Taylor series with $\Delta x=x-x^{\star}\neq 0$; taking only the first-order term into account:

$$
\begin{aligned}
F(x)&=F(x^{\star})+\frac{d}{dx}F(x)\Big|_{x=x^{\star}}(x-x^{\star})\\
F(x^{\star}+\Delta x)&=F(x^{\star})+\nabla F(x^{\star})\Delta x
\end{aligned}\tag{15}
$$

When $x^{\star}$ is a candidate minimum point, we want:

$$
F(x^{\star}+\Delta x)\geq F(x^{\star})\tag{16}
$$

so, in equation (15),

$$
\nabla F(x^{\star})\Delta x\geq 0 \tag{17}
$$

is needed. Considering the opposite direction, if $x^{\star}$ is a minimum point, we also have:

$$
F(x^{\star}-\Delta x)=F(x^{\star})-\nabla F(x^{\star})\Delta x\geq F(x^{\star})\tag{18}
$$

then

$$
\nabla F(x^{\star})\Delta x\leq 0 \tag{19}
$$

Both equation (17) and equation (19) can be satisfied if and only if

$$
\nabla F(x^{\star})\Delta x=0\tag{20}
$$

Because $\Delta x\neq 0$ is arbitrary, we must have $\nabla F(x^{\star})=0$ when $x^{\star}$ is a minimum point. However, $\nabla F(x^{\star})=0$ is a necessary but not sufficient condition, since it also holds at maxima and saddle points. This is the first-order condition.
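As a worked check (the algebra here is derived by hand, not taken from the original text), applying the first-order condition to equation (14) gives:

$$
\nabla F(\boldsymbol{x})=
\begin{bmatrix}
-4(x_2-x_1)^3+8x_2-1\\
4(x_2-x_1)^3+8x_1+1
\end{bmatrix}
=\boldsymbol{0}
$$

Adding the two components gives $8(x_1+x_2)=0$, so $x_2=-x_1$; substituting back leads to $32x_1^3-8x_1-1=0$, a cubic with three real roots. All three stationary points in the contour plot satisfy $\nabla F(\boldsymbol{x})=\boldsymbol{0}$, including the saddle point, which is exactly why the first-order condition alone cannot certify a minimum.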

Second-Order Condition

The first-order condition does not give a sufficient condition, so let's consider the second-order condition. With $\Delta\boldsymbol{x}=\boldsymbol{x}-\boldsymbol{x}^{\star}\neq \boldsymbol{0}$ and the gradient equal to $\boldsymbol{0}$, substitute both into equation (5) (in Part I). The second-order Taylor series is then:

$$
F(\boldsymbol{x}^{\star}+\Delta\boldsymbol{x})=F(\boldsymbol{x}^{\star})+\frac{1}{2}\Delta\boldsymbol{x}^T\nabla^2 F(\boldsymbol{x})\Big|_{\boldsymbol{x}=\boldsymbol{x}^{\star}}\Delta\boldsymbol{x}\tag{21}
$$

When $||\Delta\boldsymbol{x}||$ is small, the zeroth-, first-, and second-order terms of the Taylor series are precise enough to approximate the original function. For $\boldsymbol{x}^{\star}$ to be a strong minimum, the second-order term must satisfy:

$$
\frac{1}{2}\Delta\boldsymbol{x}^T\nabla^2 F(\boldsymbol{x})\Big|_{\boldsymbol{x}=\boldsymbol{x}^{\star}}\Delta\boldsymbol{x}>0 \quad \text{for all } \Delta\boldsymbol{x}\neq\boldsymbol{0}
$$

Recalling some linear algebra: a matrix $A$ is positive definite if

$$
\boldsymbol{z}^T A \boldsymbol{z}>0 \text{ for any } \boldsymbol{z}\neq\boldsymbol{0}
$$

and positive semidefinite if

$$
\boldsymbol{z}^T A \boldsymbol{z}\geq0 \text{ for any } \boldsymbol{z}
$$

So when the gradient is $\boldsymbol{0}$ and the Hessian matrix is positive definite, the point is a strong minimum; a positive semidefinite Hessian at a zero-gradient point is necessary for a minimum, but is not enough on its own to guarantee one. A positive definite Hessian is thus a sufficient condition for a strong minimum, but not a necessary one: a strong minimum can occur where the Hessian is only semidefinite (for example, zero), in which case the third-order and higher terms have to be examined.
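To make the test concrete: a symmetric matrix is positive definite exactly when all of its eigenvalues are positive, and positive semidefinite when they are all nonnegative. The sketch below (the Hessian is hand-derived from equation (14), and the stationary points are the approximate values found earlier, so treat the numbers as illustrative) classifies each point by the eigenvalues of its Hessian:

```python
import numpy as np

# Hessian of F(x) = (x2 - x1)^4 + 8*x1*x2 - x1 + x2 + 3, equation (14),
# derived by hand.
def hess(x):
    d2 = 12 * (x[1] - x[0])**2
    return np.array([[d2,      -d2 + 8],
                     [-d2 + 8,  d2    ]])

# Approximate stationary points of equation (14), found numerically above.
points = [(-0.42, 0.42), (0.55, -0.55), (-0.13, 0.13)]

for p in points:
    eig = np.linalg.eigvalsh(hess(np.array(p)))  # eigenvalues, ascending
    if eig[0] > 0:
        kind = "positive definite -> strong minimum"
    elif eig[0] < 0 < eig[-1]:
        kind = "indefinite -> saddle point"
    else:
        kind = "semidefinite -> higher-order terms needed"
    print(f"{p}: eigenvalues {np.round(eig, 2)} ({kind})")
```

The two minima yield strictly positive eigenvalues, while the saddle point's Hessian has one positive and one negative eigenvalue, so the quadratic form in equation (21) changes sign with the direction of $\Delta\boldsymbol{x}$.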

 

References


[1] Demuth, H.B., Beale, M.H., De Jess, O. and Hagan, M.T., 2014. Neural Network Design. Martin Hagan.
Last modified: March 24, 2020