Quadratic function, a type of performance index, is universal. One of its key properties is that it can be represented in a second-order Taylor series precisely.

$$F(\boldsymbol{x})=\frac{1}{2}\boldsymbol{x}^TA\boldsymbol{x}+\boldsymbol{d}\boldsymbol{x}+c\tag{1}$$

where $A$ is a symmetric matrix(if it is not symmetric, it can be easily converted into symmetric).
And recall the property of gradient:

$$\nabla (\boldsymbol{h}^T\boldsymbol{x})=\nabla (\boldsymbol{x}^T\boldsymbol{h})=\boldsymbol{h}\tag{2}$$

and

$$\nabla (\boldsymbol{x}^TQ\boldsymbol{x})=Q\boldsymbol{x}+Q^T\boldsymbol{x}=2Q\boldsymbol{x}\tag{3}$$

then the first-order of quadratic functions is:

$$\nabla F(\boldsymbol{x})=A\boldsymbol{x}+\boldsymbol{d}\tag{4}$$

and second-order of quadratic functions is:

$$\nabla^2 F(\boldsymbol{x})=A\tag{5}$$

More than second-order derivatives are $0$.

The shape of quadratic functions can be described by eigenvalues and eigenvectors of its Hessian matrix. Hessian matrix is a symmetric matrix and its eigenvectors are mutually orthogonal, let:

$$\boldsymbol{z}_1,\boldsymbol{z}_2,\cdots ,\boldsymbol{z}_n\tag{6}$$

denote the whole set of the eigenvectors of the Hessian matrix $A$ and we build a matrix:

$$B=\begin{bmatrix} \boldsymbol{z}_1&\boldsymbol{z}_2&\cdots &\boldsymbol{z}_n \end{bmatrix}\tag{7}$$

and then we have $BB^T=I$ then $B^{-1}=B^T$. When Hessian matrix is positive definite, it would have $n$ eigenvectors(Hessian has a full rank $n$) and we can change Hessian matrix into a new matrix whose basises are the set of eigenvectors:

$$A’=B^TAB=\begin{bmatrix} \lambda_1&&&\\ &\lambda_2&&\\ &&\ddots&\\ &&&\lambda_n \end{bmatrix}=\Lambda\tag{8}$$

after changing the basis, the new Hessian matrix is a diagonal matrix whose elements on the diagonal are the eigenvalues. This process can be inversed because of $BB^T=I$:

$$A=BA’B^T=B\Lambda B^T\tag{9}$$

We can calculate the derivative towards any directions:

$$\frac{\boldsymbol{p}^T\nabla^2 F(\boldsymbol{x})\boldsymbol{p}}{||\boldsymbol{p}||^2}=\frac{\boldsymbol{p}^TA\boldsymbol{p}}{||\boldsymbol{p}||^2}\tag{10}$$

For columns of $B$ can span the whole space, So we can find a vector $\boldsymbol{c}$ satisfy:

$$\boldsymbol{p}=B\boldsymbol{c}\tag{11}$$

then take equation(8), equation(11) into equation(10) we get:

$$\frac{\boldsymbol{c}^TB^TAB\boldsymbol{c}}{\boldsymbol{c}^TB^TB\boldsymbol{c}}=\frac{\boldsymbol{c}^T\Lambda\boldsymbol{c}}{\boldsymbol{c}^TI\boldsymbol{c}}=\frac{\sum^n_{i=1}\lambda_i c_i^2}{\sum_{i=1}^n c^2_i}\tag{12}$$

From equation(12), we could conclude:

$$\lambda_{\text{min}}\leq \frac{\boldsymbol{p}^TA\boldsymbol{p}}{||\boldsymbol{p}||^2}\leq \lambda_{\text{max}}\tag{13}$$
1. maximum $2^{\text{nd}}$ derivative occure along with $\boldsymbol{z}_{\text{max}}$(according to $\lambda_{\text{max}}$)
2. eigenvalues is the $2^{\text{nd}}$ derivative along its eigenvector.
3. eigenvectors could define a new coordinate system
4. eigenvectors are principal axes of the function contour.
5. going along with the $\boldsymbol{z}_{\text{max}}$ direction could have the largest change in function value $|\Delta F(x)|$
6. eigenvalues here are all positive because of positive definite.

An example:

$$F(\boldsymbol{x})=x_1^2+x_1x_2+x_2^2 =\frac{1}{2}\boldsymbol{x}^T \begin{bmatrix} 2&1\\1&2 \end{bmatrix}\boldsymbol{x}\tag{14}$$

we can calculate the eigenvectors and eigenvalues:

\begin{aligned} &\lambda_1=1&&\boldsymbol{z}_1=\begin{bmatrix} 1\\-1 \end{bmatrix}\\ &\lambda_2=3&&\boldsymbol{z}_2=\begin{bmatrix} 1\\1 \end{bmatrix} \end{aligned}\tag{15}

The contour plot and 3-D plots is:  Another example:

$$F(\boldsymbol{x})=-\frac{1}{4}x_1^2-\frac{3}{2}x_1x_2-\frac{1}{4}x_2^2 =\frac{1}{2}\boldsymbol{x}^T \begin{bmatrix} -0.5&-1.5\\-1.5&-0.5 \end{bmatrix}\boldsymbol{x}\tag{16}$$

we can calculate the eigenvectors and eigenvalues:

\begin{aligned} &\lambda_1=1&&\boldsymbol{z}_1=\begin{bmatrix} -1\\1 \end{bmatrix}\\ &\lambda_2=-2&&\boldsymbol{z}_2=\begin{bmatrix} -1\\-1 \end{bmatrix} \end{aligned}\tag{17}

The contour plot and 3-D plots is:  ## Conclusion

1. $\lambda_i>0$ or $i=1,2,\cdots$, $F(x)$ have a single strong minimum
2. $\lambda_i<0$ or $i=1,2,\cdots$, $F(x)$ have a single strong maximum
3. $\lambda_i$ have both negative and positive together. $F(x)$ has a saddle point
4. $\lambda_i\geq 0$ and have a $\lambda_j=0$, $F(x)$ has a weak minimum or has no stationary point.
5. $\lambda_i\leq 0$ and have a $\lambda_j=0$, $F(x)$ has a weak maximum or has no stationary point.

## References

1 Demuth, H.B., Beale, M.H., De Jess, O. and Hagan, M.T., 2014. Neural network design. Martin Hagan.