Keywords: quadratic functions

Quadratic Functions1

Quadratic function, a type of performance index, is universal. One of its key properties is that it can be represented in a second-order Taylor series precisely.

$$
F(\boldsymbol{x})=\frac{1}{2}\boldsymbol{x}^TA\boldsymbol{x}+\boldsymbol{d}\boldsymbol{x}+c\tag{1}
$$

where $A$ is a symmetric matrix(if it is not symmetric, it can be easily converted into symmetric).
And recall the property of gradient:

$$
\nabla (\boldsymbol{h}^T\boldsymbol{x})=\nabla (\boldsymbol{x}^T\boldsymbol{h})=\boldsymbol{h}\tag{2}
$$

and

$$
\nabla (\boldsymbol{x}^TQ\boldsymbol{x})=Q\boldsymbol{x}+Q^T\boldsymbol{x}=2Q\boldsymbol{x}\tag{3}
$$

then the first-order of quadratic functions is:

$$
\nabla F(\boldsymbol{x})=A\boldsymbol{x}+\boldsymbol{d}\tag{4}
$$

and second-order of quadratic functions is:

$$
\nabla^2 F(\boldsymbol{x})=A\tag{5}
$$

More than second-order derivatives are $0$.

The shape of quadratic functions can be described by eigenvalues and eigenvectors of its Hessian matrix. Hessian matrix is a symmetric matrix and its eigenvectors are mutually orthogonal, let:

$$
\boldsymbol{z}_1,\boldsymbol{z}_2,\cdots ,\boldsymbol{z}_n\tag{6}
$$

denote the whole set of the eigenvectors of the Hessian matrix $A$ and we build a matrix:

$$
B=\begin{bmatrix}
\boldsymbol{z}_1&\boldsymbol{z}_2&\cdots &\boldsymbol{z}_n
\end{bmatrix}\tag{7}
$$

and then we have $BB^T=I$ then $B^{-1}=B^T$. When Hessian matrix is positive definite, it would have $n$ eigenvectors(Hessian has a full rank $n$) and we can change Hessian matrix into a new matrix whose basises are the set of eigenvectors:

$$
A’=B^TAB=\begin{bmatrix}
\lambda_1&&&\\
&\lambda_2&&\\
&&\ddots&\\
&&&\lambda_n
\end{bmatrix}=\Lambda\tag{8}
$$

after changing the basis, the new Hessian matrix is a diagonal matrix whose elements on the diagonal are the eigenvalues. This process can be inversed because of $BB^T=I$:

$$
A=BA’B^T=B\Lambda B^T\tag{9}
$$

We can calculate the derivative towards any directions:

$$
\frac{\boldsymbol{p}^T\nabla^2 F(\boldsymbol{x})\boldsymbol{p}}{||\boldsymbol{p}||^2}=\frac{\boldsymbol{p}^TA\boldsymbol{p}}{||\boldsymbol{p}||^2}\tag{10}
$$

For columns of $B$ can span the whole space, So we can find a vector $\boldsymbol{c}$ satisfy:

$$
\boldsymbol{p}=B\boldsymbol{c}\tag{11}
$$

then take equation(8), equation(11) into equation(10) we get:

$$
\frac{\boldsymbol{c}^TB^TAB\boldsymbol{c}}{\boldsymbol{c}^TB^TB\boldsymbol{c}}=\frac{\boldsymbol{c}^T\Lambda\boldsymbol{c}}{\boldsymbol{c}^TI\boldsymbol{c}}=\frac{\sum^n_{i=1}\lambda_i c_i^2}{\sum_{i=1}^n c^2_i}\tag{12}
$$

From equation(12), we could conclude:

$$
\lambda_{\text{min}}\leq \frac{\boldsymbol{p}^TA\boldsymbol{p}}{||\boldsymbol{p}||^2}\leq \lambda_{\text{max}}\tag{13}
$$
  1. maximum $2^{\text{nd}}$ derivative occure along with $\boldsymbol{z}_{\text{max}}$(according to $\lambda_{\text{max}}$)
  2. eigenvalues is the $2^{\text{nd}}$ derivative along its eigenvector.
  3. eigenvectors could define a new coordinate system
  4. eigenvectors are principal axes of the function contour.
  5. going along with the $\boldsymbol{z}_{\text{max}}$ direction could have the largest change in function value $|\Delta F(x)|$
  6. eigenvalues here are all positive because of positive definite.

An example:

$$
F(\boldsymbol{x})=x_1^2+x_1x_2+x_2^2
=\frac{1}{2}\boldsymbol{x}^T
\begin{bmatrix}
2&1\\1&2
\end{bmatrix}\boldsymbol{x}\tag{14}
$$

we can calculate the eigenvectors and eigenvalues:

$$
\begin{aligned}
&\lambda_1=1&&\boldsymbol{z}_1=\begin{bmatrix}
1\\-1
\end{bmatrix}\\
&\lambda_2=3&&\boldsymbol{z}_2=\begin{bmatrix}
1\\1
\end{bmatrix}
\end{aligned}\tag{15}
$$

The contour plot and 3-D plots is:


Another example:

$$
F(\boldsymbol{x})=-\frac{1}{4}x_1^2-\frac{3}{2}x_1x_2-\frac{1}{4}x_2^2
=\frac{1}{2}\boldsymbol{x}^T
\begin{bmatrix}
-0.5&-1.5\\-1.5&-0.5
\end{bmatrix}\boldsymbol{x}\tag{16}
$$

we can calculate the eigenvectors and eigenvalues:

$$
\begin{aligned}
&\lambda_1=1&&\boldsymbol{z}_1=\begin{bmatrix}
-1\\1
\end{bmatrix}\\
&\lambda_2=-2&&\boldsymbol{z}_2=\begin{bmatrix}
-1\\-1
\end{bmatrix}
\end{aligned}\tag{17}
$$

The contour plot and 3-D plots is:

Conclusion

  1. $\lambda_i>0$ or $i=1,2,\cdots$, $F(x)$ have a single strong minimum
  2. $\lambda_i<0$ or $i=1,2,\cdots$, $F(x)$ have a single strong maximum
  3. $\lambda_i$ have both negative and positive together. $F(x)$ has a saddle point
  4. $\lambda_i\geq 0$ and have a $\lambda_j=0$, $F(x)$ has a weak minimum or has no stationary point.
  5. $\lambda_i\leq 0$ and have a $\lambda_j=0$, $F(x)$ has a weak maximum or has no stationary point.

References


1 Demuth, H.B., Beale, M.H., De Jess, O. and Hagan, M.T., 2014. Neural network design. Martin Hagan.
Last modified: March 24, 2020