Keywords: mixtures of Gaussians

## A Formal Introduction to Mixtures of Gaussians<sup>1</sup>

We introduced mixture distributions in the post ‘An Introduction to Mixture Models’, where the example was a Gaussian mixture with just two components. In this post we treat Gaussian mixtures formally, which also serves to motivate the expectation-maximization (EM) algorithm.

A Gaussian mixture distribution can be written as:

$$p(\boldsymbol{x})= \sum_{k=1}^{K}\pi_k\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_k,\Sigma_k)\tag{1}$$

where the mixing coefficients satisfy $\sum_{k=1}^K \pi_k =1$ and $0\leq \pi_k\leq 1$.
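
As a minimal sketch of equation (1), the snippet below evaluates a mixture density in the univariate case. The restriction to one dimension, the helper names `normal_pdf` and `mixture_pdf`, and the parameter values are assumptions made only for illustration; the text itself works with vectors $\boldsymbol{x}$ and covariance matrices $\Sigma_k$.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(x | mu, sigma^2)."""
    u = (x - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, pis, mus, sigmas):
    """Equation (1): weighted sum of the K component densities."""
    return sum(p * normal_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas))

# Hypothetical two-component mixture (values chosen only for illustration).
pis = [0.3, 0.7]
mus = [-2.0, 1.0]
sigmas = [0.5, 1.5]

# The mixing coefficients must lie in [0, 1] and sum to 1.
assert all(0.0 <= p <= 1.0 for p in pis)
assert abs(sum(pis) - 1.0) < 1e-12

density_at_zero = mixture_pdf(0.0, pis, mus, sigmas)
```

Because each component integrates to one and the weights sum to one, the mixture density integrates to one as well.
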
We then introduce a latent random vector $\boldsymbol{z}$ with $K$ components, each of which satisfies:

$$z_k\in\{0,1\}\tag{2}$$

and $\boldsymbol{z}$ uses a $1$-of-$K$ representation, which means exactly one component is $1$ and all the others are $0$. To build the joint distribution $p(\boldsymbol{x},\boldsymbol{z})$, we first need $p(\boldsymbol{z})$ and $p(\boldsymbol{x}|\boldsymbol{z})$. For the distribution of $\boldsymbol{z}$, the choice

$$p(z_k=1)=\pi_k\tag{3}$$

is a natural design, because the $\{\pi_k\}$ for $k=1,\cdots,K$ are non-negative and sum to $1$, so they form a valid probability distribution. For the entire vector $\boldsymbol{z}$, equation (3) can be written as:

$$p(\boldsymbol{z}) = \prod_{k=1}^K \pi_k^{z_k}\tag{4}$$
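
To see why the product form in equation (4) agrees with equation (3), the following sketch evaluates it on every $1$-of-$K$ vector. The helpers `one_hot` and `prior_z` and the coefficient values are hypothetical names and numbers introduced here for illustration.

```python
def one_hot(k, K):
    """1-of-K vector with a 1 in position k (0-based) and 0 elsewhere."""
    return [1 if i == k else 0 for i in range(K)]

def prior_z(z, pis):
    """Equation (4): p(z) = prod_k pi_k ** z_k for a 1-of-K vector z."""
    if sorted(z) != [0] * (len(z) - 1) + [1]:
        raise ValueError("z must be a 1-of-K vector")
    prob = 1.0
    for z_k, pi_k in zip(z, pis):
        # Factors with z_k = 0 contribute pi_k ** 0 = 1 and drop out.
        prob *= pi_k ** z_k
    return prob

pis = [0.2, 0.5, 0.3]  # hypothetical mixing coefficients
probs = [prior_z(one_hot(k, len(pis)), pis) for k in range(len(pis))]
```

Only the factor with $z_k=1$ survives, so `probs` reproduces the mixing coefficients one by one.
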

Given this definition of $p(\boldsymbol{z})$, we can specify the conditional distribution of $\boldsymbol{x}$ given $\boldsymbol{z}$. Under the condition $z_k=1$, we have:

$$p(\boldsymbol{x}|z_k=1)=\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_k,\Sigma_k)\tag{5}$$

from which we obtain the vector form of the conditional distribution:

$$p(\boldsymbol{x}|\boldsymbol{z})=\prod_{k=1}^{K}\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_k,\Sigma_k)^{z_k}\tag{6}$$

Once we have both the distribution of $\boldsymbol{z}$, $p(\boldsymbol{z})$, and the conditional distribution of $\boldsymbol{x}$ given $\boldsymbol{z}$, $p(\boldsymbol{x}|\boldsymbol{z})$, we can build the joint distribution with the product rule:

$$p(\boldsymbol{x},\boldsymbol{z}) = p(\boldsymbol{z})\cdot p(\boldsymbol{x}|\boldsymbol{z})\tag{7}$$
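
The joint distribution in equation (7) can be sketched directly from the product forms (4) and (6). This is an illustrative univariate version; the function names and parameter values are assumptions, not part of the original derivation.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(x | mu, sigma^2)."""
    u = (x - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2.0 * math.pi))

def prior_z(z, pis):
    """Equation (4): prod_k pi_k ** z_k."""
    value = 1.0
    for z_k, pi_k in zip(z, pis):
        value *= pi_k ** z_k
    return value

def conditional_x_given_z(x, z, mus, sigmas):
    """Equation (6): prod_k N(x | mu_k, sigma_k^2) ** z_k."""
    value = 1.0
    for z_k, mu_k, sigma_k in zip(z, mus, sigmas):
        value *= normal_pdf(x, mu_k, sigma_k) ** z_k
    return value

def joint(x, z, pis, mus, sigmas):
    """Equation (7): p(x, z) = p(z) * p(x | z)."""
    return prior_z(z, pis) * conditional_x_given_z(x, z, mus, sigmas)

# Hypothetical parameters; z selects the second component.
pis, mus, sigmas = [0.3, 0.7], [-2.0, 1.0], [0.5, 1.5]
z = [0, 1]
p_joint = joint(0.5, z, pis, mus, sigmas)
```

With `z = [0, 1]` the joint collapses to $\pi_2\,\mathcal{N}(x|\mu_2,\sigma_2^2)$, since every factor with $z_k=0$ equals one.
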

However, what we ultimately care about is still the distribution of $\boldsymbol{x}$. We can obtain $p(\boldsymbol{x})$ by marginalizing the joint distribution over $\boldsymbol{z}$:

$$p(\boldsymbol{x}) = \sum_{j=1}^{K}p(\boldsymbol{x},\boldsymbol{z}_j) = \sum_{j=1}^{K}p(\boldsymbol{z}_j)\cdot p(\boldsymbol{x}|\boldsymbol{z}_j)\tag{8}$$

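where $\boldsymbol{z}_j$ ranges over the $K$ possible values of the random vector $\boldsymbol{z}$, with $\boldsymbol{z}_j$ the value whose $j$-th component is $1$. For that value, equations (4) and (6) reduce to $\pi_j$ and $\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_j,\Sigma_j)$, so equation (8) recovers equation (1).

This is how latent variables construct a Gaussian mixture, and the joint form makes the distribution of a mixture model easier to analyze.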
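
The sketch below checks numerically that summing the joint over all $1$-of-$K$ values of $\boldsymbol{z}$, as in equation (8), gives the same density as equation (1). It also draws samples in two stages (first $\boldsymbol{z}$, then $\boldsymbol{x}$ given $\boldsymbol{z}$), which is one way to read the latent-variable construction. The univariate setting, the function names, the seed, and the parameter values are all assumptions for illustration.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(x | mu, sigma^2)."""
    u = (x - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, pis, mus, sigmas):
    """Equation (1): weighted sum of component densities."""
    return sum(p * normal_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas))

def marginal_via_latent(x, pis, mus, sigmas):
    """Equation (8): sum of p(z_j) * p(x | z_j) over the K one-hot values."""
    K = len(pis)
    total = 0.0
    for j in range(K):
        z = [1 if k == j else 0 for k in range(K)]
        p_z = math.prod(pi_k ** z_k for pi_k, z_k in zip(pis, z))
        p_x_given_z = math.prod(
            normal_pdf(x, mu_k, sigma_k) ** z_k
            for mu_k, sigma_k, z_k in zip(mus, sigmas, z)
        )
        total += p_z * p_x_given_z
    return total

def sample(pis, mus, sigmas, rng):
    """Draw the active component from p(z), then x from p(x | z)."""
    k = rng.choices(range(len(pis)), weights=pis)[0]
    return k, rng.gauss(mus[k], sigmas[k])

# Hypothetical parameters and a fixed seed so the run is repeatable.
pis, mus, sigmas = [0.3, 0.7], [-2.0, 1.0], [0.5, 1.5]
rng = random.Random(0)
draws = [sample(pis, mus, sigmas, rng) for _ in range(20000)]
```

Discarding the component index from each draw leaves samples of $x$ alone, which is the sense in which $\boldsymbol{z}$ is latent.
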

## ‘Responsibility’ of Gaussian Mixtures

Bayes' formula gives us the posterior. Using the joint distribution in equation (7), the posterior probability of the latent variable $\boldsymbol{z}$ is:

$$p(z_k=1|\boldsymbol{x})=\frac{p(z_k=1)p(\boldsymbol{x}|z_k=1)}{\sum_{j=1}^K p(z_j=1)p(\boldsymbol{x}|z_j=1)}\tag{9}$$

Substituting equations (3) and (5) into equation (9), we get:

$$p(z_k=1|\boldsymbol{x})=\frac{\pi_k\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_k,\Sigma_k)}{\sum_{j=1}^K \pi_j\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_j,\Sigma_j)}\tag{10}$$

$p(z_k=1|\boldsymbol{x})$ is also called the responsibility that component $k$ takes for explaining the observation $\boldsymbol{x}$, and is denoted as:

$$\gamma(z_k)=p(z_k=1|\boldsymbol{x})\tag{11}$$
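
Equation (10) translates almost line by line into code. The following is a minimal univariate sketch; the name `responsibilities` and the parameter values are assumptions for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(x | mu, sigma^2)."""
    u = (x - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2.0 * math.pi))

def responsibilities(x, pis, mus, sigmas):
    """Equation (10): gamma(z_k) = pi_k N_k(x) / sum_j pi_j N_j(x)."""
    weighted = [p * normal_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas)]
    total = sum(weighted)  # the denominator is the mixture density p(x)
    return [w / total for w in weighted]

# Hypothetical parameters; x = -2.0 sits at the mean of the first component.
pis, mus, sigmas = [0.3, 0.7], [-2.0, 1.0], [0.5, 1.5]
gammas = responsibilities(-2.0, pis, mus, sigmas)
```

The responsibilities are non-negative and sum to one for any $x$. For points far from every component the denominator can underflow to zero in floating point, so a log-domain computation is usually the more robust choice in practice.
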

## References

1. Bishop, Christopher M. *Pattern Recognition and Machine Learning*. Springer, 2006.