Keywords: Bayesian model averaging, BMA and combining models

Bayesian Model Averaging (BMA) [1]

Bayesian model averaging (BMA) is another widely used method that closely resembles a combining model. However, the difference between BMA and combining models is significant.

Bayesian model averaging applies Bayes' theorem with the model (hypothesis) itself as the random variable: the hypotheses are $h=1,2,\cdots,H$ with prior probability $p(h)$, and the marginal distribution over the data $X$ is:

$$
p(X)=\sum_{h=1}^{H}p(X|h)p(h)
$$

BMA is used to select the model (hypothesis) that best explains the data, using Bayes' theorem. As the size of $X$ grows, the posterior probability

$$
p(h|X)=\frac{p(X|h)p(h)}{\sum_{i=1}^{H}p(X|i)p(i)}
$$

becomes sharper, concentrating on a single hypothesis, which we then accept as a good one.
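The sharpening of $p(h|X)$ can be checked numerically. The following is a minimal sketch under assumptions that are not in the original post: the hypotheses are three hypothetical coin biases, the prior is uniform, and the data are a fixed, made-up sequence of flips with 70% heads. The function names are illustrative only.

```python
import math

def log_likelihood(data, theta):
    """log p(X | h) for i.i.d. Bernoulli observations with bias theta."""
    heads = sum(data)
    tails = len(data) - heads
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def posterior(data, thetas, prior):
    """p(h | X) for every hypothesis, computed in log space for stability."""
    log_joint = [log_likelihood(data, t) + math.log(p)
                 for t, p in zip(thetas, prior)]
    m = max(log_joint)
    weights = [math.exp(v - m) for v in log_joint]
    total = sum(weights)  # proportional to the marginal p(X)
    return [w / total for w in weights]

# Assumed setup: three candidate hypotheses and a uniform prior p(h).
thetas = [0.3, 0.5, 0.7]
prior = [1.0 / 3.0] * 3

# Made-up data: every block of 10 flips contains exactly 7 heads.
pattern = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]
data = pattern * 10

post_10 = posterior(data[:10], thetas, prior)
post_100 = posterior(data[:100], thetas, prior)
print("N = 10 :", post_10)
print("N = 100:", post_100)
```

With 10 observations the posterior mass on the bias 0.7 is about 0.68; with 100 observations it exceeds 0.99, so the posterior has concentrated on one hypothesis.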

Mixture of Gaussians (Combining Models)

In the post ‘Mixtures of Gaussians’, we saw how a mixture of Gaussians works. The joint distribution of the input data $\boldsymbol{x}$ and the latent variable $\boldsymbol{z}$ is:

$$
p(\boldsymbol{x},\boldsymbol{z})
$$

and the marginal distribution of $\boldsymbol{x}$ is

$$
p(\boldsymbol{x})=\sum_{\boldsymbol{z}}p(\boldsymbol{x},\boldsymbol{z})
$$

For the mixture of Gaussians:

$$
p(\boldsymbol{x})=\sum_{k=1}^{K}\pi_k\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_k,\Sigma_k)
$$

the latent variable $\boldsymbol{z}$ is defined by:

$$
p(z_k=1) = \pi_k
$$

for $k=1,2,\cdots,K$, where $z_k\in\{0,1\}$ and $\boldsymbol{z}$ is a $1$-of-$K$ representation.
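To see that summing the joint distribution over the $1$-of-$K$ vectors $\boldsymbol{z}$ recovers the mixture density, here is a small sketch. It assumes a one-dimensional case (a standard deviation $\sigma_k$ in place of the covariance $\Sigma_k$), uses the standard factorization $p(\boldsymbol{x},\boldsymbol{z})=p(\boldsymbol{z})p(\boldsymbol{x}|\boldsymbol{z})$, and picks arbitrary illustrative parameter values.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(x | mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def one_hot_vectors(K):
    """All K possible values of a 1-of-K latent vector z."""
    return [[1 if j == k else 0 for j in range(K)] for k in range(K)]

def joint(x, z, pis, mus, sigmas):
    """p(x, z) = prod_k (pi_k * N(x | mu_k, sigma_k^2)) ** z_k."""
    p = 1.0
    for z_k, pi_k, mu_k, sigma_k in zip(z, pis, mus, sigmas):
        if z_k == 1:
            p *= pi_k * normal_pdf(x, mu_k, sigma_k)
    return p

def marginal(x, pis, mus, sigmas):
    """p(x) obtained by summing the joint over every 1-of-K vector z."""
    return sum(joint(x, z, pis, mus, sigmas) for z in one_hot_vectors(len(pis)))

# Illustrative (assumed) parameters for K = 2 components.
pis = [0.3, 0.7]
mus = [-2.0, 1.0]
sigmas = [0.5, 1.0]

x = 0.0
via_latent = marginal(x, pis, mus, sigmas)
direct = sum(pi * normal_pdf(x, mu, s) for pi, mu, s in zip(pis, mus, sigmas))
print(via_latent, direct)
```

Both numbers agree, which is the identity $\sum_{\boldsymbol{z}}p(\boldsymbol{x},\boldsymbol{z})=\sum_{k}\pi_k\mathcal{N}(\boldsymbol{x}|\boldsymbol{\mu}_k,\Sigma_k)$ in the one-dimensional case.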

So the mixture of Gaussians is a kind of combining model. For each data point, only one component $k$ is selected (because $\boldsymbol{z}$ is a $1$-of-$K$ representation). Plotted as a whole, the mixture density is a single curve, and the latent variable $\boldsymbol{z}$ separates it into several Gaussian distributions.

This is the simplest combining model, in which each expert is a Gaussian. When the experts are combined, only the one selected by $\boldsymbol{z}$ makes the final decision for a given data point.
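The "one expert per data point" behaviour can be sketched with ancestral sampling: first draw the $1$-of-$K$ vector $\boldsymbol{z}$ from the mixing coefficients, then draw $\boldsymbol{x}$ from the selected Gaussian only. This is an illustrative one-dimensional sketch; the parameter values, the seed, and the function name `sample_mixture` are assumptions, not part of the original post.

```python
import random

def sample_mixture(n, pis, mus, sigmas, rng):
    """Draw n pairs (x, z): z is 1-of-K, x comes from the component z selects."""
    K = len(pis)
    samples = []
    for _ in range(n):
        # Pick exactly one component k with probability pi_k.
        k = rng.choices(range(K), weights=pis)[0]
        z = [1 if j == k else 0 for j in range(K)]
        # Only the selected Gaussian generates this data point.
        x = rng.gauss(mus[k], sigmas[k])
        samples.append((x, z))
    return samples

rng = random.Random(0)  # fixed seed so the run is repeatable
pis = [0.3, 0.7]        # assumed mixing coefficients
mus = [-2.0, 1.0]       # assumed component means
sigmas = [0.5, 1.0]     # assumed component standard deviations

samples = sample_mixture(1000, pis, mus, sigmas, rng)
counts = [sum(z[k] for _, z in samples) for k in range(len(pis))]
print("points generated by each component:", counts)
```

Each sample carries its own $\boldsymbol{z}$, so different data points can be generated by different components, and the component counts should roughly follow the mixing coefficients.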

Distinction

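A combining-model method contains several models and predicts by voting or another combination rule, so different data points may be explained by different models. Bayesian model averaging, in contrast, assumes that a single hypothesis generated the whole data set and uses the posterior to pick that hypothesis from several candidates.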
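The contrast can be made concrete on a toy data set with two clusters. The sketch below is only an illustration under assumed values: the data points, the two unit-variance Gaussian hypotheses, the uniform prior, and the equal mixing coefficients are all made up. It applies BMA over the two Gaussians treated as competing hypotheses, then evaluates the same two Gaussians as components of one mixture.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of a 1-D Gaussian N(x | mu, sigma^2)."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * ((x - mu) / sigma) ** 2

# Made-up data: three points near -2 and four points near +2.
data = [-2.2, -2.0, -1.8, 1.8, 2.0, 2.2, 2.1]
mus = [-2.0, 2.0]  # assumed means of the two candidate Gaussians
sigma = 1.0        # assumed shared standard deviation

# BMA view: one hypothesis must explain the WHOLE data set.
log_liks = [sum(normal_logpdf(x, mu, sigma) for x in data) for mu in mus]
m = max(log_liks)
weights = [math.exp(v - m) for v in log_liks]  # the uniform prior cancels
bma_posterior = [w / sum(weights) for w in weights]

# Combining view: each point may come from a different component.
pis = [0.5, 0.5]   # assumed mixing coefficients
mixture_log_lik = sum(
    math.log(sum(pi * math.exp(normal_logpdf(x, mu, sigma))
                 for pi, mu in zip(pis, mus)))
    for x in data
)

print("BMA posterior over hypotheses:", bma_posterior)
print("best single-hypothesis log-likelihood:", max(log_liks))
print("mixture log-likelihood:", mixture_log_lik)
```

The BMA posterior puts nearly all its mass on the hypothesis centred at $+2$ even though almost half the points lie near $-2$, while the mixture reaches a much higher log-likelihood because it lets each point choose its own component.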

References


[1] Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.
Last modified: March 24, 2020