**Keywords:** committees

## Analysis of Committees^{1}

The committee is a native inspiration for how to combine several models(or we can say how to combine the outputs of several models). For example, we can combine all the models by:

y_{COM}(X)=\frac{1}{M}\sum_{m=1}^My_m(X)\tag{1}

$$

However, we want to analyze whether this average prediction of models is good than a single one of them.

To compare the committee and a single model, we should first build a criterion depending on which we can distinguish which model is better. Assuming that the true generator of the training data $x$ is:

h(x)\tag{2}

$$

So our prediction of the $m$th model for $m=1,2\cdots,M$ can be represented as:

y_m(x) = h(x) +\epsilon_m(x)\tag{3}

$$

then the average sum-of-square of error can be a nice criterion. And the criterion of sigle model is:

\mathbb{E}_x[(y_m(x)-h(x))^2] = \mathbb{E}_x[\epsilon_m(x)^2] \tag{4}

$$

where the $\mathbb{E}[\cdot]$ is the frequentist expectation. To make the creterion more concrete, we consider the average of the error over $M$ models:

E_{AV} = \frac{1}{M}\sum_{m=1}^M\mathbb{E}_x[\epsilon_m(x)^2]\tag{5}

$$

And on the other hand, the committees have the error given by equation(1), (3) and (4):

\begin{aligned}

E_{COM}&=\mathbb{E}_x[(\frac{1}{M}\sum_{m=1}^My_m(x)-h(x))^2] \\

&=\mathbb{E}_x[\{\frac{1}{M}\sum_{m=1}^M\epsilon_m(x)\}^2]

\end{aligned} \tag{6}

$$

Now we assume that the random variables $\epsilon_i(x)$ for $i=1,2,\cdots,M$ have **mean 0** and **uncorrelated**, so that:

\begin{aligned}

\mathbb{E}_x[\epsilon_m(x)]&=0 &\\

\mathbb{E}_x[\epsilon_m(x)\epsilon_l(x)]&=0,&m\neq l

\end{aligned} \tag{7}

$$

Then subsitute equ (7) into equ (6), we can get:

E_{COM}=\frac{1}{M^2}\mathbb{E}_x[\epsilon_m(x)]\tag{8}

$$

According the equation (5) and (8):

E_{AV}=\frac{1}{M}E_{COM}\tag{9}

$$

**All the mathematics above is based on the assumption that the error of each model is uncorrelated**. However, most time they are highly correlated and the reduction of error is generally small. But the relation:

E_{COM}\leq E_{AV}\tag{10}

$$

exists for sure. Then the boosting method was established to make the combining model more powerful.