In this article we will discuss the conditional variance and prediction using conditional expectation for different kinds of random variables, with some examples.


**Conditional Variance**

The conditional variance of a random variable X given Y is defined in a similar way as the conditional expectation of X given Y, as


[latex]Var(X|Y)=E[(X-E[X|Y])^{2}|Y][/latex]

that is, the conditional variance is the conditional expectation, given the value of Y, of the squared difference between the random variable X and its conditional expectation E[X|Y].

The relation between the conditional variance and conditional expectation is


[latex]\operatorname{Var}(X \mid Y)=E\left[X^{2} \mid Y\right]-(E[X \mid Y])^{2}

\\\begin{aligned} E[\operatorname{Var}(X \mid Y)] &=E\left[E\left[X^{2} \mid Y\right]\right]-E\left[(E[X \mid Y])^{2}\right] \\ &=E\left[X^{2}\right]-E\left[(E[X \mid Y])^{2}\right] \end{aligned}

\\\text{since } E[E[X \mid Y]]=E[X], \text{ we have}

\\\operatorname{Var}(E[X \mid Y])=E\left[(E[X \mid Y])^{2}\right]-(E[X])^{2}[/latex]

this is analogous to the relation between the unconditional variance and expectation, which was


[latex]\operatorname{Var}(X)=E\left[X^{2}\right]-(E[X])^{2}[/latex]

and we can find the unconditional variance with the help of the conditional variance as


[latex]\operatorname{Var}(X)=E[\operatorname{Var}(X \mid Y)]+\operatorname{Var}(E[X \mid Y])[/latex]
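This decomposition is easy to verify numerically. The following sketch (a minimal Python simulation using an illustrative two-component mixture model, which is an assumption of this example and not taken from the text) estimates Var(X) directly and compares it with the exact value of E[Var(X|Y)] + Var(E[X|Y]):

```python
import random

random.seed(1)
p = 0.3                      # P(Y = 1); Y ~ Bernoulli(p), an illustrative choice
mu = {0: 0.0, 1: 5.0}        # conditional means E[X|Y=y]
sd = {0: 1.0, 1: 2.0}        # conditional standard deviations

n = 200_000
xs = []
for _ in range(n):
    y = 1 if random.random() < p else 0
    xs.append(random.gauss(mu[y], sd[y]))

mean_x = sum(xs) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n   # direct estimate of Var(X)

# Exact values of the two terms in the decomposition:
e_var = (1 - p) * sd[0] ** 2 + p * sd[1] ** 2    # E[Var(X|Y)]
e_x = (1 - p) * mu[0] + p * mu[1]                # E[E[X|Y]] = E[X]
var_e = (1 - p) * (mu[0] - e_x) ** 2 + p * (mu[1] - e_x) ** 2  # Var(E[X|Y])

print(var_x)          # simulated Var(X)
print(e_var + var_e)  # exact E[Var(X|Y)] + Var(E[X|Y]) = 7.15
```

The simulated variance agrees closely with the sum of the two exact terms.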

**Example of conditional variance**

Find the mean and variance of the number of travelers who enter the bus, if the people arriving at the bus depot form a Poisson process with mean λt in time t, and the bus arrives at the depot at a time uniformly distributed over the interval (0, T), independently of when the people arrive.

Solution:

To find the mean and variance, let Y be the random variable for the time the bus arrives and N(t) the number of arrivals up to time t. Then for any time t,


[latex]E[N(Y) \mid Y=t]=E[N(t) \mid Y=t]

\\=E[N(t)][/latex]

by the independence of Y and N(t)


[latex]=\lambda t[/latex]

since N(t) is Poisson with mean λt

Hence


[latex]E[N(Y) \mid Y]=\lambda Y[/latex]

so taking expectations gives


[latex]E[N(Y)]=\lambda E[Y]=\frac{\lambda T}{2}[/latex]

To obtain Var(N(Y)), we use the conditional variance formula

[latex]\operatorname{Var}(N(Y) \mid Y=t)=\operatorname{Var}(N(t) \mid Y=t)

\\=\operatorname{Var}(N(t)) \quad \text{by independence}

\\=\lambda t[/latex]

thus


[latex]\begin{aligned}

\operatorname{Var}(N(Y) \mid Y) &=\lambda Y \\

E[N(Y) \mid Y] &=\lambda Y

\end{aligned}[/latex]

Hence, from the conditional variance formula,


[latex]\begin{aligned}

\operatorname{Var}(N(Y)) &=E[\lambda Y]+\operatorname{Var}(\lambda Y) \\

&=\lambda \frac{T}{2}+\lambda^{2} \frac{T^{2}}{12}

\end{aligned}[/latex]

where we have used the fact that Var(Y)=T^{2} / 12.
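As a sanity check of the derived formulas, here is a small Monte-Carlo sketch in Python. The rate λ = 3 and horizon T = 4 are illustrative choices, and the helper `poisson` is a hypothetical stdlib-only sampler (Knuth's method), since `random` has no built-in Poisson generator:

```python
import math
import random

random.seed(2)
lam, T = 3.0, 4.0            # illustrative rate and time horizon
n = 200_000

def poisson(mu):
    """Sample a Poisson(mu) variate by Knuth's product-of-uniforms method."""
    limit, k, prod = math.exp(-mu), 0, random.random()
    while prod > limit:
        k += 1
        prod *= random.random()
    return k

counts = []
for _ in range(n):
    y = random.uniform(0, T)         # bus arrival time Y ~ Uniform(0, T)
    counts.append(poisson(lam * y))  # travelers: N(Y) | Y=y ~ Poisson(lam*y)

mean_n = sum(counts) / n
var_n = sum((c - mean_n) ** 2 for c in counts) / n

print(mean_n, lam * T / 2)                          # both close to 6
print(var_n, lam * T / 2 + lam ** 2 * T ** 2 / 12)  # both close to 18
```

With these parameters the theory gives E[N(Y)] = λT/2 = 6 and Var(N(Y)) = λT/2 + λ²T²/12 = 18, matching the simulation.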

**Variance of a sum of a random number of random variables**

Consider a sequence of independent and identically distributed random variables X_{1}, X_{2}, X_{3}, … and another random variable N that is independent of this sequence; we will find the variance of the sum of a random number of these random variables,

[latex]\operatorname{Var}\left(\sum_{i=1}^{N} X_{i}\right)[/latex]

using

[latex]\begin{aligned}

E\left[\sum_{i=1}^{N} X_{i} \mid N\right] &=N E[X] \\

\operatorname{Var}\left(\sum_{i=1}^{N} X_{i} \mid N\right) &=N \operatorname{Var}(X)

\end{aligned}

[/latex]

which follows by applying the definitions of expectation and variance, conditional on N, to the sum of a fixed number of the random variables in the sequence; hence, by the conditional variance formula,

[latex] \\

\operatorname{Var}\left(\sum_{i=1}^{N} X_{i}\right)=E[N] \operatorname{Var}(X)+(E[X])^{2} \operatorname{Var}(N) [/latex]
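A quick simulation can confirm this identity. The particular choices below (N uniform on {1, …, 10} and X exponential with mean 1) are illustrative assumptions, not taken from the text:

```python
import random

random.seed(3)
n = 100_000
sums = []
for _ in range(n):
    N = random.randint(1, 10)    # E[N] = 5.5, Var(N) = (10**2 - 1)/12 = 8.25
    # X_i ~ Exponential(1), so E[X] = 1 and Var(X) = 1
    sums.append(sum(random.expovariate(1.0) for _ in range(N)))

mean_s = sum(sums) / n
var_s = sum((s - mean_s) ** 2 for s in sums) / n

expected = 5.5 * 1.0 + 1.0 ** 2 * 8.25   # E[N]Var(X) + (E[X])^2 Var(N) = 13.75
print(var_s, expected)                   # both close to 13.75
```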

**Prediction**

In prediction, the value of one random variable is predicted on the basis of an observation of another random variable. To predict the random variable Y from an observed random variable X, we use a function g(X) whose value is the prediction. Naturally we try to choose g(X) close to Y, and the best choice is g(X) = E[Y|X], since it minimizes the mean squared error, by the inequality

[latex]\\

E\left[(Y-g(X))^{2}\right] \geq E\left[(Y-E[Y \mid X])^{2}\right][/latex]

This inequality can be obtained as follows:

[latex]\begin{aligned}

E\left[(Y-g(X))^{2} \mid X\right]=& E\left[(Y-E[Y \mid X]+E[Y \mid X]-g(X))^{2} \mid X\right] \\

=& E\left[(Y-E[Y \mid X])^{2} \mid X\right] \\

&+E\left[(E[Y \mid X]-g(X))^{2} \mid X\right] \\

&+2 E[(Y-E[Y \mid X])(E[Y \mid X]-g(X)) \mid X]

\end{aligned}[/latex]

However, given X, E[Y|X]-g(X), being a function of X, can be treated as a constant. Thus,

[latex]\

\begin{aligned}

E[&(Y-E[Y \mid X])(E[Y \mid X]-g(X)) \mid X] \\

&=(E[Y \mid X]-g(X)) E[Y-E[Y \mid X] \mid X] \\

&=(E[Y \mid X]-g(X))(E[Y \mid X]-E[Y \mid X]) \\

&=0

\end{aligned}[/latex]

which gives the required inequality

[latex]\

E\left[(Y-g(X))^{2}\right] \geq E\left[(Y-E[Y \mid X])^{2}\right][/latex]

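To see this inequality concretely, the sketch below compares the mean squared error of g(X) = E[Y|X] with two other predictors. The model Y = X² + e with standard normal X and e is an illustrative assumption, chosen so that E[Y|X] = X²:

```python
import random

random.seed(4)
n = 100_000
data = []
for _ in range(n):
    x = random.gauss(0, 1)
    y = x * x + random.gauss(0, 1)   # so E[Y|X] = X**2
    data.append((x, y))

def mse(g):
    """Mean squared error of the predictor g(X) over the simulated data."""
    return sum((y - g(x)) ** 2 for x, y in data) / n

mse_opt = mse(lambda x: x * x)   # the conditional expectation E[Y|X]
mse_lin = mse(lambda x: x)       # an arbitrary linear predictor
mse_const = mse(lambda x: 1.0)   # the unconditional mean E[Y] = 1

print(mse_opt, mse_lin, mse_const)   # mse_opt is the smallest, about 1
```

No other function of X beats the conditional expectation; here its error is just the noise variance.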

**Examples on Prediction**

1. It is observed that the height of a person is six feet. What is the prediction of his son's height once grown up, if the height of the son of a father who is x inches tall is normally distributed with mean x + 1 and variance 4?

Solution: Let X be the random variable denoting the height of the person and Y the random variable for the height of his son; then the random variable Y is

[latex]

Y=X+e+1[/latex]

here e represents a normal random variable, independent of the random variable X, with mean zero and variance four.

so the prediction of the son's height is

[latex]

E[Y \mid X=72]= E[X+1+e \mid X=72] \\

= 73+E[e \mid X=72]

\\=73+E[e] \quad \text{by independence} \\=73[/latex]

so the predicted height of the son after he is grown is 73 inches.
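A short simulation illustrates this prediction, using exactly the model of the solution above, Y = X + 1 + e with e ~ N(0, 4) and X = 72:

```python
import random

random.seed(5)
n = 100_000
# sons of a 72-inch father: height = 72 + 1 + e, where e ~ Normal(0, variance 4)
sons = [72 + 1 + random.gauss(0, 2) for _ in range(n)]
mean_height = sum(sons) / n
print(mean_height)   # close to the predicted 73 inches
```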

2. Consider the example of sending a signal from location A to location B: a signal value s sent from location A is received at location B as a normally distributed value with mean s and variance 1. If the signal S sent from A is normally distributed with mean μ and variance σ², how do we predict the value of the signal given that the value r is received at location B?

Solution: The signal values S and R here denote normally distributed random variables; first we find the conditional density function of S given R as

[latex]\

\begin{aligned}

f_{S \mid R}(s \mid r)&=\frac{f_{S, R}(s, r)}{f_{R}(r)} \\

&=\frac{f_{S}(s) f_{R \mid S}(r \mid s)}{f_{R}(r)} \\

&=K e^{-(s-\mu)^{2} / 2 \sigma^{2}} e^{-(r-s)^{2} / 2}

\end{aligned}[/latex]

where the factor K does not depend on s; now

[latex]\

\begin{aligned}

\frac{(s-\mu)^{2}}{2 \sigma^{2}}+\frac{(r-s)^{2}}{2}&=s^{2}\left(\frac{1}{2 \sigma^{2}}+\frac{1}{2}\right)-\left(\frac{\mu}{\sigma^{2}}+r\right) s+C_{1}\\

&=\frac{1+\sigma^{2}}{2 \sigma^{2}}\left[s^{2}-2\left(\frac{\mu+r \sigma^{2}}{1+\sigma^{2}}\right) s\right]+C_{1} \\

&=\frac{1+\sigma^{2}}{2 \sigma^{2}}\left(s-\frac{\left(\mu+r \sigma^{2}\right)}{1+\sigma^{2}}\right)^{2}+C_{2}

\end{aligned}[/latex]

here C_{1} and C_{2} also do not depend on s, so the conditional density function is

[latex]\

f_{S \mid R}(s \mid r)=C \exp \left\{\frac{-\left[s-\frac{\mu+r \sigma^{2}}{1+\sigma^{2}}\right]^{2}}{2\left(\frac{\sigma^{2}}{1+\sigma^{2}}\right)}\right\}[/latex]

C also does not depend on s. Thus the conditional distribution of the signal S, given that the value r is received at location B, is normal with mean and variance

[latex]

\begin{array}{l}E[S \mid R=r]=\frac{\mu+r \sigma^{2}}{1+\sigma^{2}} \\ \operatorname{Var}(S \mid R=r)=\frac{\sigma^{2}}{1+\sigma^{2}}\end{array}

[/latex]

and the best prediction of the signal, written as a weighted average of the prior mean μ and the received value r, is

[latex]E[S \mid R=r]=\frac{1}{1+\sigma^{2}} \mu+\frac{\sigma^{2}}{1+\sigma^{2}} r

[/latex]
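The posterior mean and variance derived above can be checked by rejection-style sampling: simulate many (S, R) pairs and keep only those whose received value falls in a narrow window around r. The parameters μ = 2, σ = 1.5, r = 4 below are illustrative choices, not from the text:

```python
import random

random.seed(6)
mu, sigma, r = 2.0, 1.5, 4.0     # illustrative prior mean/sd and received value
window = 0.1

kept = []
for _ in range(1_000_000):
    s = random.gauss(mu, sigma)          # prior draw of the signal S
    received = random.gauss(s, 1.0)      # channel noise: R | S=s ~ N(s, 1)
    if abs(received - r) < window:       # keep samples with R near r
        kept.append(s)

post_mean = sum(kept) / len(kept)
post_var = sum((s - post_mean) ** 2 for s in kept) / len(kept)

mean_theory = (mu + r * sigma ** 2) / (1 + sigma ** 2)   # about 3.38
var_theory = sigma ** 2 / (1 + sigma ** 2)               # about 0.69
print(post_mean, mean_theory)
print(post_var, var_theory)
```

The retained samples approximate the conditional distribution of S given R = r, and their mean and variance match the formulas.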

**Linear Predictor**

We cannot always find the joint probability density function, but even when only the means, the variances, and the correlation of two random variables are known, the linear predictor of one random variable with respect to the other is very helpful. For the best linear predictor of the random variable Y with respect to the random variable X, we choose a and b to minimize

[latex]\begin{aligned}

E\left[(Y-(a+b X))^{2}\right]=& E\left[Y^{2}-2 a Y-2 b X Y+a^{2}+2 a b X+b^{2} X^{2}\right] \\

=& E\left[Y^{2}\right]-2 a E[Y]-2 b E[X Y]+a^{2} +2 a b E[X]+b^{2} E\left[X^{2}\right]

\end{aligned}[/latex]

Now, differentiating partially with respect to a and b and setting the derivatives to zero, we get

[latex]\begin{aligned}

\frac{\partial}{\partial a} E\left[(Y-a-b X)^{2}\right]&=-2 E[Y]+2 a+2 b E[X] \\

\frac{\partial}{\partial b} E\left[(Y-a-b X)^{2}\right]&=-2 E[X Y]+2 a E[X]+2 b E\left[X^{2}\right] \\

\end{aligned}[/latex]

solving these two equations for a and b, we get

[latex]\begin{aligned}

b&=\frac{E[X Y]-E[X] E[Y]}{E\left[X^{2}\right]-(E[X])^{2}}=\frac{\operatorname{Cov}(X, Y)}{\sigma_{x}^{2}}=\rho \frac{\sigma_{y}}{\sigma_{x}} \\

a&=E[Y]-b E[X]=E[Y]-\frac{\rho \sigma_{y} E[X]}{\sigma_{x}}

\end{aligned}[/latex]

thus minimizing this expectation gives the linear predictor as

[latex]\mu_{y}+\frac{\rho \sigma_{y}}{\sigma_{x}}\left(X-\mu_{x}\right)[/latex]

where μ_{x} and μ_{y} are the respective means of the random variables X and Y; the mean squared error of this linear predictor is

[latex]\begin{array}{l}

E\left[\left(Y-\mu_{y}-\rho \frac{\sigma_{y}}{\sigma_{x}}\left(X-\mu_{x}\right)\right)^{2}\right] \\

\quad=E\left[\left(Y-\mu_{y}\right)^{2}\right]+\rho^{2} \frac{\sigma_{y}^{2}}{\sigma_{x}^{2}} E\left[\left(X-\mu_{x}\right)^{2}\right]-2 \rho \frac{\sigma_{y}}{\sigma_{x}} E\left[\left(Y-\mu_{y}\right)\left(X-\mu_{x}\right)\right] \\

\quad=\sigma_{y}^{2}+\rho^{2} \sigma_{y}^{2}-2 \rho^{2} \sigma_{y}^{2} \\

\quad=\sigma_{y}^{2}\left(1-\rho^{2}\right)

\end{array}[/latex]

This error will be zero when the correlation is perfectly positive or perfectly negative, that is, when the correlation coefficient is either +1 or −1.
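The formulas for a, b, and the error can all be verified from sample moments. The sketch below assumes an illustrative linear model Y = 2X + e with unit-variance noise, which is not from the text:

```python
import random

random.seed(7)
n = 100_000
xs, ys = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    xs.append(x)
    ys.append(2.0 * x + random.gauss(0, 1))   # true slope 2, noise variance 1

mx, my = sum(xs) / n, sum(ys) / n
vx = sum((x - mx) ** 2 for x in xs) / n                      # sigma_x^2
vy = sum((y - my) ** 2 for y in ys) / n                      # sigma_y^2
cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n   # Cov(X, Y)

b = cxy / vx                  # = Cov(X,Y)/sigma_x^2 = rho * sigma_y / sigma_x
a = my - b * mx               # = E[Y] - b E[X]
rho2 = cxy ** 2 / (vx * vy)   # squared correlation coefficient

mse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / n
print(b)                      # close to the true slope 2
print(mse, vy * (1 - rho2))   # equal: sigma_y^2 (1 - rho^2), about 1 here
```

Note that with a and b computed from the sample moments, the mean squared error equals the sample version of σ_y²(1 − ρ²) exactly, matching the derivation.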

**Conclusion**

We discussed the conditional variance for discrete and continuous random variables with various examples, and explained prediction, one of the important applications of conditional expectation, with suitable examples and the best linear predictor. For further reading, go through the links below.

For more posts on Mathematics, please refer to our Mathematics Page.

A first course in probability by Sheldon Ross

Schaum’s Outlines of Probability and Statistics

An introduction to probability and statistics by ROHATGI and SALEH