Expectation, Variance, Covariance
expectation, variance, covariance, linearity of expectation, second moment
1 Role
This page explains the three summary quantities that appear everywhere in probability.
Expectation tells you the center, variance tells you the spread, and covariance tells you how two random quantities move together.
2 First-Pass Promise
Read this page after Random Variables and Distributions.
If you stop here, you should still understand:
- what expectation is averaging
- why variance measures fluctuation around a center
- what covariance means geometrically and statistically
- why these quantities keep reappearing in later theory
3 Why It Matters
Once a random variable has a distribution, the next question is usually not “what are all its probabilities?” but rather:
- where is it centered?
- how much does it fluctuate?
- how is it related to another random variable?
Those are expectation, variance, and covariance questions.
They sit underneath:
- estimator accuracy
- standard error calculations
- concentration inequalities
- Gaussian models
- regression geometry
- covariance matrices and PCA
So this page is where probability starts becoming a toolbox instead of a vocabulary list.
4 Prerequisite Recall
- a random variable is a function from outcomes to numbers
- a distribution tells you how likely different values are
- a PMF or PDF lets you average functions of a random variable
5 Intuition
Expectation is the probabilistic analog of a center of mass.
Variance asks how far the random variable typically sits from that center. It is a second-moment measure, so large deviations matter more than small ones.
Covariance compares two centered random variables:
\[ X - E[X] \qquad \text{and} \qquad Y - E[Y]. \]
If the two centered quantities tend to have the same sign, covariance is positive. If one tends to sit above its mean while the other sits below, covariance is negative. If there is no systematic linear co-movement, covariance is zero.
The important caution is that zero covariance is weaker than independence.
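To make that caution concrete, here is a minimal Python sketch (the variable names are ours, for illustration only) of a pair that is completely dependent yet has zero covariance: take \(X\) uniform on \(\{-1, 0, 1\}\) and \(Y = X^2\).

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1}; Y = X^2 is fully determined by X,
# so the pair is as dependent as possible.
pmf = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}

E_X  = sum(x * p for (x, y), p in pmf.items())      # 0
E_Y  = sum(y * p for (x, y), p in pmf.items())      # 2/3
E_XY = sum(x * y * p for (x, y), p in pmf.items())  # 0

print(E_XY - E_X * E_Y)  # 0: zero covariance despite Y being a function of X
```

Covariance misses this pair because the dependence is purely nonlinear: knowing \(X\) pins down \(Y\) exactly, but not through any linear trend.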
6 Formal Core
Definition 1 (Expectation) For a discrete random variable \(X\) with PMF \(p_X\),
\[ E[X] = \sum_x x\,p_X(x). \]
For a continuous random variable with density \(f_X\),
\[ E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx, \]
provided the integral exists.
Definition 2 (Variance) The variance of \(X\) is
\[ \operatorname{Var}(X) = E\big[(X-E[X])^2\big]. \]
A useful identity is
\[ \operatorname{Var}(X)=E[X^2] - (E[X])^2. \]
Definition 3 (Covariance) For random variables \(X\) and \(Y\),
\[ \operatorname{Cov}(X,Y) = E\big[(X-E[X])(Y-E[Y])\big]. \]
An equivalent formula is
\[ \operatorname{Cov}(X,Y)=E[XY]-E[X]E[Y]. \]
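All three definitions translate directly into code. Here is a minimal sketch (the function names are ours, not from any library) that computes them exactly from a discrete PMF and a joint PMF, using `fractions` to avoid rounding:

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] = sum over x of x * p(x), for a dict mapping values to probabilities."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    mean = expectation(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mean * mean

def covariance(joint):
    """Cov(X, Y) = E[XY] - E[X]E[Y], from a dict {(x, y): p}."""
    E_X  = sum(x * p for (x, y), p in joint.items())
    E_Y  = sum(y * p for (x, y), p in joint.items())
    E_XY = sum(x * y * p for (x, y), p in joint.items())
    return E_XY - E_X * E_Y

# A fair coin as a {0, 1} random variable.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(expectation(coin))  # 1/2
print(variance(coin))     # 1/4

# Two bits that always match: strong positive co-movement.
pair = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
print(covariance(pair))   # 1/4
```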
Proposition 1 (Key Statements) Expectation is linear:
\[ E[aX+bY+c] = aE[X] + bE[Y] + c. \]
If \(X\) and \(Y\) are independent, then
\[ \operatorname{Cov}(X,Y)=0. \]
The converse is false in general: zero covariance does not guarantee independence.
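Both statements are easy to spot-check numerically. A small sketch, assuming independent uniform draws: linearity is an exact identity even for sample averages, while the covariance of independent draws vanishes only up to Monte Carlo error.

```python
import random

random.seed(0)
N = 200_000
xs = [random.random() for _ in range(N)]  # X ~ Uniform(0, 1)
ys = [random.random() for _ in range(N)]  # Y ~ Uniform(0, 1), independent of X

def mean(vs):
    return sum(vs) / len(vs)

a, b, c = 3.0, -2.0, 5.0
lhs = mean([a * x + b * y + c for x, y in zip(xs, ys)])
rhs = a * mean(xs) + b * mean(ys) + c
print(lhs, rhs)  # equal up to float rounding: linearity holds even for sample averages

cov = mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)
print(cov)       # close to 0: independence forces zero covariance
```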
7 Worked Example
Flip a fair coin twice.
Let
- \(Y\) be the indicator that the first flip is heads
- \(X\) be the total number of heads in the two flips
So the outcomes and values are:
\[ \begin{array}{c|cc} \text{outcome} & X & Y \\ \hline HH & 2 & 1 \\ HT & 1 & 1 \\ TH & 1 & 0 \\ TT & 0 & 0 \end{array} \]
First compute the expectations.
For \(X\),
\[ E[X] = 2\cdot \frac14 + 1\cdot \frac14 + 1\cdot \frac14 + 0\cdot \frac14 = 1. \]
For \(Y\),
\[ E[Y] = 1\cdot \frac12 + 0\cdot \frac12 = \frac12. \]
Next compute the variance of \(X\).
We have
\[ E[X^2] = 4\cdot \frac14 + 1\cdot \frac14 + 1\cdot \frac14 = \frac32, \]
so
\[ \operatorname{Var}(X) = E[X^2] - (E[X])^2 = \frac32 - 1 = \frac12. \]
Now compute the covariance.
Since
\[ XY = \begin{cases} 2, & HH \\ 1, & HT \\ 0, & TH \\ 0, & TT, \end{cases} \]
we get
\[ E[XY] = 2\cdot \frac14 + 1\cdot \frac14 = \frac34. \]
Therefore
\[ \operatorname{Cov}(X,Y) = E[XY] - E[X]E[Y] = \frac34 - 1\cdot \frac12 = \frac14. \]
This is positive because knowing the first flip is heads pushes the total number of heads upward.
The example shows the three roles clearly:
- expectation gives the center
- variance measures spread around that center
- covariance detects linear dependence between two random quantities
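The whole example also fits in a few lines of Python by enumerating the four equally likely outcomes; this sketch reproduces the numbers above exactly:

```python
from fractions import Fraction

# The four equally likely outcomes of two fair flips, as (X, Y) pairs:
# X = total number of heads, Y = indicator that the first flip is heads.
outcomes = {"HH": (2, 1), "HT": (1, 1), "TH": (1, 0), "TT": (0, 0)}
p = Fraction(1, 4)

E_X  = sum(x * p for x, y in outcomes.values())      # 1
E_Y  = sum(y * p for x, y in outcomes.values())      # 1/2
E_X2 = sum(x * x * p for x, y in outcomes.values())  # 3/2
E_XY = sum(x * y * p for x, y in outcomes.values())  # 3/4

print(E_X2 - E_X ** 2)   # Var(X)   = 1/2
print(E_XY - E_X * E_Y)  # Cov(X,Y) = 1/4
```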
8 Computation Lens
The fastest recurring moves are:
- use linearity of expectation instead of expanding full distributions when possible
- compute variance through \(E[X^2] - (E[X])^2\)
- compute covariance through \(E[XY]-E[X]E[Y]\)
These shortcuts matter because later problems often involve sums, averages, indicators, or matrix-valued random objects. The formulas above scale better than starting from scratch every time.
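The linearity shortcut is worth seeing once in code. For the number of heads in \(n\) fair flips, writing the count as a sum of indicators gives \(E[X] = n/2\) with no enumeration at all; the sketch below (assuming fair, independent flips) contrasts the two routes.

```python
from fractions import Fraction
from itertools import product

n = 10

# Shortcut: X = I_1 + ... + I_n, each indicator has E[I_k] = 1/2,
# so linearity gives E[X] = n/2 immediately.
via_linearity = n * Fraction(1, 2)

# Brute force: enumerate all 2^n outcomes and average the head count.
via_enumeration = sum(
    Fraction(sum(flips), 2 ** n) for flips in product([0, 1], repeat=n)
)

print(via_linearity, via_enumeration)  # both 5
```

The brute-force route already touches 1024 outcomes at \(n = 10\); linearity does not care what \(n\) is, which is exactly why it scales to the sums and averages that show up later.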
9 Application Lens
This topic is a hidden backbone of modern applied math:
- in statistics, standard errors come from variance
- in regression, covariance structure controls estimation quality
- in PCA, the covariance matrix stores the directions of variation
- in learning theory and optimization, variance explains noise in stochastic gradients and empirical averages
So these are not decorative summaries. They are the quantities that later theorems usually bound, estimate, or transform.
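As one concrete instance of the PCA point, here is a minimal numpy sketch on synthetic data (the data and the mixing matrix are made up for illustration): the eigenvectors of the sample covariance matrix are the directions of variation, and the eigenvalues are the variances along them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data with more spread along one direction.
z = rng.standard_normal((1000, 2))
data = z @ np.array([[2.0, 1.5], [0.0, 0.5]])

# Sample covariance matrix (rows of `data` are observations).
C = np.cov(data, rowvar=False)

# Eigendecomposition: eigenvectors are principal directions,
# eigenvalues are the variances along those directions.
eigenvalues, eigenvectors = np.linalg.eigh(C)
print(eigenvalues)          # variances along the principal directions
print(eigenvectors[:, -1])  # direction of largest variation
```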
10 Stop Here For First Pass
If you can now explain:
- what expectation averages over
- why variance is about fluctuation around the mean
- how covariance is different from full independence
- how to compute all three in a small example
then this page has done its main job.
11 Go Deeper
The next page is Joint, Conditional, and Bayes, where covariance and dependence start being studied through full joint distributions and conditional laws.
12 Optional Paper Bridge
- Penn State STAT 414 - Second pass - official notes with a clean progression from expectation to covariance and correlation. Checked 2026-04-24.
- Penn State STAT 414 Lesson 18 - Paper bridge - a focused official lesson showing covariance formulas, interpretation, and common misconceptions. Checked 2026-04-24.
13 Optional After First Pass
If you want more practice before moving on:
- compute the expectation of a sum using linearity only
- build two dependent random variables with zero covariance
- write down a small covariance table for a two-variable discrete model
14 Common Mistakes
- confusing expectation with the most likely value
- forgetting to square the centered quantity in the variance
- thinking covariance is already a scale-free measure
- assuming zero covariance implies independence
- forgetting which distribution the expectation is taken over
15 Exercises
- Let \(X\) be the result of a fair die roll. Compute \(E[X]\) and \(\operatorname{Var}(X)\).
- Suppose \(Y\) is the indicator that a fair die roll is even. Compute \(E[Y]\) and \(\operatorname{Cov}(X,Y)\).
- Explain in words why independence implies zero covariance, but zero covariance need not imply independence.
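If you want to check your answers to the first two exercises without looking anything up, here is a small simulation sketch; your hand computations should match the printed estimates up to Monte Carlo noise.

```python
import random

random.seed(1)
N = 100_000
rolls = [random.randint(1, 6) for _ in range(N)]  # fair die rolls
evens = [1 if r % 2 == 0 else 0 for r in rolls]   # indicator of an even roll

def mean(vs):
    return sum(vs) / len(vs)

m = mean(rolls)
print(m)                                     # compare with your E[X]
print(mean([r * r for r in rolls]) - m * m)  # compare with your Var(X)
print(mean([r * e for r, e in zip(rolls, evens)]) - m * mean(evens))  # Cov(X, Y)
```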
16 Sources and Further Reading
- Harvard Stat 110 - First pass - strong official course hub with consistently good examples for expectation, covariance, and dependence. Checked 2026-04-24.
- Penn State STAT 414 - First pass - official open notes that cover expectation and covariance with direct worked examples. Checked 2026-04-24.
- Penn State STAT 414 Lesson 18 - Second pass - direct official lesson on covariance and correlation. Checked 2026-04-24.
- MIT RES.6-012 lecture notes - Second pass - official MIT notes with a theory-first treatment of moments, covariance, and derived distributions. Checked 2026-04-24.
Sources checked online on 2026-04-24:
- Harvard Stat 110 course homepage
- Penn State STAT 414 overview
- Penn State Lesson 18 on covariance
- MIT RES.6-012 lecture notes page