Law of Large Numbers and CLT
law of large numbers, central limit theorem, sample mean, standard error, asymptotics
1 Role
This page explains the two classical limit laws that make probability useful for data and repeated experiments.
The law of large numbers says averages become stable. The central limit theorem says how the remaining fluctuations around that stable value behave.
2 First-Pass Promise
Read this page after Joint, Conditional, and Bayes.
If you stop here, you should still understand:
- what the sample mean is trying to estimate
- what the law of large numbers says in plain language
- what the central limit theorem adds beyond the law of large numbers
- why the standard error scales like \(1/\sqrt{n}\)
3 Why It Matters
These ideas matter because much of statistics and empirical science depends on repeated sampling:
- averages of noisy data should stabilize
- empirical frequencies should approach population frequencies
- uncertainty in sample estimates should shrink with more data
- many approximate confidence intervals and hypothesis tests rely on normal behavior at scale
In ML and engineering, the same logic appears when we average losses, estimate risks, approximate expectations by simulation, or study the noise in stochastic algorithms.
4 Prerequisite Recall
If \(X_1,\dots,X_n\) are random variables, the sample mean is
\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i. \]
The expectation gives the target center, and the variance measures the typical size of the fluctuations around it.
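As a quick refresher in code, here is a minimal numpy sketch computing these recalled quantities; the \(N(5, 2^2)\) data, seed, and sample size are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical data: 1,000 draws from N(5, 4), so mu = 5, sigma^2 = 4.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1_000)

sample_mean = x.mean()       # estimates the target center mu
sample_var = x.var(ddof=1)   # estimates the fluctuation scale sigma^2
print(f"sample mean = {sample_mean:.3f}, sample variance = {sample_var:.3f}")
```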
5 Intuition
The law of large numbers and the central limit theorem answer two different questions.
The first question is:
If I average more and more independent samples, do I get close to the truth?
That is the law of large numbers.
The second question is:
How far from the truth should I still expect to be when n is large but finite?
That is where the central limit theorem enters.
So the clean distinction is:
- LLN is about convergence of the average itself
- CLT is about the shape of the scaled error around the target
The law of large numbers says the average settles down. The central limit theorem says the leftover noise often looks approximately Gaussian once you scale it correctly.
6 Formal Core
Definition 1 (Sample Mean) If \(X_1,\dots,X_n\) are random variables, their sample mean is
\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i. \]
When the \(X_i\) are i.i.d. with mean \(\mu\), the sample mean is a natural estimator of \(\mu\).
Theorem 1 (Weak Law of Large Numbers) If \(X_1,X_2,\dots\) are i.i.d. with finite mean \(\mu\), then
\[ \bar{X}_n \xrightarrow{P} \mu \]
as \(n \to \infty\).
Equivalently, for every \(\varepsilon > 0\),
\[ P\big(|\bar{X}_n-\mu|>\varepsilon\big)\to 0. \]
In words: the sample average converges in probability to the population mean.
There are stronger versions of the law of large numbers, but this page only needs the basic stabilizing idea.
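To see the stabilization numerically, here is a minimal simulation sketch in Python with numpy. The Exponential(1) distribution (so \(\mu = 1\)), the seed, and the checkpoints are illustrative assumptions, not part of the theorem.

```python
import numpy as np

# Weak LLN illustration: running averages of i.i.d. Exponential(1)
# draws (true mean mu = 1) should drift toward mu as n grows.
rng = np.random.default_rng(42)
x = rng.exponential(scale=1.0, size=100_000)

# running_mean[k] = average of the first k + 1 draws
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in [10, 100, 1_000, 10_000, 100_000]:
    print(f"n = {n:>6}:  average = {running_mean[n - 1]:.4f}  (target mu = 1)")
```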
Theorem 2 (Central Limit Theorem) If \(X_1,X_2,\dots\) are i.i.d. with mean \(\mu\) and finite variance \(\sigma^2\), then
\[ \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \overset{d}{\longrightarrow} N(0,1) \]
as \(n \to \infty\).
Equivalently, for large \(n\),
\[ \bar{X}_n \approx N\left(\mu,\frac{\sigma^2}{n}\right). \]
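A quick simulation makes the normal approximation visible. The sketch below assumes skewed Exponential(1) data (so \(\mu = 1\) and \(\sigma = 1\)); the sample size and repetition count are arbitrary illustrative choices.

```python
import numpy as np

# CLT illustration: for fixed large n, draw many independent sample
# means and standardize them as z = sqrt(n) * (xbar - mu) / sigma.
# Even though the data are skewed, z should look close to N(0, 1).
rng = np.random.default_rng(0)
n, reps = 500, 20_000
samples = rng.exponential(scale=1.0, size=(reps, n))  # mu = sigma = 1

z = np.sqrt(n) * (samples.mean(axis=1) - 1.0) / 1.0
print("mean of z (expect ~ 0):", round(z.mean(), 3))
print("std of z  (expect ~ 1):", round(z.std(), 3))
print("P(|z| <= 1.96) (expect ~ 0.95):", round(np.mean(np.abs(z) <= 1.96), 3))
```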
Proposition 1 (Key Distinction) The law of large numbers and the central limit theorem do not say the same thing.
- LLN: the average gets close to \(\mu\)
- CLT: the error \(\bar{X}_n - \mu\) is typically of size about \(\sigma/\sqrt{n}\) and, after scaling, looks approximately normal
So LLN is about consistency, while CLT is about fluctuation shape and scale.
7 Worked Example
Let \(X_i\) be the indicator that the \(i\)th coin flip is heads:
\[ X_i = \begin{cases} 1, & \text{if flip } i \text{ is heads} \\ 0, & \text{if flip } i \text{ is tails.} \end{cases} \]
Assume the coin is fair, so
\[ P(X_i=1)=P(X_i=0)=\frac{1}{2}. \]
Then
\[ \mu = E[X_i] = \frac{1}{2}, \qquad \sigma^2 = \operatorname{Var}(X_i)=\frac{1}{4}. \]
The sample mean
\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \]
is just the proportion of heads in \(n\) flips.
The law of large numbers says:
\[ \bar{X}_n \xrightarrow{P} \frac{1}{2}. \]
So with many flips, the observed proportion of heads should settle near 0.5.
The central limit theorem says that for large \(n\),
\[ \bar{X}_n \approx N\left(\frac{1}{2}, \frac{1}{4n}\right). \]
So the standard error is
\[ \sqrt{\frac{1}{4n}} = \frac{1}{2\sqrt{n}}. \]
For example, if \(n=400\), then the standard error is
\[ \frac{1}{2\sqrt{400}}=\frac{1}{40}=0.025. \]
That means a rough 95% normal-style fluctuation band is
\[ 0.5 \pm 2(0.025), \]
which is about
\[ [0.45,\;0.55]. \]
This example shows the difference clearly:
- LLN says the proportion of heads approaches 0.5
- CLT says the remaining error is usually about the size \(1/\sqrt{n}\)
- the Gaussian approximation explains why the fluctuations become predictable
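The prediction above can be checked by brute force. Here is a small simulation sketch: batches of 400 fair flips repeated many times, with the seed and repetition count chosen arbitrarily for illustration.

```python
import numpy as np

# Worked-example check: simulate batches of 400 fair coin flips and
# compare the spread of the observed proportion of heads against the
# predicted standard error 1 / (2 * sqrt(400)) = 0.025.
rng = np.random.default_rng(7)
n, reps = 400, 20_000
flips = rng.integers(0, 2, size=(reps, n))  # 0 = tails, 1 = heads

p_hat = flips.mean(axis=1)
print("empirical std of proportions (expect ~ 0.025):", round(p_hat.std(), 4))
inside = np.mean((p_hat >= 0.45) & (p_hat <= 0.55))
print("fraction inside [0.45, 0.55] (expect ~ 0.95):", round(inside, 3))
```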
8 Computation Lens
These theorems show up constantly in practical calculations:
- compute a sample average
- identify its target mean \(\mu\)
- estimate the variance or standard deviation
- use the standard error \(\sigma/\sqrt{n}\) to understand uncertainty
The key scaling rule to remember is:
averaging \(n\) independent samples reduces the fluctuation scale by a factor of \(\sqrt{n}\)
That is why four times as much data cuts the standard error roughly in half rather than by a factor of four.
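The \(\sqrt{n}\) rule is easy to verify empirically. A minimal sketch, assuming standard normal data (so \(\sigma = 1\) and the theoretical standard error is exactly \(1/\sqrt{n}\)); the sample sizes are arbitrary:

```python
import numpy as np

# Quadrupling the sample size should roughly halve the standard error.
rng = np.random.default_rng(1)
reps = 10_000
for n in [100, 400, 1_600]:
    means = rng.normal(size=(reps, n)).mean(axis=1)
    print(f"n = {n:>5}:  empirical SE = {means.std():.4f}  (theory: {1 / np.sqrt(n):.4f})")
```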
9 Application Lens
This page sits underneath many common workflows:
- Monte Carlo estimation, where averages approximate expectations (sketched after this list)
- polling and A/B testing, where proportions stabilize with larger samples
- empirical risk estimation in ML, where average loss estimates population loss
- stochastic optimization, where minibatch averages reduce noise but do not remove it completely
So LLN and CLT are part of the hidden logic behind “more data gives a better estimate” and “uncertainty shrinks like 1/sqrt(n).”
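As a concrete instance of the Monte Carlo bullet above, here is a minimal sketch: it estimates \(E[\cos(U)]\) for \(U \sim \mathrm{Uniform}(0,1)\), whose true value is \(\sin(1) \approx 0.8415\), and attaches a CLT-based standard error. The integrand, seed, and sample size are illustrative assumptions.

```python
import numpy as np

# Monte Carlo estimation: approximate E[cos(U)], U ~ Uniform(0, 1),
# by an average, and use the CLT to attach an uncertainty band.
rng = np.random.default_rng(3)
n = 10_000
vals = np.cos(rng.uniform(size=n))

estimate = vals.mean()
stderr = vals.std(ddof=1) / np.sqrt(n)   # sigma_hat / sqrt(n)
print(f"estimate = {estimate:.4f} +/- {1.96 * stderr:.4f} (95% band)")
print(f"truth    = {np.sin(1):.4f}")
```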
10 Stop Here For First Pass
If you can now explain:
- what the sample mean is estimating
- what the law of large numbers says
- what the CLT adds beyond the law of large numbers
- why the standard error shrinks at the rate \(1/\sqrt{n}\)
then this page has done its main job.
11 Go Deeper
The best next move is Concentration and Common Inequalities, where probability starts giving explicit finite-sample bounds rather than asymptotic laws.
12 Optional Paper Bridge
- MIT RES.6-012: The Central Limit Theorem (Second pass): official MIT material emphasizing the universality of the Gaussian approximation. Checked 2026-04-24.
- Penn State STAT 414: The Central Limit Theorem (Paper bridge): helpful for seeing how CLT statements are used in concrete data-analysis questions. Checked 2026-04-24.
13 Optional After First Pass
If you want more practice before moving on:
- compare the exact Binomial distribution with the normal approximation for growing \(n\) (see the sketch after this list)
- check how the standard error changes when the sample size is multiplied by 4 or 9
- contrast LLN-style convergence with finite-sample concentration bounds
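For the first practice item, here is a small sketch, assuming scipy is available and using arbitrary sample sizes: it compares the exact Binomial probability of landing within two standard errors of the mean against the normal value of about 0.95.

```python
import numpy as np
from scipy.stats import binom, norm

# Exact Binomial(n, 1/2) probability that the head count lies within
# two standard errors of n/2, versus the normal approximation.
normal_value = norm.cdf(2) - norm.cdf(-2)  # about 0.9545
for n in [10, 40, 160, 640]:
    se_counts = 0.5 * np.sqrt(n)           # std of the head count
    lo, hi = n / 2 - 2 * se_counts, n / 2 + 2 * se_counts
    exact = binom.cdf(np.floor(hi), n, 0.5) - binom.cdf(np.ceil(lo) - 1, n, 0.5)
    print(f"n = {n:>4}:  exact = {exact:.4f}   normal = {normal_value:.4f}")
```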
14 Common Mistakes
- thinking LLN says the sample mean is exactly equal to the population mean for large \(n\)
- thinking CLT says the original data become normal
- forgetting that the CLT is about the sample mean or scaled sum, not every random quantity
- expecting error to shrink like \(1/n\) instead of \(1/\sqrt{n}\)
- ignoring assumptions such as independence or finite variance
15 Exercises
- Let \(X_i\) be i.i.d. with mean \(10\) and variance \(9\). What are the mean and variance of \(\bar{X}_{100}\)?
- Explain in words the difference between “the average converges to the truth” and “the scaled error is approximately normal.”
- If the standard error of an average is 0.08 at sample size \(n\), what happens approximately when the sample size is increased to \(4n\)?
16 Sources and Further Reading
- Harvard Stat 110 (First pass): strong official course hub covering limit theorems in a broad probability sequence. Checked 2026-04-24.
- Penn State STAT 414 (First pass): current official notes with explicit coverage of the CLT and probability-theory foundations. Checked 2026-04-24.
- MIT RES.6-012 Introduction to Probability (Second pass): official MIT course materials with a clean probability-theory perspective on LLN and CLT. Checked 2026-04-24.
- MIT RES.6-012 LLN and CLT lecture materials (Second pass): a direct official bridge that contrasts LLN stabilization with CLT fluctuation detail. Checked 2026-04-24.
- Penn State STAT 414: The Central Limit Theorem (Paper bridge): a good bridge from formal theorem statements to applied interpretation. Checked 2026-04-24.
Sources checked online on 2026-04-24:
- Harvard Stat 110 course overview
- Penn State STAT 414 overview
- Penn State CLT lesson
- MIT RES.6-012 CLT resource page
- MIT RES.6-012 LLN and CLT lecture materials