Law of Large Numbers and CLT
law of large numbers, central limit theorem, sample mean, standard error, asymptotics
1 Role
This page explains the two classical limit laws that make probability useful for data and repeated experiments.
The law of large numbers says averages become stable. The central limit theorem says how the remaining fluctuations around that stable value behave.
2 First-Pass Promise
Read this page after Joint, Conditional, and Bayes.
If you stop here, you should still understand:
- what the sample mean is trying to estimate
- what the law of large numbers says in plain language
- what the central limit theorem adds beyond the law of large numbers
- why the standard error scales like \(1/\sqrt{n}\)
3 Why It Matters
These ideas matter because much of statistics and empirical science depends on repeated sampling:
- averages of noisy data should stabilize
- empirical frequencies should approach population frequencies
- uncertainty in sample estimates should shrink with more data
- many approximate confidence intervals and hypothesis tests rely on normal behavior at scale
In ML and engineering, the same logic appears when we average losses, estimate risks, approximate expectations by simulation, or study the noise in stochastic algorithms.
4 Prerequisite Recall
If \(X_1,\dots,X_n\) are random variables, the sample mean is
\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i. \]
The expectation gives the target center, and the variance measures the typical size of the fluctuations around it.
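As a quick refresher in code, here is a minimal numpy sketch computing these recalled quantities; the \(N(5, 2^2)\) data, seed, and sample size are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical data: 1,000 draws from N(5, 4), so mu = 5, sigma^2 = 4.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1_000)

sample_mean = x.mean()       # estimates the target center mu
sample_var = x.var(ddof=1)   # estimates the fluctuation scale sigma^2
print(f"sample mean = {sample_mean:.3f}, sample variance = {sample_var:.3f}")
```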
5 Intuition
The law of large numbers and the central limit theorem answer two different questions.
The first question is:
If I average more and more independent samples, do I get close to the truth?
That is the law of large numbers.
The second question is:
How far from the truth should I still expect to be when n is large but finite?
That is where the central limit theorem enters.
So the clean distinction is:
- LLN is about convergence of the average itself
- CLT is about the shape of the scaled error around the target
The law of large numbers says the average settles down. The central limit theorem says the leftover noise often looks approximately Gaussian once you scale it correctly.
6 Formal Core
Definition 1 (Sample Mean) If \(X_1,\dots,X_n\) are random variables, their sample mean is
\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i. \]
When the \(X_i\) are i.i.d. with mean \(\mu\), the sample mean is a natural estimator of \(\mu\).
Theorem 1 (Weak Law of Large Numbers) If \(X_1,X_2,\dots\) are i.i.d. with finite mean \(\mu\), then
\[ \bar{X}_n \xrightarrow{P} \mu \]
as \(n \to \infty\).
Equivalently, for every \(\varepsilon > 0\),
\[ P\big(|\bar{X}_n-\mu|>\varepsilon\big)\to 0. \]
In words: the sample average converges in probability to the population mean.
There are stronger versions of the law of large numbers, but this page only needs the basic stabilizing idea.
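To see the stabilization numerically, here is a minimal simulation sketch in Python with numpy. The Exponential(1) distribution (so \(\mu = 1\)), the seed, and the checkpoints are illustrative assumptions, not part of the theorem.

```python
import numpy as np

# Weak LLN illustration: running averages of i.i.d. Exponential(1)
# draws (true mean mu = 1) should drift toward mu as n grows.
rng = np.random.default_rng(42)
x = rng.exponential(scale=1.0, size=100_000)

# running_mean[k] = average of the first k + 1 draws
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in [10, 100, 1_000, 10_000, 100_000]:
    print(f"n = {n:>6}:  average = {running_mean[n - 1]:.4f}  (target mu = 1)")
```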
Theorem 2 (Central Limit Theorem) If \(X_1,X_2,\dots\) are i.i.d. with mean \(\mu\) and finite variance \(\sigma^2\), then
\[ \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \overset{d}{\longrightarrow} N(0,1) \]
as \(n \to \infty\).
Equivalently, for large \(n\),
\[ \bar{X}_n \approx N\left(\mu,\frac{\sigma^2}{n}\right). \]
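A quick simulation makes the normal approximation visible. The sketch below assumes skewed Exponential(1) data (so \(\mu = 1\) and \(\sigma = 1\)); the sample size and repetition count are arbitrary illustrative choices.

```python
import numpy as np

# CLT illustration: for fixed large n, draw many independent sample
# means and standardize them as z = sqrt(n) * (xbar - mu) / sigma.
# Even though the data are skewed, z should look close to N(0, 1).
rng = np.random.default_rng(0)
n, reps = 500, 20_000
samples = rng.exponential(scale=1.0, size=(reps, n))  # mu = sigma = 1

z = np.sqrt(n) * (samples.mean(axis=1) - 1.0) / 1.0
print("mean of z (expect ~ 0):", round(z.mean(), 3))
print("std of z  (expect ~ 1):", round(z.std(), 3))
print("P(|z| <= 1.96) (expect ~ 0.95):", round(np.mean(np.abs(z) <= 1.96), 3))
```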
Proposition 1 (Key Distinction) The law of large numbers and the central limit theorem do not say the same thing.
- LLN: the average gets close to \(\mu\)
- CLT: the error \(\bar{X}_n - \mu\) is typically of size about \(\sigma/\sqrt{n}\) and, after scaling, looks approximately normal
So LLN is about consistency, while CLT is about fluctuation shape and scale.
7 Worked Example
Let \(X_i\) be the indicator that the \(i\)th coin flip is heads:
\[ X_i = \begin{cases} 1, & \text{if flip } i \text{ is heads} \\ 0, & \text{if flip } i \text{ is tails.} \end{cases} \]
Assume the coin is fair, so
\[ P(X_i=1)=P(X_i=0)=\frac{1}{2}. \]
Then
\[ \mu = E[X_i] = \frac{1}{2}, \qquad \sigma^2 = \operatorname{Var}(X_i)=\frac{1}{4}. \]
The sample mean
\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \]
is just the proportion of heads in \(n\) flips.
The law of large numbers says:
\[ \bar{X}_n \xrightarrow{P} \frac{1}{2}. \]
So with many flips, the observed proportion of heads should settle near 0.5.
The central limit theorem says that for large \(n\),
\[ \bar{X}_n \approx N\left(\frac{1}{2}, \frac{1}{4n}\right). \]
So the standard error is
\[ \sqrt{\frac{1}{4n}} = \frac{1}{2\sqrt{n}}. \]
For example, if \(n=400\), then the standard error is
\[ \frac{1}{2\sqrt{400}}=\frac{1}{40}=0.025. \]
That means a rough 95% normal-style fluctuation band is
\[ 0.5 \pm 2(0.025), \]
which is about
\[ [0.45,\;0.55]. \]
This example shows the difference clearly:
- LLN says the proportion of heads approaches 0.5
- CLT says the remaining error is usually about the size \(1/\sqrt{n}\)
- the Gaussian approximation explains why the fluctuations become predictable
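The prediction above can be checked by brute force. Here is a small simulation sketch: batches of 400 fair flips repeated many times, with the seed and repetition count chosen arbitrarily for illustration.

```python
import numpy as np

# Worked-example check: simulate batches of 400 fair coin flips and
# compare the spread of the observed proportion of heads against the
# predicted standard error 1 / (2 * sqrt(400)) = 0.025.
rng = np.random.default_rng(7)
n, reps = 400, 20_000
flips = rng.integers(0, 2, size=(reps, n))  # 0 = tails, 1 = heads

p_hat = flips.mean(axis=1)
print("empirical std of proportions (expect ~ 0.025):", round(p_hat.std(), 4))
inside = np.mean((p_hat >= 0.45) & (p_hat <= 0.55))
print("fraction inside [0.45, 0.55] (expect ~ 0.95):", round(inside, 3))
```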
8 Computation Lens
These theorems show up constantly in practical calculations:
- compute a sample average
- identify its target mean \(\mu\)
- estimate the variance or standard deviation
- use the standard error \(\sigma/\sqrt{n}\) to understand uncertainty
The key scaling rule to remember is:
averaging \(n\) independent samples reduces the fluctuation scale by a factor of \(\sqrt{n}\)
That is why four times as much data cuts the standard error roughly in half rather than by a factor of four.
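The \(\sqrt{n}\) rule is easy to verify empirically. A minimal sketch, assuming standard normal data (so \(\sigma = 1\) and the theoretical standard error is exactly \(1/\sqrt{n}\)); the sample sizes are arbitrary:

```python
import numpy as np

# Quadrupling the sample size should roughly halve the standard error.
rng = np.random.default_rng(1)
reps = 10_000
for n in [100, 400, 1_600]:
    means = rng.normal(size=(reps, n)).mean(axis=1)
    print(f"n = {n:>5}:  empirical SE = {means.std():.4f}  (theory: {1 / np.sqrt(n):.4f})")
```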
9 Application Lens
This page sits underneath many common workflows:
- Monte Carlo estimation, where averages approximate expectations (sketched after this list)
- polling and A/B testing, where proportions stabilize with larger samples
- empirical risk estimation in ML, where average loss estimates population loss
- stochastic optimization, where minibatch averages reduce noise but do not remove it completely
So LLN and CLT are part of the hidden logic behind “more data gives a better estimate” and “uncertainty shrinks like 1/sqrt(n).”
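As a concrete instance of the Monte Carlo bullet above, here is a minimal sketch: it estimates \(E[\cos(U)]\) for \(U \sim \mathrm{Uniform}(0,1)\), whose true value is \(\sin(1) \approx 0.8415\), and attaches a CLT-based standard error. The integrand, seed, and sample size are illustrative assumptions.

```python
import numpy as np

# Monte Carlo estimation: approximate E[cos(U)], U ~ Uniform(0, 1),
# by an average, and use the CLT to attach an uncertainty band.
rng = np.random.default_rng(3)
n = 10_000
vals = np.cos(rng.uniform(size=n))

estimate = vals.mean()
stderr = vals.std(ddof=1) / np.sqrt(n)   # sigma_hat / sqrt(n)
print(f"estimate = {estimate:.4f} +/- {1.96 * stderr:.4f} (95% band)")
print(f"truth    = {np.sin(1):.4f}")
```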
10 Stop Here For First Pass
If you can now explain:
- what the sample mean is estimating
- what the law of large numbers says
- what the CLT adds beyond the law of large numbers
- why the standard error shrinks at the rate \(1/\sqrt{n}\)
then this page has done its main job.
11 Go Deeper
The best next move is Concentration and Common Inequalities, where probability starts giving explicit finite-sample bounds rather than asymptotic laws.
12 Optional Paper Bridge
- MIT RES.6-012: The Central Limit Theorem (Second pass): official MIT material emphasizing the universality of the Gaussian approximation. Checked 2026-04-24.
- Penn State STAT 414: The Central Limit Theorem (Paper bridge): helpful for seeing how CLT statements are used in concrete data-analysis questions. Checked 2026-04-24.
13 Optional After First Pass
If you want more practice before moving on:
- compare the exact Binomial distribution with the normal approximation for growing \(n\) (see the sketch after this list)
- check how the standard error changes when the sample size is multiplied by 4 or 9
- contrast LLN-style convergence with finite-sample concentration bounds
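For the first practice item, here is a small sketch, assuming scipy is available and using arbitrary sample sizes: it compares the exact Binomial probability of landing within two standard errors of the mean against the normal value of about 0.95.

```python
import numpy as np
from scipy.stats import binom, norm

# Exact Binomial(n, 1/2) probability that the head count lies within
# two standard errors of n/2, versus the normal approximation.
normal_value = norm.cdf(2) - norm.cdf(-2)  # about 0.9545
for n in [10, 40, 160, 640]:
    se_counts = 0.5 * np.sqrt(n)           # std of the head count
    lo, hi = n / 2 - 2 * se_counts, n / 2 + 2 * se_counts
    exact = binom.cdf(np.floor(hi), n, 0.5) - binom.cdf(np.ceil(lo) - 1, n, 0.5)
    print(f"n = {n:>4}:  exact = {exact:.4f}   normal = {normal_value:.4f}")
```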
14 Common Mistakes
- thinking LLN says the sample mean is exactly equal to the population mean for large \(n\)
- thinking CLT says the original data become normal
- forgetting that the CLT is about the sample mean or scaled sum, not every random quantity
- expecting error to shrink like \(1/n\) instead of \(1/\sqrt{n}\)
- ignoring assumptions such as independence or finite variance
15 Exercises
- Let \(X_i\) be i.i.d. with mean \(10\) and variance \(9\). What are the mean and variance of \(\bar{X}_{100}\)?
- Explain in words the difference between “the average converges to the truth” and “the scaled error is approximately normal.”
- If the standard error of an average is 0.08 at sample size \(n\), what happens approximately when the sample size is increased to \(4n\)?
16 Sources and Further Reading
- Harvard Stat 110 (First pass): strong official course hub covering limit theorems in a broad probability sequence. Checked 2026-04-24.
- Penn State STAT 414 (First pass): current official notes with explicit coverage of the CLT and probability-theory foundations. Checked 2026-04-24.
- MIT RES.6-012 Introduction to Probability (Second pass): official MIT course materials with a clean probability-theory perspective on LLN and CLT. Checked 2026-04-24.
- MIT RES.6-012 LLN and CLT lecture materials (Second pass): a direct official bridge that contrasts LLN stabilization with CLT fluctuation detail. Checked 2026-04-24.
- Penn State STAT 414: The Central Limit Theorem (Paper bridge): a good bridge from formal theorem statements to applied interpretation. Checked 2026-04-24.
Sources checked online on 2026-04-24:
- Harvard Stat 110 course overview
- Penn State STAT 414 overview
- Penn State CLT lesson
- MIT RES.6-012 CLT resource page
- MIT RES.6-012 LLN and CLT lecture materials