Concentration and Common Inequalities
concentration, Markov inequality, Chebyshev inequality, union bound, Hoeffding inequality
1 Role
This page is the first real bridge from introductory probability to research-style guarantees.
It explains how probability turns moment information and boundedness assumptions into explicit bounds on rare events.
2 First-Pass Promise
Read this page after Law of Large Numbers and CLT.
If you stop here, you should still understand:
- what concentration is trying to control
- what Markov and Chebyshev inequalities say
- why the union bound shows up everywhere
- why Hoeffding-style bounds are stronger than asymptotic statements for finite samples
3 Why It Matters
The law of large numbers and central limit theorem tell us what happens as sample size grows.
Concentration inequalities answer a different question:
At a fixed sample size, how unlikely is a large deviation?
That question appears constantly in:
- learning theory, where we want empirical averages close to population quantities
- randomized algorithms, where we want random constructions to work reliably
- statistics, where we want finite-sample confidence or error control
- high-dimensional probability, where many events must hold simultaneously
So concentration is where probability starts to sound like the theorem statements in modern CS, AI, and engineering papers.
4 Prerequisite Recall
- expectation gives a center
- variance measures average squared fluctuation
- the LLN is asymptotic, while concentration bounds are finite-sample
5 Intuition
Expectation alone tells you where a random quantity is centered, but not how tightly it clusters around that center.
Concentration inequalities progressively add stronger information:
- Markov uses only nonnegativity and expectation
- Chebyshev uses variance
- Hoeffding uses independence and boundedness
So the pattern is:
more structure -> sharper tail control
The union bound then lets us combine many events at once, which is why it appears in proofs about families of hypotheses, coordinates, or bad outcomes.
6 Formal Core
Theorem 1 (Markov Inequality) If \(X \ge 0\) almost surely and \(t>0\), then
\[ P(X \ge t) \le \frac{E[X]}{t}. \]
This is the most basic tail bound. It only needs nonnegativity and a finite expectation.
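As a quick sanity check, the bound can be compared against simulation. The sketch below (an illustrative choice, not from the text) uses an exponential variable with mean 2, which is nonnegative, so Markov applies:

```python
import random

# Monte Carlo check of Markov's inequality on a nonnegative variable.
# X ~ Exponential with mean 2 (illustrative choice; any nonnegative X works).
random.seed(0)
n = 100_000
samples = [random.expovariate(0.5) for _ in range(n)]  # mean = 1/0.5 = 2

t = 8.0
empirical = sum(x >= t for x in samples) / n  # true value is e^{-4} ~ 0.018
markov_bound = 2.0 / t                        # E[X] / t = 0.25

print(f"empirical P(X >= {t}) = {empirical:.4f}")
print(f"Markov bound          = {markov_bound:.4f}")
```

The bound holds but is loose, which is exactly what using only the first moment buys you.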
Theorem 2 (Chebyshev Inequality) If \(X\) has finite variance, then for every \(t>0\),
\[ P(|X-E[X]| \ge t) \le \frac{\operatorname{Var}(X)}{t^2}. \]
This converts second-moment information into a deviation bound around the mean.
Theorem 3 (Union Bound) For events \(A_1,\dots,A_m\),
\[ P\left(\bigcup_{i=1}^m A_i\right) \le \sum_{i=1}^m P(A_i). \]
It is simple, loose, and indispensable.
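A small simulation makes the inequality concrete. The events below are an illustrative setup (each \(A_i\) is "an independent uniform falls below \(p\)"), chosen so the true union probability is easy to compare with the bound:

```python
import random

# Monte Carlo check of the union bound: P(union of A_i) <= sum of P(A_i).
# Illustrative events: A_i = {U_i <= p} for m independent uniforms.
random.seed(1)
m, p, trials = 10, 0.05, 100_000

hits = 0
for _ in range(trials):
    if any(random.random() <= p for _ in range(m)):
        hits += 1

empirical_union = hits / trials  # true value: 1 - (1 - p)^m ~ 0.401
union_bound = m * p              # sum of the individual probabilities

print(f"empirical P(union) = {empirical_union:.4f}")
print(f"union bound        = {union_bound:.4f}")
```

Here the bound (0.5) overshoots the truth (about 0.40), and the gap grows as the events overlap more; independence is the worst case for looseness, not a requirement.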
Theorem 4 (Hoeffding Bound for Bounded Averages) Let \(X_1,\dots,X_n\) be independent random variables with \(0 \le X_i \le 1\), and let
\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i. \]
Then for every \(\varepsilon > 0\),
\[ P\big(|\bar{X}_n - E[\bar{X}_n]| \ge \varepsilon\big) \le 2\exp(-2n\varepsilon^2). \]
This is a finite-sample, non-asymptotic concentration bound.
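The bound can be checked empirically. The sketch below uses Uniform\([0,1]\) variables as an illustrative distribution satisfying the boundedness assumption:

```python
import math
import random

# Empirical check of the Hoeffding bound for averages of bounded variables.
# X_i ~ Uniform[0,1] is an illustrative choice satisfying 0 <= X_i <= 1.
random.seed(2)
n, eps, trials = 100, 0.1, 50_000

deviations = 0
for _ in range(trials):
    xbar = sum(random.random() for _ in range(n)) / n
    if abs(xbar - 0.5) >= eps:  # E[X_i] = 1/2 for Uniform[0,1]
        deviations += 1

empirical = deviations / trials
hoeffding = 2 * math.exp(-2 * n * eps**2)  # 2e^{-2} ~ 0.27

print(f"empirical deviation probability = {empirical:.5f}")
print(f"Hoeffding bound                 = {hoeffding:.5f}")
```

The empirical deviation probability is far below the bound here; Hoeffding is worst-case over all distributions on \([0,1]\), so for any particular distribution it can still be conservative.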
7 Worked Example
Let \(X_1,\dots,X_{400}\) be i.i.d. fair-coin indicators, so each \(X_i \in \{0,1\}\) and
\[ E[X_i]=\frac12. \]
Then
\[ \bar{X}_{400} \]
is the observed fraction of heads.
We want to bound
\[ P\left(\left|\bar{X}_{400} - \frac12\right| \ge 0.1\right). \]
Using Hoeffding,
\[ P\left(\left|\bar{X}_{400} - \frac12\right| \ge 0.1\right) \le 2\exp(-2\cdot 400 \cdot 0.1^2) = 2e^{-8} \approx 0.00067. \]
Now compare with Chebyshev.
For a Bernoulli\((1/2)\) variable,
\[ \operatorname{Var}(X_i)=\frac14, \]
so
\[ \operatorname{Var}(\bar{X}_{400}) = \frac{1}{4\cdot 400} = \frac{1}{1600}. \]
Therefore
\[ P\left(\left|\bar{X}_{400} - \frac12\right| \ge 0.1\right) \le \frac{1/1600}{0.1^2} = \frac{1}{16} = 0.0625. \]
Both are valid, but Hoeffding is dramatically sharper here because it uses stronger structural information: independence plus boundedness.
This is the main lesson of concentration work:
- identify the deviation event
- match it with the strongest justified assumptions
- apply the inequality that actually uses that structure
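The two bounds from the worked example can be reproduced in a few lines:

```python
import math

# Reproducing the worked example's bounds for n = 400 fair coins, eps = 0.1.
n, eps = 400, 0.1

hoeffding = 2 * math.exp(-2 * n * eps**2)  # 2 * e^{-8} ~ 0.00067
chebyshev = (0.25 / n) / eps**2            # Var(X_i) = 1/4, so 1/16 = 0.0625

print(f"Hoeffding bound: {hoeffding:.5f}")
print(f"Chebyshev bound: {chebyshev:.5f}")
```

The roughly 90-fold gap is the numerical face of "more structure gives sharper tail control."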
8 Computation Lens
A useful practical routine is:
- define the random quantity whose deviation you care about
- center it at its mean or target value
- list the assumptions you really have:
- nonnegative?
- finite variance?
- bounded?
- independent?
- choose the weakest inequality that still gives a useful bound
This is how concentration enters proofs in learning theory, numerical linear algebra, and randomized algorithms.
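The routine above can be sketched as a small helper. The function name and the assumption flags are illustrative, not a standard API; it returns whichever bounds on \(P(|\bar{X}_n - E[\bar{X}_n]| \ge \varepsilon)\) the stated assumptions justify:

```python
import math

# Sketch of the routine: list your assumptions, then collect the
# tail bounds they actually justify for the sample average.
# Names and flags are illustrative, not a standard library API.
def tail_bounds(n, eps, variance=None, bounded_01=False, independent=False):
    bounds = {}
    if bounded_01 and independent:
        # Hoeffding: needs independence and values in [0, 1]
        bounds["hoeffding"] = 2 * math.exp(-2 * n * eps**2)
    if variance is not None and independent:
        # Chebyshev on the mean: Var(mean) = variance / n under independence
        bounds["chebyshev"] = (variance / n) / eps**2
    return bounds

# Fair-coin averages, n = 400, eps = 0.1 (matches the worked example)
print(tail_bounds(400, 0.1, variance=0.25, bounded_01=True, independent=True))
```

With all assumptions in hand you get both bounds and can report the sharper one; drop boundedness and only Chebyshev survives.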
9 Application Lens
Concentration powers claims like:
- empirical risk is close to true risk
- random features or sketches preserve structure with high probability
- a randomized algorithm succeeds except on a tiny tail event
- all coordinates in a large family behave well simultaneously after a union bound
So when a paper says "with high probability," there is almost always a concentration mechanism hiding underneath.
10 Stop Here For First Pass
If you can now explain:
- why concentration is different from LLN and CLT
- what assumptions Markov, Chebyshev, and Hoeffding need
- why stronger assumptions can give much sharper deviation bounds
- why the union bound is so common in research proofs
then this page has done its main job.
11 Go Deeper
This is a natural finish line for the first pass through foundation probability.
The next places these ideas become central are statistics, learning theory, randomized algorithms, and later high-dimensional probability.
12 Optional Paper Bridge
- MIT RES.6-012 lecture notes - Second pass - official MIT notes with a direct lecture on inequalities, convergence, and the weak law. Checked 2026-04-24.
- MIT 15.097 Prediction, Machine Learning, and Statistics lecture on statistical learning theory - Paper bridge - an official MIT bridge showing how Hoeffding-style concentration enters learning-theory guarantees. Checked 2026-04-24.
13 Optional After First Pass
If you want more practice before moving on:
- compare a Chebyshev bound and a Hoeffding bound for the same sample-average problem
- use the union bound to control the chance that any one of ten bad events occurs
- look for the phrase "with high probability" in a theorem and ask which inequality is doing the work
14 Common Mistakes
- using Markov on a random variable that may be negative
- reading Chebyshev as a sharp estimate rather than a general-purpose bound
- forgetting that Hoeffding needs bounded independent variables
- confusing asymptotic convergence with a concrete finite-sample tail guarantee
- using the union bound backward as if it were an equality
15 Exercises
- Let \(X \ge 0\) with \(E[X]=5\). What does the Markov inequality say about \(P(X \ge 20)\)?
- If \(\operatorname{Var}(X)=9\), what does Chebyshev give for \(P(|X-E[X]| \ge 6)\)?
- Let \(X_1,\dots,X_n\) be independent and take values in \([0,1]\). Write the Hoeffding bound for \(P(|\bar{X}_n - E[\bar{X}_n]| \ge \varepsilon)\).
16 Sources and Further Reading
- Harvard Stat 110 - First pass - strong official course hub that connects inequalities to broad probability intuition. Checked 2026-04-24.
- MIT RES.6-012 lecture notes - First pass - official MIT notes with direct coverage of inequalities, convergence, and weak laws. Checked 2026-04-24.
- Penn State STAT 414 - Second pass - official probability notes that help connect concentration ideas back to the rest of the introductory sequence. Checked 2026-04-24.
- MIT 15.097 statistical learning theory lecture - Paper bridge - official MIT bridge from concentration to learning-theory statements. Checked 2026-04-24.
Sources checked online on 2026-04-24:
- Harvard Stat 110 course homepage
- MIT RES.6-012 lecture notes page
- Penn State STAT 414 overview
- MIT 15.097 lecture on statistical learning theory