Concentration and Common Inequalities

How probability turns moment and boundedness assumptions into finite-sample guarantees for averages and other random quantities, using Markov, Chebyshev, the union bound, and Hoeffding-style concentration.
Modified

April 26, 2026

Keywords

concentration, Markov inequality, Chebyshev inequality, union bound, Hoeffding inequality

1 Role

This page is the first real bridge from introductory probability to research-style guarantees.

It explains how probability turns moment information and boundedness assumptions into explicit bounds on rare events.

2 First-Pass Promise

Read this page after Law of Large Numbers and CLT.

If you stop here, you should still understand:

  • what concentration is trying to control
  • what Markov and Chebyshev inequalities say
  • why the union bound shows up everywhere
  • why Hoeffding-style bounds give concrete finite-sample guarantees that asymptotic statements do not

3 Why It Matters

The law of large numbers and central limit theorem tell us what happens as sample size grows.

Concentration inequalities answer a different question:

At a fixed sample size, how unlikely is a large deviation?

That question appears constantly in:

  • learning theory, where we want empirical averages close to population quantities
  • randomized algorithms, where we want random constructions to work reliably
  • statistics, where we want finite-sample confidence or error control
  • high-dimensional probability, where many events must hold simultaneously

So concentration is where probability starts sounding like theorem statements in modern CS, AI, and engineering papers.

4 Prerequisite Recall

  • expectation gives a center
  • variance measures average squared fluctuation
  • the LLN is asymptotic, while concentration bounds are finite-sample

5 Intuition

Expectation alone tells you where a random quantity is centered, but not how tightly it clusters around that center.

Concentration inequalities progressively add stronger information:

  • Markov uses only nonnegativity and expectation
  • Chebyshev uses variance
  • Hoeffding uses independence and boundedness

So the pattern is:

more structure -> sharper tail control

The union bound then lets us combine many events at once, which is why it appears in proofs about families of hypotheses, coordinates, or bad outcomes.

6 Formal Core

Theorem 1 (Markov Inequality) If \(X \ge 0\) almost surely and \(t>0\), then

\[ P(X \ge t) \le \frac{E[X]}{t}. \]

This is the most basic tail bound. It only needs nonnegativity and a finite expectation.
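
As a quick numerical sanity check, here is a minimal Python sketch. The exponential variable with mean 5 is an illustrative choice, not part of the theorem; any nonnegative variable would do.

    import numpy as np

    rng = np.random.default_rng(0)

    # Nonnegative example variable: exponential with mean 5 (an illustrative choice).
    mean = 5.0
    x = rng.exponential(scale=mean, size=1_000_000)

    for t in [10.0, 20.0, 40.0]:
        empirical = (x >= t).mean()   # estimated P(X >= t)
        markov = mean / t             # Markov bound E[X] / t
        print(f"t={t:5.1f}  empirical={empirical:.5f}  Markov={markov:.4f}")

The bound holds at every \(t\) but is loose, which is exactly what using only one moment buys you.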

Theorem 2 (Chebyshev Inequality) If \(X\) has finite variance, then for every \(t>0\),

\[ P(|X-E[X]| \ge t) \le \frac{\operatorname{Var}(X)}{t^2}. \]

This converts second-moment information into a deviation bound around the mean. It follows from applying Markov's inequality to the nonnegative variable \((X - E[X])^2\).
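
The same kind of check works here. A minimal sketch, assuming a standard normal variable purely for illustration, so \(E[X]=0\) and \(\operatorname{Var}(X)=1\):

    import numpy as np

    rng = np.random.default_rng(1)

    # Example variable: standard normal, so E[X] = 0 and Var(X) = 1.
    x = rng.standard_normal(1_000_000)

    for t in [1.0, 2.0, 3.0]:
        empirical = (np.abs(x) >= t).mean()   # estimated P(|X - E[X]| >= t)
        chebyshev = 1.0 / t**2                # Var(X) / t^2
        print(f"t={t}  empirical={empirical:.5f}  Chebyshev={chebyshev:.4f}")

At \(t=3\) the true two-sided tail is about \(0.0027\) while Chebyshev gives \(1/9 \approx 0.111\): valid, but far from sharp.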

Theorem 3 (Union Bound) For events \(A_1,\dots,A_m\),

\[ P\left(\bigcup_{i=1}^m A_i\right) \le \sum_{i=1}^m P(A_i). \]

It is simple, loose, and indispensable.
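
A small simulation makes the slack visible. The sketch below assumes ten independent bad events, each with probability \(0.01\); note that the bound itself needs no independence, which is assumed here only to make the simulation easy.

    import numpy as np

    rng = np.random.default_rng(2)
    m, p, trials = 10, 0.01, 1_000_000

    # Each row is one trial; each column says whether bad event A_i occurred.
    bad = rng.random((trials, m)) < p

    empirical = bad.any(axis=1).mean()   # estimated P(A_1 or ... or A_m)
    print(f"empirical={empirical:.4f}  union bound={m * p:.2f}")

The empirical union probability comes out near \(1 - 0.99^{10} \approx 0.096\), just under the bound of \(0.10\).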

Theorem 4 (Hoeffding Bound for Bounded Averages) Let \(X_1,\dots,X_n\) be independent random variables with \(0 \le X_i \le 1\), and let

\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i. \]

Then for every \(\varepsilon > 0\),

\[ P\big(|\bar{X}_n - E[\bar{X}_n]| \ge \varepsilon\big) \le 2\exp(-2n\varepsilon^2). \]

This is a finite-sample, non-asymptotic concentration bound.
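
The bound is one line of code. Here is a sketch showing how the guarantee tightens as \(n\) grows at a fixed tolerance \(\varepsilon = 0.05\) (an arbitrary illustrative value):

    import math

    def hoeffding_bound(n, eps):
        """Two-sided Hoeffding bound 2 exp(-2 n eps^2) for averages of [0, 1] variables."""
        return 2.0 * math.exp(-2.0 * n * eps**2)

    for n in [100, 400, 1600, 6400]:
        print(f"n={n:5d}  bound={hoeffding_bound(n, 0.05):.2e}")

Note that at \(n=100\) the bound exceeds 1 and is therefore vacuous: still valid, just uninformative.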

7 Worked Example

Let \(X_1,\dots,X_{400}\) be i.i.d. fair-coin indicators, so each \(X_i \in \{0,1\}\) and

\[ E[X_i]=\frac12. \]

Then \(\bar{X}_{400}\) is the observed fraction of heads.

We want to bound

\[ P\left(\left|\bar{X}_{400} - \frac12\right| \ge 0.1\right). \]

Using Hoeffding,

\[ P\left(\left|\bar{X}_{400} - \frac12\right| \ge 0.1\right) \le 2\exp(-2\cdot 400 \cdot 0.1^2) = 2e^{-8} \approx 0.00067. \]

Now compare with Chebyshev.

For a Bernoulli\((1/2)\) variable,

\[ \operatorname{Var}(X_i)=\frac14, \]

so

\[ \operatorname{Var}(\bar{X}_{400}) = \frac{1}{4\cdot 400} = \frac{1}{1600}. \]

Therefore

\[ P\left(\left|\bar{X}_{400} - \frac12\right| \ge 0.1\right) \le \frac{1/1600}{0.1^2} = \frac{1}{16} = 0.0625. \]

Both are valid, but Hoeffding is dramatically sharper here because it uses stronger structural information: independence plus boundedness.
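
A simulation of the coin example makes the comparison concrete. This sketch estimates the actual deviation probability by Monte Carlo and prints it next to both bounds:

    import math
    import numpy as np

    rng = np.random.default_rng(3)
    n, eps, trials = 400, 0.1, 1_000_000

    # Heads count in each batch of 400 fair flips, then the deviation of the mean.
    heads = rng.binomial(n, 0.5, size=trials)
    deviation = np.abs(heads / n - 0.5)

    empirical = (deviation >= eps).mean()
    hoeffding = 2.0 * math.exp(-2.0 * n * eps**2)   # 2 e^{-8}
    chebyshev = (0.25 / n) / eps**2                 # (1/1600) / 0.01 = 1/16
    print(f"empirical={empirical:.2e}  Hoeffding={hoeffding:.2e}  Chebyshev={chebyshev:.2e}")

Typical output puts the empirical probability on the order of \(10^{-4}\) or smaller, so even Hoeffding is conservative here, and Chebyshev far more so.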

This is the main lesson of concentration work:

  1. identify the deviation event
  2. match it with the strongest justified assumptions
  3. apply the inequality that actually uses that structure

8 Computation Lens

A useful practical routine is:

  1. define the random quantity whose deviation you care about
  2. center it at its mean or target value
  3. list the assumptions you really have:
    • nonnegative?
    • finite variance?
    • bounded?
    • independent?
  4. choose the weakest inequality that still gives a useful bound

This is how concentration enters proofs in learning theory, numerical linear algebra, and randomized algorithms.
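
One way to internalize the routine is to encode it. The helper below is a hypothetical illustration (the function name and its flags are made up for this page, not a library API): given the assumptions you can actually justify, it reports which bounds apply and what they give.

    import math

    def applicable_bounds(t, mean=None, var=None, nonnegative=False, n_bounded_iid=None):
        """Hypothetical helper: list the tail bounds the stated assumptions justify."""
        bounds = {}
        if nonnegative and mean is not None:
            bounds["Markov"] = mean / t            # needs X >= 0 and a finite mean
        if var is not None:
            bounds["Chebyshev"] = var / t**2       # needs a finite variance
        if n_bounded_iid is not None:              # needs n independent [0, 1] terms
            bounds["Hoeffding"] = 2.0 * math.exp(-2.0 * n_bounded_iid * t**2)
        return bounds

    # The coin example again: which inequalities apply, and what do they give?
    print(applicable_bounds(0.1, mean=0.5, var=1/1600, nonnegative=True, n_bounded_iid=400))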

9 Application Lens

Concentration powers claims like:

  • empirical risk is close to true risk
  • random features or sketches preserve structure with high probability
  • a randomized algorithm succeeds except on a tiny tail event
  • all coordinates in a large family behave well simultaneously after a union bound

So when a paper says with high probability, there is almost always a concentration mechanism hiding underneath.

10 Stop Here For First Pass

If you can now explain:

  • why concentration is different from LLN and CLT
  • what assumptions Markov, Chebyshev, and Hoeffding need
  • why stronger assumptions can give much sharper deviation bounds
  • why the union bound is so common in research proofs

then this page has done its main job.

11 Go Deeper

This is a natural finish line for the first pass through foundation probability.

The next places these ideas become central are statistics, learning theory, randomized algorithms, and later high-dimensional probability.

12 Optional Paper Bridge

For a first research-flavored target, the MIT 15.097 statistical learning theory lecture listed in the sources below shows these same inequalities, especially Hoeffding combined with the union bound, at work inside learning-theory statements.

13 Optional After First Pass

If you want more practice before moving on:

  • compare a Chebyshev bound and a Hoeffding bound for the same sample-average problem
  • use the union bound to control the chance that any one of ten bad events occurs
  • look for the phrase with high probability in a theorem and ask which inequality is doing the work

14 Common Mistakes

  • using Markov on a random variable that may be negative (see the sketch after this list)
  • reading Chebyshev as a sharp estimate rather than a general-purpose bound
  • forgetting that Hoeffding needs bounded independent variables
  • confusing asymptotic convergence with a concrete finite-sample tail guarantee
  • using the union bound backward as if it were an equality
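
The first mistake is worth seeing numerically. A minimal sketch with a made-up variable that is \(-10\) or \(+10\) with equal probability, so \(E[X] = 0\) but nonnegativity fails:

    import numpy as np

    rng = np.random.default_rng(4)

    # X is -10 or +10 with equal probability: E[X] = 0, but X is not nonnegative.
    x = rng.choice([-10.0, 10.0], size=1_000_000)

    print((x >= 5).mean())   # true P(X >= 5) is about 0.5
    # Naive "Markov bound" E[X]/t = 0/5 = 0 -- clearly false, because X >= 0 fails.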

15 Exercises

  1. Let \(X \ge 0\) with \(E[X]=5\). What does the Markov inequality say about \(P(X \ge 20)\)?
  2. If \(\operatorname{Var}(X)=9\), what does Chebyshev give for \(P(|X-E[X]| \ge 6)\)?
  3. Let \(X_1,\dots,X_n\) be independent and take values in \([0,1]\). Write the Hoeffding bound for \(P(|\bar{X}_n - E[\bar{X}_n]| \ge \varepsilon)\).
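
After attempting the exercises, the one-liners below can check the arithmetic for 1 and 2, and evaluate the Exercise 3 bound at illustrative values \(n = 1000\) and \(\varepsilon = 0.05\):

    import math

    print(5 / 20)     # Exercise 1: Markov bound on P(X >= 20) when E[X] = 5
    print(9 / 6**2)   # Exercise 2: Chebyshev bound on P(|X - E[X]| >= 6) when Var(X) = 9
    print(2 * math.exp(-2 * 1000 * 0.05**2))   # Exercise 3 at n = 1000, eps = 0.05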

16 Sources and Further Reading

  • Harvard Stat 110 - First pass - strong official course hub that connects inequalities to broad probability intuition. Checked 2026-04-24.
  • MIT RES.6-012 lecture notes - First pass - official MIT notes with direct coverage of inequalities, convergence, and weak laws. Checked 2026-04-24.
  • Penn State STAT 414 - Second pass - official probability notes that help connect concentration ideas back to the rest of the introductory sequence. Checked 2026-04-24.
  • MIT 15.097 statistical learning theory lecture - Paper bridge - official MIT bridge from concentration to learning-theory statements. Checked 2026-04-24.
