VC Dimension and Shattering

How combinatorial richness of a hypothesis class controls what label patterns it can realize and why that matters for sample complexity.
Modified: April 26, 2026

Keywords

VC dimension, shattering, growth function, Sauer’s lemma, capacity

1 Role

This is the third page of the Learning Theory module.

The PAC page explained that sample complexity depends on the richness of the class.

This page gives the first major answer to:

how do we measure the richness of an infinite hypothesis class?

For binary classification, the answer comes in the language of shattering and VC dimension.

2 First-Pass Promise

Read this page after PAC Learning, Sample Complexity, and the Learning Setup.

If you stop here, you should still understand:

  • what it means for a class to shatter a finite set
  • what VC dimension measures
  • why VC dimension matters more than raw cardinality when the class is infinite
  • why combinatorial richness controls generalization and sample complexity

3 Why It Matters

Finite-class PAC bounds depend on \(\log |\mathcal H|\).

That is fine if \(\mathcal H\) is literally finite, but many useful classes are infinite:

  • thresholds with a real-valued parameter
  • linear separators with real-valued weights
  • intervals, halfspaces, and smooth function families

So class size alone is not the right complexity measure.

VC dimension gives a more structural notion of richness:

how many points can the class label in every possible way?

The answer to that question turns out to control sample complexity far better than naive counting.

4 Prerequisite Recall

  • PAC/sample-complexity language asks how many samples are enough for reliable learning
  • realizable and agnostic settings are different, but both need a notion of class richness
  • binary classification makes it natural to think about label patterns on finite point sets

5 Intuition

5.1 Label Patterns

Take a finite set of inputs \(x_1,\dots,x_n\).

Each binary classifier \(h\) induces a label vector

\[ (h(x_1),\dots,h(x_n)) \in \{0,1\}^n. \]

If a class can realize many such labelings, then it is expressive.

If it can realize all of them, then on that set the class is maximally flexible.
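The label-vector picture is easy to make concrete. Here is a minimal Python sketch; the three-threshold class and the point set are hypothetical, chosen only to make the patterns visible:

```python
# Enumerate the label vectors (h(x_1), ..., h(x_n)) induced by a tiny,
# hypothetical class: threshold classifiers h_t(x) = 1{x >= t} with
# only three allowed thresholds.
hypotheses = [lambda x, t=t: int(x >= t) for t in (0.5, 1.5, 2.5)]

points = [0.0, 1.0, 2.0]

# Each hypothesis induces one label vector; collect the distinct ones.
patterns = {tuple(h(x) for x in points) for h in hypotheses}

print(sorted(patterns))
print(len(patterns), "of", 2 ** len(points), "possible labelings realized")
```

Only three of the eight possible labelings are realized here, which is exactly the sense in which a class can be less than maximally flexible on a point set.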

5.2 Shattering

To shatter a finite set means the class can realize every possible binary labeling on that set.

That is the first strong sign that the class is rich enough to fit many kinds of data.

5.3 VC Dimension

The VC dimension is the size of the largest set of points that the class can shatter.

So it is a single number summarizing combinatorial richness.

It is not perfect, but it is one of the first major capacity measures in learning theory.

6 Formal Core

To keep the first pass concrete, this page stays in binary classification.

Definition 1 (Definition: Shattering) Let \(\mathcal H\) be a class of binary-valued functions on \(\mathcal X\).

A finite set \(\{x_1,\dots,x_n\}\subset \mathcal X\) is shattered by \(\mathcal H\) if for every binary vector \((y_1,\dots,y_n)\in\{0,1\}^n\), there exists some \(h\in\mathcal H\) such that

\[ h(x_i)=y_i \qquad \text{for all } i=1,\dots,n. \]

Definition 2 (Definition: VC Dimension) The VC dimension of \(\mathcal H\), written \(\mathrm{VC}(\mathcal H)\), is the largest integer \(d\) such that some set of size \(d\) is shattered by \(\mathcal H\).

If sets of arbitrarily large finite size can be shattered, then the VC dimension is infinite.

Important warning:

VC dimension \(d\) means there exists at least one shattered set of size \(d\). It does not mean that every set of size \(d\) is shattered.
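Both definitions can be checked mechanically on small finite proxies of a class and domain. The following is a brute-force Python sketch (exponential in the number of points, so only for toy cases; the function names are ours, not a standard API):

```python
from itertools import combinations

def is_shattered(hypotheses, points):
    """True iff every binary labeling of `points` is realized by some h."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)   # all of {0,1}^n is hit

def vc_dimension(hypotheses, domain):
    """Largest d such that SOME (not every) size-d subset of `domain`
    is shattered -- mirroring the warning above."""
    d = 0
    for size in range(1, len(domain) + 1):
        if any(is_shattered(hypotheses, subset)
               for subset in combinations(domain, size)):
            d = size
        else:
            # Any subset of a shattered set is itself shattered, so once
            # no set of this size works, no larger set can work either.
            break
    return d
```

The early `break` relies on the fact that subsets of shattered sets are shattered, which is also why the search only needs to find the first size at which shattering fails.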

Definition 3 (Definition Idea: Growth Function) For \(n\) points, the growth function counts the maximum number of distinct binary labelings that \(\mathcal H\) can realize on a set of size \(n\).

The growth function is the more detailed object; VC dimension is a summary number extracted from it.

Theorem 1 (Theorem Idea: Sauer’s Lemma) If \(\mathrm{VC}(\mathcal H)=d\), then once \(n>d\), the number of labelings \(\mathcal H\) can realize on \(n\) points grows only polynomially in \(n\), not exponentially like \(2^n\).

This is the key bridge from combinatorics to generalization: finite VC dimension prevents unrestricted label explosion.
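The polynomial-versus-exponential gap is easy to see numerically. Sauer's lemma bounds the number of realizable labelings on \(n\) points by \(\sum_{i=0}^{d}\binom{n}{i}\); a quick sketch of that bound:

```python
from math import comb

def sauer_bound(n, d):
    """Sauer's lemma bound: at most sum_{i=0}^{d} C(n, i) labelings
    on n points when the VC dimension is d (polynomial in n for fixed d)."""
    return sum(comb(n, i) for i in range(d + 1))

for n in (5, 10, 20, 40):
    print(n, sauer_bound(n, 2), 2 ** n)   # bound vs. the 2^n ceiling
```

For \(d=2\), the bound grows like \(n^2\) while the number of conceivable labelings grows like \(2^n\); already at \(n=20\) the gap is 211 versus over a million.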

7 Worked Example

Consider threshold classifiers on the real line:

\[ \mathcal H = \{h_t(x)=\mathbf{1}\{x\ge t\}: t\in\mathbb R\}. \]

7.1 One Point

Take any single point \(x_1\).

We can realize both labels:

  • choose \(t>x_1\) to get \(h_t(x_1)=0\)
  • choose \(t\le x_1\) to get \(h_t(x_1)=1\)

So every one-point set is shattered.

7.2 Two Points

Now take two points with \(x_1<x_2\).

The possible threshold labelings are:

  • \((0,0)\)
  • \((0,1)\)
  • \((1,1)\)

But we cannot realize

\[ (1,0), \]

because a threshold that labels the left point 1 will also label the right point 1.

So no two-point set is shattered.

Therefore

\[ \mathrm{VC}(\mathcal H)=1. \]

This example is small, but it already shows why VC dimension is meaningful:

the class is infinite, yet its effective binary-classification richness is only 1.
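The two-point argument can be checked by enumeration. On a fixed finite point set, only finitely many thresholds give distinct classifiers: one below all points, one above all points, and one between each consecutive pair. A sketch under that observation:

```python
def threshold(t):
    # h_t(x) = 1{x >= t}
    return lambda x: int(x >= t)

points = [1.0, 2.0]                 # x1 < x2
candidate_ts = [0.0, 1.5, 3.0]      # below x1, between x1 and x2, above x2

patterns = {tuple(threshold(t)(x) for x in points) for t in candidate_ts}

print(sorted(patterns))             # (1, 0) never appears, so VC(H) < 2
```

The enumeration realizes exactly \((0,0)\), \((0,1)\), and \((1,1)\); the missing \((1,0)\) is the labeling no threshold can produce.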

8 Computation Lens

VC dimension is not a training algorithm.

It is a way to talk about what a class is capable of representing before you even run optimization.

That separation matters:

  • optimization tells you how to fit inside the class
  • VC dimension tells you how rich the class already is

This is why two model families with similar training behavior can still have very different theoretical capacity.

9 Application Lens

9.1 Infinite Classes Need Structural Capacity

Thresholds, intervals, linear separators, and neural models are usually infinite classes. VC dimension is one of the first ways to say something finite and useful about their complexity.

9.2 Bridge To Generalization Bounds

The next page in the module will use VC-style capacity control to explain why uniform convergence is even plausible. VC dimension is one of the cleanest first answers to:

why can one sample say something about all hypotheses at once?

9.3 Reading Theory Papers

When a paper says a class has small VC dimension, it is not just dropping a definition. It is claiming a structural reason why sample complexity or uniform convergence should be controlled.

10 Stop Here For First Pass

If you can now explain:

  • what shattering means
  • what VC dimension measures
  • why infinite classes can still have finite VC dimension
  • why VC dimension is a better first-pass capacity notion than raw class size
  • why Sauer-style control matters for later generalization bounds

then this page has done its first-pass job.

11 Go Deeper

The next natural module step is the page on generalization bounds, which uses VC-style capacity control to explain why uniform convergence is plausible.
