Linear Probes and Representation Diagnostics

A bridge page showing how linear probes test what a frozen representation makes linearly decodable, and how to read probe results without overclaiming what a model knows.
Modified: April 26, 2026

Keywords: linear probe, representation diagnostics, transfer learning, frozen features, decodability

1 Application Snapshot

A linear probe asks a narrow but useful question:

if we freeze the representation, what information is already easy to decode with a linear head?

That makes linear probing one of the simplest diagnostics for representation quality.

It is especially useful when you want to compare:

  • different layers of the same model
  • different pretraining methods
  • zero-shot behavior versus learned downstream adaptation
  • frozen-feature transfer versus full fine-tuning

2 Problem Setting

Suppose a pretrained model maps an input \(x\) to a hidden representation

\[ z(x) \in \mathbb{R}^d. \]

We freeze that representation and train only a linear predictor

\[ \hat{y} = W z(x) + b \]

or, for multiclass classification, a softmax applied to those logits.

The encoder is not updated. Only \(W\) and \(b\) are trained on the downstream labels.

So a probe does not ask:

can the whole model solve the task after end-to-end adaptation?

It asks:

does this frozen representation already make the task linearly accessible?
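
In code, the whole setup is small. Here is a minimal sketch, assuming the frozen features have already been extracted into arrays; the random arrays below are placeholders standing in for real encoder outputs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder frozen features: in practice these are encoder outputs
# z(x), extracted once and never updated.
rng = np.random.default_rng(0)
Z_train = rng.normal(size=(200, 64))
y_train = (Z_train[:, 0] > 0).astype(int)   # placeholder labels
Z_val = rng.normal(size=(100, 64))
y_val = (Z_val[:, 0] > 0).astype(int)

# The probe: a linear classifier whose only trained parameters are W and b.
probe = LogisticRegression(max_iter=1000)
probe.fit(Z_train, y_train)
print("val accuracy:", probe.score(Z_val, y_val))
```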

3 Why This Math Appears

This page sits on top of several earlier bridges: linear maps and decision rules, train / validation splits, and downstream metrics such as accuracy and cross-entropy.

So linear probing is where representation geometry meets evaluation discipline.

4 Math Objects In Use

  • frozen representation \(z(x)\)
  • linear head \(W z + b\)
  • training and validation splits
  • accuracy, cross-entropy, or another downstream metric
  • layer index if we probe multiple hidden states
  • control baselines that help separate real signal from probe memorization

5 A Small Worked Walkthrough

Suppose a frozen encoder maps four inputs into two-dimensional vectors:

\[ z(x_1) = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \qquad z(x_2) = \begin{bmatrix} 1.5 \\ 0.5 \end{bmatrix}, \qquad z(x_3) = \begin{bmatrix} -1 \\ -1 \end{bmatrix}, \qquad z(x_4) = \begin{bmatrix} -1.5 \\ -0.5 \end{bmatrix}. \]

Assume \(x_1,x_2\) belong to class A and \(x_3,x_4\) belong to class B.

A linear probe with

\[ w = \begin{bmatrix} 1 \\ 0.5 \end{bmatrix}, \qquad b = 0 \]

computes the score \(s(x)=w^\top z(x)\).

Then

\[ s(x_1)=2.5,\quad s(x_2)=1.75,\quad s(x_3)=-1.5,\quad s(x_4)=-1.75. \]

So a single linear separator already splits the classes.
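
The arithmetic is small enough to verify directly. A quick NumPy check of the same scores:

```python
import numpy as np

# Frozen representations from the walkthrough, one row per input.
Z = np.array([[ 2.0,  1.0],
              [ 1.5,  0.5],
              [-1.0, -1.0],
              [-1.5, -0.5]])
w = np.array([1.0, 0.5])
b = 0.0

scores = Z @ w + b
print(scores)  # [ 2.5   1.75 -1.5  -1.75]  -> sign splits class A from B
```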

The important conclusion is not that the model “understands class A.” The narrower conclusion is:

  • the frozen representation places the two classes in a geometry that a linear rule can separate
  • a downstream task may therefore require only a small head rather than a full model rewrite

Now imagine probing two layers of the same network:

  • an early layer gives probe accuracy near random
  • a later layer gives strong validation accuracy

That suggests the later layer makes task-relevant information more linearly accessible. It still does not prove that the model itself applies that linear rule anywhere in its own forward computation.

6 Implementation or Computation Note

A practical probe workflow usually looks like this (a minimal sketch follows the list):

  1. choose one or more frozen layers
  2. extract representations on a clean train / validation / test split
  3. train only a linear head
  4. compare validation and test behavior across layers or models
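
A sketch of steps 2 through 4, assuming the per-layer features have already been dumped to arrays; the random arrays and layer indices below are placeholders, not a real model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder for step 2: in practice each entry holds the encoder
# activations z(x) extracted at that layer for the train and val splits.
features_by_layer = {
    layer: (rng.normal(size=(200, 32)), rng.normal(size=(100, 32)))
    for layer in (0, 4, 8)
}
y_train = rng.integers(0, 2, size=200)
y_val = rng.integers(0, 2, size=100)

# Steps 3 and 4: train only a linear head per layer, compare on val.
for layer, (Z_train, Z_val) in features_by_layer.items():
    head = LogisticRegression(max_iter=1000)  # W and b only
    head.fit(Z_train, y_train)
    print(f"layer {layer}: val accuracy {head.score(Z_val, y_val):.3f}")
```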

Useful diagnostics include:

  • layerwise probe accuracy
  • train versus validation gap
  • zero-shot versus linear-probe versus fine-tuned performance
  • probe performance under small data budgets
  • control tasks or random-label baselines

This is why probing is not just “train a tiny classifier.” It is an evaluation design problem.
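
The random-label baseline from the list above is also easy to sketch. The idea: if the probe can fit shuffled labels on the training set, it has enough capacity to memorize at this data budget, which weakens conclusions drawn from its accuracy on the real labels. Again, the arrays are placeholders for frozen features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
Z_train = rng.normal(size=(200, 32))     # placeholder frozen features
y_train = rng.integers(0, 2, size=200)   # placeholder labels

# Probe on the real labels.
real = LogisticRegression(max_iter=1000).fit(Z_train, y_train)

# Control: the same probe on shuffled labels. High training accuracy
# here means the probe can memorize, not decode.
y_shuffled = rng.permutation(y_train)
control = LogisticRegression(max_iter=1000).fit(Z_train, y_shuffled)

print(f"train acc, real labels:     {real.score(Z_train, y_train):.3f}")
print(f"train acc, shuffled labels: {control.score(Z_train, y_shuffled):.3f}")
```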

For modern foundation models, linear probes are often used because they are cheap, reproducible, and less confounded than full fine-tuning. They also let you ask whether a representation is already useful before spending compute on larger adaptation.

7 Failure Modes

  • treating probe accuracy as proof that the model causally uses that feature
  • using a probe that is too expressive, so the probe learns the task instead of revealing the representation (see the sketch after this list)
  • comparing probes across models with mismatched data budgets or preprocessing
  • ignoring train / validation / test leakage
  • reading tiny accuracy differences as strong structural conclusions
  • forgetting that high probe accuracy can still coexist with poor robustness or poor calibration
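
The "too expressive" failure mode is easy to demonstrate: swap the linear head for a small MLP on the same frozen features and watch accuracy rise even though the representation has not changed. A sketch with synthetic features where the label is nonlinearly encoded:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 16))              # synthetic frozen features
y = (np.sin(3 * Z[:, 0]) > 0).astype(int)   # nonlinearly encoded label
Z_train, y_train = Z[:200], y[:200]
Z_val, y_val = Z[200:], y[200:]

linear = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000,
                    random_state=0).fit(Z_train, y_train)

# The MLP can decode nonlinear structure the linear head misses, so its
# higher score reflects probe capacity, not better linear accessibility.
print(f"linear probe val acc: {linear.score(Z_val, y_val):.3f}")
print(f"MLP probe val acc:    {mlp.score(Z_val, y_val):.3f}")
```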

One especially important caution is this:

linear decodability is evidence about accessible information, not a full theory of representation meaning.

