Generalization, Overfitting, and Validation

A bridge page showing why low training loss is not enough, how validation fits into the workflow, and how generalization turns sample performance into a theory question.
Modified: April 26, 2026

Keywords: generalization, overfitting, validation, test error, bias-variance

1 Application Snapshot

Optimization can make training loss small.

But ML is not judged by how well a model fits the training sample alone. It is judged by how well it performs on new data.

That gap between fitting the sample and performing well beyond the sample is where validation and generalization enter.

2 Problem Setting

Suppose we train a predictor \(f\) by minimizing empirical risk on a dataset.

We can measure:

  • training error: performance on the sample used to fit the model
  • validation error: performance on held-out data used for model or hyperparameter selection
  • test error: performance on final held-out data used only for evaluation

The central ML question is:

when does low training error say something meaningful about future error?
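For concreteness, here is a minimal sketch of the three roles on synthetic data; the linear target, split sizes, and noise level are all illustrative assumptions, not a prescription.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: a noisy linear target.
x = rng.uniform(-1, 1, size=300)
y = 2 * x + rng.normal(scale=0.3, size=300)

# Three disjoint roles: fit, select, evaluate.
idx = rng.permutation(300)
train, val, test = idx[:200], idx[200:250], idx[250:]

# Fit a line by least squares on the training split only.
w, b = np.polyfit(x[train], y[train], deg=1)

def mse(split):
    """Mean squared error of the fitted line on one split."""
    return np.mean((w * x[split] + b - y[split]) ** 2)

print(f"train {mse(train):.3f}  val {mse(val):.3f}  test {mse(test):.3f}")
```

The three numbers answer different questions: the first describes the fit, the second guides choices, and only the third is an honest estimate of future error.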

3 Why This Math Appears

This question immediately draws on the site’s probability and statistics foundations:

  • training data are a sample, not the whole population
  • performance estimates vary with the sample
  • model complexity interacts with estimation uncertainty
  • hyperparameter tuning can leak information if validation and test roles are confused

It also opens the door to theory:

  • concentration ideas help bound deviations between empirical and population behavior (one such bound is sketched after this list)
  • learning theory studies when empirical risk is a reliable proxy
  • optimization can influence generalization through the solutions it prefers
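For instance, for a single fixed predictor \(f\) and a loss bounded in \([0, 1]\), Hoeffding's inequality gives

\[
\mathbb{P}\left( \left| \hat{R}_n(f) - R(f) \right| \ge \varepsilon \right) \le 2 e^{-2 n \varepsilon^{2}},
\]

so the empirical risk of that one predictor concentrates around its population risk as \(n\) grows. The harder problem, which learning theory takes up, is making such guarantees uniform over a whole model class rather than one fixed \(f\).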

So generalization is not a vague ML slogan. It is the point where optimization, statistics, and theory collide.

4 Math Objects In Use

  • training loss or empirical risk \(\hat{R}_n(f)\)
  • population or expected risk \(R(f)\)
  • validation split
  • test split
  • generalization gap
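In symbols, with a loss \(\ell\) and an i.i.d. sample \((x_1, y_1), \dots, (x_n, y_n)\) from a distribution \(\mathcal{D}\), the standard definitions are

\[
\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big),
\qquad
R(f) = \mathbb{E}_{(X, Y) \sim \mathcal{D}}\big[\ell(f(X), Y)\big],
\]

and the generalization gap is the difference \(R(f) - \hat{R}_n(f)\). Validation and test errors are the same empirical average computed on splits that played no role in fitting \(f\).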

5 A Small Worked Walkthrough

Imagine two models trained on the same regression dataset:

  • Model A is simple and leaves visible bias in the training fit
  • Model B is very flexible and drives training loss almost to zero

If Model B also chases noise in the sample, its validation error may rise even while training error keeps falling.

That is the standard overfitting pattern:

  • training error decreases
  • validation error first decreases, then worsens

In contrast, underfitting usually shows up when both training and validation error stay high because the model class or features are too limited.
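A minimal numerical sketch of both patterns, fitting polynomials of increasing degree to noisy synthetic data; the target function, noise level, degrees, and split sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical noisy samples of a smooth target function.
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=60)

x_tr, y_tr = x[:40], y[:40]   # used to fit coefficients
x_va, y_va = x[40:], y[40:]   # only watched, never fit on

for degree in [1, 3, 9, 15]:
    coeffs = np.polyfit(x_tr, y_tr, deg=degree)
    tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    va = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)
    print(f"degree {degree:2d}: train {tr:.4f}  val {va:.4f}")
```

Low degrees tend to leave both errors high (underfitting); high degrees drive training error toward zero while validation error typically turns upward (overfitting). Exact numbers depend on the random seed.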

This is why the workflow needs separate roles:

  • train to fit parameters
  • validate to tune model choices
  • test to estimate final performance honestly

6 Implementation or Computation Note

Validation is not only a statistical idea. It shapes how experiments are run.

Typical practical choices include:

  • train / validation / test split
  • cross-validation when data are limited
  • early stopping based on validation behavior (sketched below)
  • hyperparameter tuning using validation only
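As one concrete example, early stopping can be as simple as tracking validation loss during training and keeping the best weights seen so far. The sketch below uses a hand-rolled gradient descent loop on toy linear data; the shapes, learning rate, and patience value are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy data for a linear model trained with MSE.
X_tr, X_va = rng.normal(size=(80, 5)), rng.normal(size=(40, 5))
w_true = rng.normal(size=5)
y_tr = X_tr @ w_true + rng.normal(scale=0.5, size=80)
y_va = X_va @ w_true + rng.normal(scale=0.5, size=40)

w = np.zeros(5)
best_w, best_va = w.copy(), np.inf   # best_w is what we would deploy
patience, bad = 10, 0

for step in range(1000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)  # MSE gradient
    w -= 0.05 * grad
    va = np.mean((X_va @ w - y_va) ** 2)
    if va < best_va:
        best_w, best_va, bad = w.copy(), va, 0
    else:
        bad += 1
        if bad >= patience:  # validation stopped improving
            break

print(f"stopped at step {step}, best validation MSE {best_va:.4f}")
```

Note that the validation set is steering training here, which is exactly why it can no longer serve as an honest estimate of final performance.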

One recurring failure mode is leakage: information from the validation or test set indirectly influences training decisions, making performance look better than it really is.
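A common concrete instance is preprocessing leakage. In the sketch below, standardization statistics are computed once over all rows (leaky) and once over training rows only (clean); the data and shapes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
train, test = X[:70], X[70:]

# Leaky: statistics computed on ALL rows, so the test set
# has already influenced the features the model trains on.
mu_all, sd_all = X.mean(axis=0), X.std(axis=0)
train_leaky = (train - mu_all) / sd_all

# Clean: statistics come from the training rows only; the
# same frozen transform is then applied to the test rows.
mu, sd = train.mean(axis=0), train.std(axis=0)
train_clean = (train - mu) / sd
test_clean = (test - mu) / sd
```

The same discipline applies to any fitted transform, such as imputation, feature selection, or target encoding: fit it on the training split, then apply it unchanged everywhere else.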

7 Failure Modes

  • using the test set repeatedly during model development
  • tuning hyperparameters on the same data used for final evaluation
  • assuming zero training error means success
  • reading a noisy validation improvement as a real gain
  • treating generalization as if it were only an engineering heuristic rather than a statistical question

