Generalization, Overfitting, and Validation

A bridge page showing why low training loss is not enough, how validation fits into the workflow, and how generalization turns sample performance into a theory question.
Modified: April 26, 2026

Keywords: generalization, overfitting, validation, test error, bias-variance

1 Application Snapshot

Optimization can make training loss small.

But ML is not judged by how well a model fits the training sample alone. It is judged by how well it performs on new data.

That gap between fitting the sample and performing well beyond the sample is where validation and generalization enter.

2 Problem Setting

Suppose we train a predictor \(f\) by minimizing empirical risk on a dataset.

We can measure:

  • training error: performance on the sample used to fit the model
  • validation error: performance on held-out data used for model or hyperparameter selection
  • test error: performance on final held-out data used only for evaluation

The central ML question is:

when does low training error say something meaningful about future error?
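For concreteness, here is a minimal sketch of the three roles on synthetic data; the linear target, split sizes, and noise level are all illustrative assumptions, not a prescription.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: a noisy linear target.
x = rng.uniform(-1, 1, size=300)
y = 2 * x + rng.normal(scale=0.3, size=300)

# Three disjoint roles: fit, select, evaluate.
idx = rng.permutation(300)
train, val, test = idx[:200], idx[200:250], idx[250:]

# Fit a line by least squares on the training split only.
w, b = np.polyfit(x[train], y[train], deg=1)

def mse(split):
    """Mean squared error of the fitted line on one split."""
    return np.mean((w * x[split] + b - y[split]) ** 2)

print(f"train {mse(train):.3f}  val {mse(val):.3f}  test {mse(test):.3f}")
```

The three numbers answer different questions: the first describes the fit, the second guides choices, and only the third is an honest estimate of future error.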

3 Why This Math Appears

This question immediately draws on the site’s probability and statistics foundations:

  • training data are a sample, not the whole population
  • performance estimates vary with the sample
  • model complexity interacts with estimation uncertainty
  • hyperparameter tuning can leak information if validation and test roles are confused

It also opens the door to theory:

  • concentration ideas help bound deviations between empirical and population behavior (one such bound is sketched after this list)
  • learning theory studies when empirical risk is a reliable proxy
  • optimization can influence generalization through the solutions it prefers
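For instance, for a single fixed predictor \(f\) and a loss bounded in \([0, 1]\), Hoeffding's inequality gives

\[
\mathbb{P}\left( \left| \hat{R}_n(f) - R(f) \right| \ge \varepsilon \right) \le 2 e^{-2 n \varepsilon^{2}},
\]

so the empirical risk of that one predictor concentrates around its population risk as \(n\) grows. The harder problem, which learning theory takes up, is making such guarantees uniform over a whole model class rather than one fixed \(f\).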

So generalization is not a vague ML slogan. It is the point where optimization, statistics, and theory collide.

4 Math Objects In Use

  • training loss or empirical risk \(\hat{R}_n(f)\)
  • population or expected risk \(R(f)\)
  • validation split
  • test split
  • generalization gap
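In symbols, with a loss \(\ell\) and an i.i.d. sample \((x_1, y_1), \dots, (x_n, y_n)\) from a distribution \(\mathcal{D}\), the standard definitions are

\[
\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big),
\qquad
R(f) = \mathbb{E}_{(X, Y) \sim \mathcal{D}}\big[\ell(f(X), Y)\big],
\]

and the generalization gap is the difference \(R(f) - \hat{R}_n(f)\). Validation and test errors are the same empirical average computed on splits that played no role in fitting \(f\).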

5 A Small Worked Walkthrough

Imagine two models trained on the same regression dataset:

  • Model A is simple and leaves visible bias in the training fit
  • Model B is very flexible and drives training loss almost to zero

If Model B also chases noise in the sample, its validation error may rise even while training error keeps falling.

That is the standard overfitting pattern:

  • training error decreases
  • validation error first decreases, then worsens

In contrast, underfitting usually shows up when both training and validation error stay high because the model class or features are too limited.
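A minimal numerical sketch of both patterns, fitting polynomials of increasing degree to noisy synthetic data; the target function, noise level, degrees, and split sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical noisy samples of a smooth target function.
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=60)

x_tr, y_tr = x[:40], y[:40]   # used to fit coefficients
x_va, y_va = x[40:], y[40:]   # only watched, never fit on

for degree in [1, 3, 9, 15]:
    coeffs = np.polyfit(x_tr, y_tr, deg=degree)
    tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    va = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)
    print(f"degree {degree:2d}: train {tr:.4f}  val {va:.4f}")
```

Low degrees tend to leave both errors high (underfitting); high degrees drive training error toward zero while validation error typically turns upward (overfitting). Exact numbers depend on the random seed.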

This is why the workflow needs separate roles:

  • train to fit parameters
  • validate to tune model choices
  • test to estimate final performance honestly

6 Implementation or Computation Note

Validation is not only a statistical idea. It shapes how experiments are run.

Typical practical choices include:

  • train / validation / test split
  • cross-validation when data are limited
  • early stopping based on validation behavior (sketched below)
  • hyperparameter tuning using validation only
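As one concrete example, early stopping can be as simple as tracking validation loss during training and keeping the best weights seen so far. The sketch below uses a hand-rolled gradient descent loop on toy linear data; the shapes, learning rate, and patience value are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy data for a linear model trained with MSE.
X_tr, X_va = rng.normal(size=(80, 5)), rng.normal(size=(40, 5))
w_true = rng.normal(size=5)
y_tr = X_tr @ w_true + rng.normal(scale=0.5, size=80)
y_va = X_va @ w_true + rng.normal(scale=0.5, size=40)

w = np.zeros(5)
best_w, best_va = w.copy(), np.inf   # best_w is what we would deploy
patience, bad = 10, 0

for step in range(1000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)  # MSE gradient
    w -= 0.05 * grad
    va = np.mean((X_va @ w - y_va) ** 2)
    if va < best_va:
        best_w, best_va, bad = w.copy(), va, 0
    else:
        bad += 1
        if bad >= patience:  # validation stopped improving
            break

print(f"stopped at step {step}, best validation MSE {best_va:.4f}")
```

Note that the validation set is steering training here, which is exactly why it can no longer serve as an honest estimate of final performance.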

One recurring failure mode is leakage: information from the validation or test set indirectly influences training decisions, making performance look better than it really is.
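A common concrete instance is preprocessing leakage. In the sketch below, standardization statistics are computed once over all rows (leaky) and once over training rows only (clean); the data and shapes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
train, test = X[:70], X[70:]

# Leaky: statistics computed on ALL rows, so the test set
# has already influenced the features the model trains on.
mu_all, sd_all = X.mean(axis=0), X.std(axis=0)
train_leaky = (train - mu_all) / sd_all

# Clean: statistics come from the training rows only; the
# same frozen transform is then applied to the test rows.
mu, sd = train.mean(axis=0), train.std(axis=0)
train_clean = (train - mu) / sd
test_clean = (test - mu) / sd
```

The same discipline applies to any fitted transform, such as imputation, feature selection, or target encoding: fit it on the training split, then apply it unchanged everywhere else.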

7 Failure Modes

  • using the test set repeatedly during model development
  • tuning hyperparameters on the same data used for final evaluation
  • assuming zero training error means success
  • reading a noisy validation improvement as a real gain
  • treating generalization as if it were only an engineering heuristic rather than a statistical question

