Sub-Gaussian and Sub-Exponential Variables

How tail classes give reusable non-asymptotic control for sums, maxima, norms, and random matrix arguments.
Modified: April 26, 2026

Keywords

sub-gaussian, sub-exponential, tails, concentration, Orlicz norm

1 Role

This is the second page of the High-Dimensional Probability module.

The first page introduced the non-asymptotic concentration mindset.

This page introduces two tail classes that make that mindset reusable:

  • sub-Gaussian
  • sub-exponential

Instead of remembering many unrelated inequalities one by one, you now get a language for saying what kind of tails a random quantity has and what sorts of concentration behavior should follow.

2 First-Pass Promise

Read this page after Concentration Beyond Basics.

If you stop here, you should still understand:

  • what sub-Gaussian and sub-exponential variables mean at a first pass
  • how their tails differ
  • why these classes appear constantly in modern concentration arguments
  • why they are the bridge from scalar concentration to vectors, matrices, and empirical processes

3 Why It Matters

Modern high-dimensional arguments need a reusable way to say:

  • this random quantity has Gaussian-like tails
  • this one has slightly heavier, but still well-controlled, tails
  • sums or maxima of these quantities can still be handled uniformly

That is exactly what tail classes do.

They package concentration behavior into stable objects that can be moved from one proof to another.

This is why modern papers often say things like:

  • “assume the coordinates are sub-Gaussian”
  • “the noise is sub-exponential”
  • “the rows are isotropic and sub-Gaussian”

Those phrases are not decoration. They summarize the deviation behavior needed for the proof.

4 Prerequisite Recall

  • concentration bounds are used in non-asymptotic form, with explicit dependence on \(n\), \(d\), and \(\delta\)
  • maxima and simultaneous control often create logarithmic dimension terms
  • the next step is to classify random variables by tail behavior instead of treating each inequality in isolation

5 Intuition

5.1 Sub-Gaussian

A sub-Gaussian random variable has tails that decay like a Gaussian, up to constants.

At a first pass, think:

Gaussian-like tail decay, stable concentration, good behavior under averaging

Examples include:

  • centered bounded random variables
  • Gaussian variables themselves
  • many linear functionals of well-behaved random vectors

5.2 Sub-Exponential

A sub-exponential random variable has heavier tails than a sub-Gaussian one, but still decays fast enough to support useful non-asymptotic bounds.

At a first pass, think:

heavier than Gaussian, but still exponentially controlled

This class often appears when you square a sub-Gaussian quantity, or when a proof naturally produces products or quadratic forms.

5.3 Why The Distinction Matters

Sub-Gaussian variables often lead to concentration with deviation scale like

\[ \sqrt{\frac{\log(1/\delta)}{n}}, \]

while sub-exponential variables often introduce a two-regime behavior:

  • sub-Gaussian-like for smaller deviations
  • linear/exponential-type behavior for larger deviations

That difference matters once proofs start mixing sums, maxima, and matrix-valued quantities.

6 Formal Core

Definition 1 (Sub-Gaussian Tail Behavior) A centered random variable \(X\) is sub-Gaussian if its tails satisfy a bound of the form

\[ \mathbb P(|X|\ge t)\le 2e^{-ct^2/K^2} \]

for all \(t\ge 0\), with an absolute constant \(c>0\) and a scale parameter \(K>0\).

At a first pass, the exact equivalent definitions are less important than the message (one standard equivalent form follows below):

  • tails decay like \(e^{-t^2}\)
  • concentration is strong
  • many bounded or Gaussian-like variables fall into this class
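
One standard equivalent formulation, up to the values of the constants, is a moment generating function bound: there is a constant \(C>0\) with

\[ \mathbb E\, e^{\lambda X} \le e^{C\lambda^2 K^2} \quad \text{for all } \lambda \in \mathbb R. \]

This is the form that makes Chernoff-style arguments for sums go through.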

Definition 2 (Sub-Exponential Tail Behavior) A centered random variable \(X\) is sub-exponential if its tails satisfy a bound of the form

\[ \mathbb P(|X|\ge t)\le 2\exp\!\left( -c\min\left\{\frac{t^2}{K^2},\frac{t}{K}\right\} \right) \]

for all \(t\ge 0\), with an absolute constant \(c>0\) and a scale parameter \(K>0\).

At a first pass, the key contrast is (see the case split below):

  • sub-Gaussian tails behave like \(e^{-t^2}\)
  • sub-exponential tails behave more like \(e^{-t}\)
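
Concretely, the minimum in the exponent switches regimes at \(t = K\):

\[ \min\left\{\frac{t^2}{K^2},\frac{t}{K}\right\} = \begin{cases} t^2/K^2, & 0 \le t \le K,\\ t/K, & t > K, \end{cases} \]

so small deviations look Gaussian, while large deviations decay only at an exponential rate. This is exactly the two-regime behavior described in the intuition section.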

Theorem 1 (Idea: Averages Of Sub-Gaussian Variables Concentrate Well) If \(X_1,\dots,X_n\) are independent centered sub-Gaussian variables with common scale parameter \(K\), then their empirical average satisfies a sub-Gaussian-type deviation bound.

So with probability at least \(1-\delta\),

\[ \left| \frac{1}{n}\sum_{i=1}^n X_i \right| \lesssim K\sqrt{\frac{\log(1/\delta)}{n}}. \]
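
To see where this rate comes from: under independence, the average is itself sub-Gaussian with scale roughly \(K/\sqrt{n}\), so

\[ \mathbb P\!\left( \left| \frac{1}{n}\sum_{i=1}^n X_i \right| \ge t \right) \le 2\exp\!\left( -\frac{c\,n t^2}{K^2} \right). \]

Setting the right-hand side equal to \(\delta\) and solving for \(t\) gives \(t \asymp K\sqrt{\log(2/\delta)/n}\), which is the displayed bound up to constants.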

Theorem 2 (Idea: Squaring Produces Sub-Exponential Behavior) If \(X\) is sub-Gaussian, then \(X^2\) is sub-exponential after centering.

That is one reason sub-exponential variables appear so often in covariance, norm, and quadratic-form arguments.
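
A minimal Monte Carlo sketch of this transition, assuming only NumPy (the sample size and thresholds are arbitrary choices for illustration): it compares the empirical tails of a standard Gaussian \(Z\) with those of its centered square \(Z^2 - 1\).

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal(2_000_000)   # sub-Gaussian: tails decay like exp(-t^2/2)
W = Z**2 - 1.0                       # centered square: sub-exponential, tails ~ exp(-t/2)

for t in (1.0, 2.0, 3.0, 4.0):
    p_z = (np.abs(Z) >= t).mean()    # empirical P(|Z| >= t)
    p_w = (np.abs(W) >= t).mean()    # empirical P(|Z^2 - 1| >= t)
    print(f"t={t:.0f}:  P(|Z|>=t) ~ {p_z:.1e}   P(|Z^2-1|>=t) ~ {p_w:.1e}")
```

The Gaussian tail collapses much faster: by \(t = 4\) it sits several orders of magnitude below the tail of the centered square, which shrinks only at an exponential-in-\(t\) rate.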

7 Worked Example

Suppose \(X_1,\dots,X_n\) are independent centered random variables with values in \([-1,1]\).

From a first-pass viewpoint, being centered and bounded already implies sub-Gaussian behavior (this is the content of Hoeffding's lemma).

That means the empirical average

\[ \frac{1}{n}\sum_{i=1}^n X_i \]

inherits strong concentration, so at confidence level \(1-\delta\) its deviation is on the order of

\[ \sqrt{\frac{\log(1/\delta)}{n}}. \]
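
In this bounded case the statement can be made fully explicit via Hoeffding's inequality: for independent mean-zero \(X_i \in [-1,1]\),

\[ \mathbb P\!\left( \left| \frac{1}{n}\sum_{i=1}^n X_i \right| \ge t \right) \le 2e^{-nt^2/2}, \]

and setting the right-hand side to \(\delta\) gives \(t = \sqrt{2\log(2/\delta)/n}\), with explicit constants and no hidden factors.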

Now look instead at the squared variables \(X_i^2\).

Even after centering, these no longer call for Gaussian-style tail control: squares of sub-Gaussian variables fit more naturally into the sub-exponential picture.

This is the first useful transition:

  • raw variables may be sub-Gaussian
  • quadratic quantities built from them are often sub-exponential

That is exactly the pattern that later reappears in covariance estimation, random matrix bounds, and norm concentration.
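
A short simulation of this example, assuming only NumPy: it draws uniform samples on \([-1,1]\), then compares the empirical \((1-\delta)\)-quantile of the absolute empirical average with the \(\sqrt{2\log(2/\delta)/n}\) scale from Hoeffding's inequality above. The centered squares \(X_i^2 - 1/3\) are averaged alongside for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 500, 20_000

# X_i uniform on [-1, 1]: centered and bounded, hence sub-Gaussian.
X = rng.uniform(-1.0, 1.0, size=(trials, n))
mean_raw = X.mean(axis=1)                   # empirical averages of the raw variables
mean_sq = (X**2 - 1.0 / 3.0).mean(axis=1)   # averages of the centered squares (E[X^2] = 1/3)

for delta in (0.1, 0.01, 0.001):
    q = 1.0 - delta
    print(f"delta={delta}:"
          f"  |mean| quantile = {np.quantile(np.abs(mean_raw), q):.4f}"
          f"  Hoeffding scale = {np.sqrt(2 * np.log(2 / delta) / n):.4f}"
          f"  |centered-square mean| quantile = {np.quantile(np.abs(mean_sq), q):.4f}")
```

The empirical quantiles land below the Hoeffding scale at every confidence level, as expected for a worst-case bound, and both columns shrink at the \(\sqrt{\log(1/\delta)/n}\) rate.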

8 Computation Lens

When reading a high-dimensional proof, a very common workflow is:

  1. identify the random quantity
  2. ask whether it is sub-Gaussian or sub-exponential
  3. choose the right concentration tool for sums, maxima, or matrix lifts

That is why these tail classes save effort. They compress many proof choices into one line of structural information.
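
One way to make step 2 concrete is a rough moment diagnostic, a heuristic sketch rather than a test: standard moment characterizations say \((\mathbb E|X|^p)^{1/p}\) grows like \(\sqrt p\) for sub-Gaussian variables and like \(p\) for sub-exponential ones, so the corresponding ratios should look roughly flat. The function name and parameter choices below are illustrative, not from the source, and high moments estimated from finite samples are noisy, so treat the output as indicative only.

```python
import numpy as np

def moment_growth(samples, p_values=(2, 4, 6, 8, 10)):
    """Print (E|X|^p)^(1/p) normalized by sqrt(p) and by p.
    Roughly flat under /sqrt(p) suggests sub-Gaussian moment growth;
    roughly flat under /p suggests sub-exponential growth. Heuristic only."""
    a = np.abs(np.asarray(samples, dtype=float))
    for p in p_values:
        m = np.mean(a**p) ** (1.0 / p)
        print(f"p={p:2d}: ||X||_p = {m:7.3f}   /sqrt(p) = {m/np.sqrt(p):5.3f}   /p = {m/p:5.3f}")

rng = np.random.default_rng(0)
Z = rng.standard_normal(1_000_000)
moment_growth(Z)            # sub-Gaussian: the /sqrt(p) column stays roughly flat
moment_growth(Z**2 - 1.0)   # sub-exponential: the /p column stays roughly flat
```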

9 Application Lens

9.1 Learning Theory

Generalization arguments often assume feature vectors, losses, or gradients have sub-Gaussian-type tails, because this gives finite-sample concentration with explicit confidence dependence.

9.2 High-Dimensional Statistics

Covariance estimation, regression with random design, and effective-rank arguments often move from sub-Gaussian vectors to sub-exponential quadratic forms.

9.3 Random Matrices

Many random-matrix results start with rows or entries that are sub-Gaussian. The spectral conclusions then depend on lifting scalar tail control to matrix norms.

10 Stop Here For First Pass

If you can now explain:

  • how sub-Gaussian tails differ from sub-exponential tails
  • why bounded variables are often treated as sub-Gaussian
  • why squared or quadratic quantities often become sub-exponential
  • why these tail classes are more reusable than memorizing isolated inequalities

then this page has done its job.
