Positive Semidefinite Matrices and Quadratic Forms

How symmetric matrices become geometric objects through quadratic forms, and why PSD structure sits at the center of optimization, covariance, kernels, and spectral reasoning.

Keywords

positive semidefinite, quadratic form, Gram matrix, covariance, Cholesky

1 Role

This is the second page of the Matrix Analysis module.

The norms page treated matrices as operators that stretch vectors.

This page treats a symmetric matrix as a quadratic object:

\[ x^\top A x. \]

That shift is what makes PSD structure central in optimization, covariance geometry, kernels, and spectral proofs.

2 First-Pass Promise

Read this page after Norms and Operator Norms.

If you stop here, you should still understand:

  • what positive semidefinite means
  • why PSD is a statement about quadratic forms, not just entries
  • why Gram matrices and covariance matrices are automatically PSD
  • why PSD structure appears constantly in optimization and ML theory

3 Why It Matters

Many of the most important matrices in modern theory are not arbitrary.

They are PSD:

  • covariance matrices
  • Gram and kernel matrices
  • Hessians of convex quadratics
  • graph Laplacians
  • normal-equation matrices like \(X^\top X\)

PSD structure matters because it gives:

  • nonnegative energies
  • nonnegative eigenvalues
  • factorization as a square-like object
  • a partial order that supports comparison arguments later

4 Prerequisite Recall

  • operator norms measure worst-case amplification
  • symmetric matrices have real eigenvalues
  • quadratic expressions like \(x^\top A x\) encode geometric information
  • Hessians and quadratic objectives are good examples of where this language appears

5 Intuition

5.1 Quadratic Forms

Given a symmetric matrix \(A\), the expression

\[ x^\top A x \]

measures a direction-dependent energy.

If this quantity is always nonnegative, then the matrix never bends space in a genuinely negative direction.

That is the PSD condition.
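
A minimal numerical sketch (assuming NumPy; the matrix here is a hypothetical example) shows how the same symmetric \(A\) assigns different energies to different directions:

```python
import numpy as np

# A symmetric example matrix (hypothetical, chosen for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

def quadratic_form(A, x):
    """Evaluate the energy x^T A x."""
    return x @ A @ x

# The energy depends on the direction of x, not just its length.
for x in [np.array([1.0, 1.0]), np.array([1.0, -1.0])]:
    print(x, "->", quadratic_form(A, x))
# [1.  1.] -> 6.0  (the stretched direction)
# [1. -1.] -> 2.0  (the compressed direction)
```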

5.2 Why Symmetry Matters

PSD language belongs naturally to symmetric matrices.

Without symmetry, the quadratic-form picture and the spectral picture no longer line up cleanly: for any square \(A\), only the symmetric part \(\tfrac{1}{2}(A+A^\top)\) contributes to \(x^\top A x\), so the quadratic form cannot see the rest of the matrix.
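
One way to see this, sketched below with NumPy: split any square matrix into symmetric and antisymmetric parts, and the antisymmetric part contributes nothing to the quadratic form.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # not symmetric in general
x = rng.standard_normal(3)

S = (A + A.T) / 2                 # symmetric part
K = (A - A.T) / 2                 # antisymmetric part

# x^T K x = 0 always, so only S contributes to the quadratic form.
print(np.isclose(x @ K @ x, 0.0))        # True
print(np.isclose(x @ A @ x, x @ S @ x))  # True
```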

5.3 Gram And Covariance Pictures

If

\[ A = B^\top B, \]

then

\[ x^\top A x = \|Bx\|_2^2 \ge 0. \]

That is the cleanest PSD intuition:

a PSD matrix is something that looks like a squared linear map.

This is why Gram matrices and covariance matrices keep showing up here.
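
A two-line check of that picture (a sketch, assuming NumPy; \(B\) here is just a random example):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))   # any rectangular B works
A = B.T @ B                       # Gram-style matrix, symmetric by construction

x = rng.standard_normal(3)

# x^T A x equals the squared length of Bx, hence is nonnegative.
print(np.isclose(x @ A @ x, np.linalg.norm(B @ x) ** 2))  # True
print(x @ A @ x >= 0)                                     # True
```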

6 Formal Core

Definition 1 (Positive Semidefinite) A symmetric matrix \(A\) is positive semidefinite if

\[ x^\top A x \ge 0 \qquad \text{for all }x. \]

If the inequality is strict for every nonzero \(x\), then \(A\) is positive definite.

Theorem 1 (Theorem Idea: Equivalent PSD Pictures) For a symmetric matrix \(A\), the following first-pass pictures line up:

  • \(A\) is positive semidefinite
  • every eigenvalue of \(A\) is nonnegative
  • \(A\) can be written as \(B^\top B\) for some matrix \(B\)

These are the main equivalent ways of recognizing PSD structure.
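
The spectral and factorization pictures can be checked side by side. Below is a sketch, assuming NumPy; the test matrix is built as \(C^\top C\) purely so we start from something known to be PSD:

```python
import numpy as np

rng = np.random.default_rng(2)
C = rng.standard_normal((4, 4))
A = C.T @ C                       # a PSD matrix to test the pictures on

# Spectral picture: all eigenvalues of a symmetric PSD matrix are >= 0.
eigvals, eigvecs = np.linalg.eigh(A)
print(np.all(eigvals >= -1e-10))  # True (small tolerance for round-off)

# Factorization picture: B = diag(sqrt(lambda)) Q^T gives A = B^T B.
B = np.diag(np.sqrt(np.clip(eigvals, 0, None))) @ eigvecs.T
print(np.allclose(B.T @ B, A))    # True
```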

Theorem 2 (Theorem Idea: Gram And Covariance Matrices Are PSD) Matrices of the form

\[ G_{ij}=\langle v_i,v_j\rangle \]

and covariance-type matrices of the form

\[ \Sigma = \mathbb E[(X-\mu)(X-\mu)^\top] \]

are positive semidefinite.

That is why PSD language appears so naturally in machine learning and statistics.
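
A short sketch of both claims, assuming NumPy (the vectors and data are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(3)

# Gram matrix of arbitrary vectors: G_ij = <v_i, v_j>.
V = rng.standard_normal((6, 4))   # rows are the vectors v_i
G = V @ V.T
print(np.linalg.eigvalsh(G).min() >= -1e-10)      # True: G is PSD

# Empirical covariance of arbitrary data is PSD the same way.
X = rng.standard_normal((200, 5))  # rows are samples
Sigma = np.cov(X, rowvar=False)
print(np.linalg.eigvalsh(Sigma).min() >= -1e-10)  # True
```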

Theorem 3 (Theorem Idea: Positive Definiteness Means Strict Curvature) If a quadratic objective has Hessian matrix \(A\) and \(A\) is positive definite, then the quadratic bends upward in every nonzero direction.

This is one of the simplest bridges from linear algebra to optimization.
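
To make the curvature statement concrete, take the standard convex quadratic \(f(x)=\tfrac{1}{2}x^\top A x + b^\top x + c\) (this specific form is an illustration, not something fixed by the page). Expanding along any direction \(d\) gives

\[ f(x+d) = f(x) + (Ax+b)^\top d + \tfrac{1}{2}\, d^\top A d, \]

so when \(A\) is positive definite, the second-order term is strictly positive for every \(d \neq 0\): the objective bends upward in every nonzero direction.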

7 Worked Example

Consider

\[ A= \begin{bmatrix} 1 & 1\\ 1 & 1 \end{bmatrix}. \]

Then

\[ x^\top A x = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 1 & 1\\ 1 & 1 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = (x_1+x_2)^2. \]

So

\[ x^\top A x \ge 0 \qquad \text{for all }x, \]

which means \(A\) is PSD.

But it is not positive definite, because if \(x=(1,-1)\) then

\[ x^\top A x = 0 \]

even though \(x\neq 0\).

This is a good first-pass example because it shows:

  • PSD allows zero-energy directions
  • PD rules them out
  • a PSD matrix can still be singular

It also has the Gram form

\[ A = v v^\top \qquad \text{with } v=(1,1)^\top, \]

so the \(B^\top B\) picture (with \(B = v^\top\)) is visible immediately.
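
The whole example can be replayed numerically. A sketch, assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

# Spectral check: eigenvalues are 0 and 2, so A is PSD but not PD.
print(np.linalg.eigvalsh(A))      # ~[0. 2.]

# The zero-energy direction from the text.
x = np.array([1.0, -1.0])
print(x @ A @ x)                  # 0.0

# Standard Cholesky fails, confirming A is not positive definite.
try:
    np.linalg.cholesky(A)
except np.linalg.LinAlgError:
    print("not positive definite")
```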

8 Computation Lens

When checking whether a matrix is PSD in practice, you usually do not test infinitely many vectors.

Instead, you use one of the equivalent pictures:

  1. verify symmetry
  2. inspect eigenvalues
  3. identify a Gram or covariance form
  4. in positive-definite cases, attempt a Cholesky factorization, which succeeds exactly when the matrix is positive definite

That is why PSD structure is both theoretically clean and computationally useful.
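
As a sketch of that checklist in code, assuming NumPy (`is_psd` and `is_pd` are hypothetical helper names, not library functions):

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """First-pass PSD check: symmetry plus a nonnegative spectrum."""
    if not np.allclose(A, A.T):
        return False
    return np.linalg.eigvalsh(A).min() >= -tol

def is_pd(A):
    """Positive-definite check via Cholesky: it succeeds iff A is PD."""
    if not np.allclose(A, A.T):
        return False
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[1.0, 1.0], [1.0, 1.0]])  # the worked example
print(is_psd(A), is_pd(A))               # True False
```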

9 Application Lens

9.1 Optimization

Convex quadratic objectives and many Hessian-based arguments are really PSD statements in disguise.

9.2 Statistics

Covariance matrices are PSD by construction, so variance geometry already lives in this language.

9.3 Machine Learning

Kernel matrices, Gram matrices, and normal-equation matrices like \(X^\top X\) are PSD, which is why PSD structure keeps appearing in regression, kernels, Gaussian processes, and high-dimensional theory.
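
For instance, a quick check that \(X^\top X\) is PSD for a random design matrix, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((50, 3))  # a hypothetical design matrix

# The normal-equation matrix X^T X is PSD for any X:
# v^T (X^T X) v = ||X v||^2 >= 0.
print(np.linalg.eigvalsh(X.T @ X).min() >= -1e-10)  # True
```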

10 Stop Here For First Pass

If you can now explain:

  • why PSD is a quadratic-form condition
  • why nonnegative eigenvalues are the spectral picture of PSD structure
  • why Gram and covariance matrices are naturally PSD
  • why PSD does not mean invertible or strictly positive in every direction

then this page has done its job.

11 Go Deeper

This is the current stopping point of the live Matrix Analysis opening.

The next natural module steps are not yet live.

Until they are, the strongest adjacent page is the Norms and Operator Norms page that this one builds on.
