Duality and Certificates

How the dual problem produces lower bounds, sensitivity information, and sometimes exact certificates of optimality.
Modified: April 26, 2026

Keywords

duality, dual problem, weak duality, strong duality, certificate

1 Role

This page closes the first-pass optimization spine.

It explains how optimization problems produce a second problem, the dual, whose feasible points certify lower bounds and whose optimal solutions often reveal sensitivity, structure, and exact optimality.

2 First-Pass Promise

Read this page after Constrained Optimization, KKT, and Lagrangians.

If you stop here, you should still understand:

  • what the dual function is
  • why weak duality gives a lower-bound certificate
  • when strong duality lets primal and dual solutions meet
  • why dual variables carry sensitivity information

3 Why It Matters

Duality is one of the reasons optimization feels deeper than simply minimizing a formula.

Instead of looking only at candidate primal solutions, duality lets you reason from the other side:

  • produce lower bounds on the best achievable value
  • certify that a proposed solution is already optimal
  • understand which constraints are expensive or binding
  • reformulate a problem into a form that is easier to solve or interpret

This is why duality appears everywhere:

  • in support vector machines and kernel methods
  • in regularized estimation and sparse recovery
  • in control and model predictive control
  • in decomposition methods and large-scale optimization
  • in differentiable optimization layers, where KKT and dual systems become part of a learning pipeline

4 Prerequisite Recall

  • the Lagrangian is

\[ \mathcal{L}(x,\lambda,\nu) = f(x)+\sum_{i=1}^m \lambda_i g_i(x)+\nu^\top(Ax-b) \]

for inequality multipliers \(\lambda \ge 0\) and equality multipliers \(\nu\)

  • KKT conditions combine primal feasibility, dual feasibility, stationarity, and complementary slackness
  • in a convex problem with suitable regularity, the KKT conditions become sufficient
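To make the recalled formula concrete, the sketch below evaluates the Lagrangian directly for a small illustrative problem (the specific \(f\), \(g\), \(A\), \(b\) are made up for this sketch, not taken from this page). At a feasible point with \(\lambda \ge 0\), both penalty terms are nonpositive or zero, so \(\mathcal{L}(x,\lambda,\nu) \le f(x)\) there.

```python
# A minimal sketch of evaluating L(x, lam, nu) = f(x) + sum_i lam_i g_i(x)
# + nu^T (A x - b). The toy problem below is illustrative only.

def lagrangian(x, lam, nu, f, gs, A, b):
    """Evaluate the Lagrangian at (x, lam, nu)."""
    penalty_ineq = sum(l * g(x) for l, g in zip(lam, gs))
    residual = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
                for row, b_i in zip(A, b)]
    penalty_eq = sum(n_i * r_i for n_i, r_i in zip(nu, residual))
    return f(x) + penalty_ineq + penalty_eq

# Toy problem: minimize x1^2 + x2^2  s.t.  1 - x1 <= 0  and  x1 + x2 = 2.
f = lambda x: x[0] ** 2 + x[1] ** 2
gs = [lambda x: 1.0 - x[0]]
A, b = [[1.0, 1.0]], [2.0]

# At the feasible point (1, 1) both penalty terms vanish, so L = f = 2.
val = lagrangian([1.0, 1.0], [0.5], [0.3], f, gs, A, b)
```

Evaluating at an infeasible point instead would mix in nonzero penalties, which is exactly what the dual function later exploits.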

5 Intuition

The primal problem asks:

what is the best feasible decision x?

The dual problem asks a different question:

what lower bound on the optimal value can I certify by weighting the constraints?

The Lagrangian temporarily relaxes the constraints and adds penalties for violating them. If the multipliers have the right sign, minimizing the Lagrangian over \(x\) produces a quantity that can never exceed the true primal optimum. That quantity is the dual function.

So the dual problem is a search over good lower bounds.

When the best lower bound equals the primal optimum, there is no gap, and the primal-dual pair certifies optimality exactly.

This is the main picture:

  • primal feasible point -> upper bound on the optimum
  • dual feasible point -> lower bound on the optimum
  • matching bounds -> certificate of optimality

6 Formal Core

Consider the constrained problem

\[ \begin{aligned} \text{minimize} \quad & f(x) \\ \text{subject to} \quad & g_i(x) \le 0,\quad i=1,\dots,m, \\ & Ax=b. \end{aligned} \]

Definition 1 (Dual Function) The dual function is

\[ q(\lambda,\nu)=\inf_x \mathcal{L}(x,\lambda,\nu), \]

where

\[ \mathcal{L}(x,\lambda,\nu) = f(x)+\sum_{i=1}^m \lambda_i g_i(x)+\nu^\top(Ax-b) \]

and \(\lambda \ge 0\).

The dual problem is

\[ \begin{aligned} \text{maximize} \quad & q(\lambda,\nu) \\ \text{subject to} \quad & \lambda \ge 0. \end{aligned} \]

Theorem 1 (Weak Duality) For every primal feasible point \(x\) and every dual feasible pair \((\lambda,\nu)\),

\[ q(\lambda,\nu)\le f(x). \]

In particular, if \(p^\star\) is the primal optimum and \(d^\star\) is the dual optimum, then

\[ d^\star \le p^\star. \]

So every dual feasible point is already a valid lower-bound certificate.
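Weak duality is easy to check numerically. The sketch below uses an illustrative toy problem chosen for this purpose (minimize \((x-3)^2\) subject to \(x \le 1\); it is not the Section 7 example), where the inner infimum has a closed form, and verifies \(q(\lambda) \le f(x)\) over a small grid of dual points and feasible primal points.

```python
# Numeric check of weak duality, q(lam) <= f(x), on an illustrative problem:
#   minimize (x - 3)^2   subject to   x - 1 <= 0.
# The Lagrangian x -> (x - 3)^2 + lam*(x - 1) is minimized at x = 3 - lam/2.

f = lambda x: (x - 3.0) ** 2
g = lambda x: x - 1.0                  # feasible iff g(x) <= 0

def q(lam):
    x = 3.0 - lam / 2.0                # argmin of the Lagrangian over x
    return f(x) + lam * g(x)           # works out to 2*lam - lam**2/4

# Every dual feasible value is a lower bound on every feasible primal value.
for lam in [0.0, 1.0, 4.0, 10.0]:
    for x in [-2.0, 0.0, 1.0]:         # all satisfy x <= 1
        assert q(lam) <= f(x) + 1e-12

best_bound = max(q(l) for l in [0.0, 1.0, 4.0, 10.0])
print(best_bound)                      # prints 4.0, which equals p* = f(1)
```

Note that even the weaker bounds, such as \(q(10) = -5\), are valid certificates; they are just less informative than the best one.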

Theorem 2 (Strong Duality (Convex Case)) If the primal problem is convex and a regularity condition such as Slater’s condition holds, then

\[ d^\star = p^\star. \]

In that case, an optimal primal-dual pair certifies optimality exactly.

Under the same convex regularity assumptions, KKT conditions are the compact certificate tying strong duality to first-order geometry.

7 Worked Example

Consider the one-dimensional convex problem

\[ \begin{aligned} \text{minimize} \quad & x^2 \\ \text{subject to} \quad & 1-x \le 0. \end{aligned} \]

The feasible set is \(x \ge 1\), so the primal optimum is clearly

\[ x^\star = 1, \qquad p^\star = 1. \]

Now form the Lagrangian:

\[ \mathcal{L}(x,\lambda)=x^2+\lambda(1-x), \qquad \lambda \ge 0. \]

The dual function is

\[ q(\lambda)=\inf_x \left(x^2+\lambda(1-x)\right). \]

To compute the infimum, minimize over \(x\):

\[ \frac{d}{dx}\big(x^2+\lambda(1-x)\big)=2x-\lambda. \]

So the minimizing value is

\[ x=\frac{\lambda}{2}. \]

Plugging back in,

\[ q(\lambda) = \left(\frac{\lambda}{2}\right)^2+\lambda\left(1-\frac{\lambda}{2}\right) = \lambda-\frac{\lambda^2}{4}. \]

So the dual problem is

\[ \max_{\lambda\ge 0}\left(\lambda-\frac{\lambda^2}{4}\right). \]

This is a concave quadratic in \(\lambda\), maximized at

\[ \lambda^\star = 2, \]

and the optimal dual value is

\[ d^\star = q(2)=2-1=1. \]

So

\[ d^\star = p^\star = 1. \]

This one small example shows the full pattern:

  1. primal solution gives an achievable value
  2. dual solution gives a lower bound
  3. the values match
  4. optimality is certified

It also gives a sensitivity interpretation: the multiplier \(\lambda^\star=2\) measures how costly it is, locally, to tighten the constraint threshold.
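The whole derivation can be confirmed numerically. The sketch below is a coarse grid search over \(\lambda\), not a general-purpose solver; it recovers \(\lambda^\star = 2\) and the zero duality gap \(d^\star = p^\star = 1\).

```python
# Numeric confirmation of the worked example:
#   q(lam) = lam - lam**2/4, maximized at lam = 2, with d* = p* = 1.

f = lambda x: x ** 2

def q(lam):
    x = lam / 2.0                      # minimizer of x**2 + lam*(1 - x)
    return x ** 2 + lam * (1.0 - x)

# Coarse grid search over lam in [0, 5] recovers the dual optimum.
grid = [i / 100.0 for i in range(501)]
lam_star = max(grid, key=q)
d_star = q(lam_star)

p_star = f(1.0)                        # primal optimum at the boundary x = 1
assert abs(lam_star - 2.0) < 1e-9
assert abs(d_star - p_star) < 1e-9     # matching bounds: optimality certified
```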

8 Computation Lens

Duality is not just theoretical decoration. It often changes how problems are solved.

Sometimes the dual problem is:

  • lower dimensional
  • smoother
  • easier to decompose
  • easier to distribute across data or constraints
  • more interpretable because the variables are attached to constraints, margins, or prices

This is why primal-dual methods, decomposition methods, mirror-style methods, and constraint-aware ML formulations keep returning to the dual viewpoint.

In computation, the most important habit is:

  • primal feasible point gives an upper bound
  • dual feasible point gives a lower bound
  • the gap tells you how far you are from a certificate
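That habit can be seen in a minimal dual-ascent sketch on the Section 7 problem: gradient ascent on \(q(\lambda)\), with a feasible primal point recovered by projection, stops as soon as the upper and lower bounds match to tolerance. The projection step here is specific to this toy problem, not a general recipe.

```python
# Sketch: the duality gap as a stopping criterion, on the Section 7 problem
#   minimize x^2  s.t.  1 - x <= 0,  with q(lam) = lam - lam**2/4.

def q(lam):
    return lam - lam ** 2 / 4.0

lam, step = 0.0, 0.5
for k in range(100):
    x_feas = max(lam / 2.0, 1.0)       # project x(lam) = lam/2 onto x >= 1
    upper = x_feas ** 2                # primal feasible point -> upper bound
    lower = q(lam)                     # dual feasible point   -> lower bound
    gap = upper - lower
    if gap < 1e-8:                     # matching bounds certify optimality
        break
    lam += step * (1.0 - lam / 2.0)    # gradient ascent: q'(lam) = 1 - lam/2

print(lam, gap)                        # lam approaches 2, gap shrinks to ~0
```

The gap, not the iterate's movement, is the principled stopping signal: once it is below tolerance, no further progress is possible beyond that tolerance.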

9 Application Lens

Support vector machines are one of the cleanest examples of duality in machine learning. The margin problem can be solved or interpreted through its dual, and the resulting dual variables identify support vectors and kernelized structure.

More generally, duality powers:

  • regularized estimation and sparsity methods
  • Lagrangian relaxations in combinatorial and structured problems
  • control and moving-horizon estimation formulations
  • differentiable optimization layers, where dual and KKT structure are used inside end-to-end systems

So duality is not only about proving theorems. It is also about extracting useful structure from an optimization model.

10 Stop Here For First Pass

If you can now explain:

  • what the dual function is
  • why weak duality automatically gives lower bounds
  • when strong duality gives exact certificates
  • why dual variables can be read as sensitivity information

then this page has done its main job.

11 Go Deeper

The strongest next pages are:

  1. Optimization for Machine Learning for the ML-training perspective
  2. Bayesian Optimization and Surrogate Modeling for another view of certificates, bounds, and acquisition tradeoffs
  3. Optimization module overview for the now-complete first-pass spine

12 Optional Paper Bridge

13 Optional After First Pass

If you want more depth after the first pass:

  • read the duality lectures in Stanford SEE EE364A and follow how weak duality becomes strong duality under Slater-type conditions
  • revisit the KKT page and ask which lines are really duality statements in compressed form
  • compare primal and dual formulations in an SVM or lasso-style problem

14 Common Mistakes

  • thinking duality only matters when the primal problem is hard
  • forgetting that weak duality needs dual feasibility
  • assuming strong duality always holds without a regularity condition
  • treating multipliers only as algebraic variables instead of lower-bound and sensitivity objects
  • confusing a small duality gap with exact optimality before feasibility is checked

15 Exercises

  1. For the worked example, verify directly that \(q(\lambda)\le 1\) for every \(\lambda\ge 0\).
  2. Explain in words why a dual feasible point gives a lower bound rather than an upper bound.
  3. Suppose a primal feasible point and a dual feasible point have the same objective value. What can you conclude?

16 Sources and Further Reading

Sources checked online on 2026-04-24:

  • Stanford EE364a course page
  • Stanford SEE EE364A page
  • Boyd and Vandenberghe convex optimization book page
  • MIT 6.253 course page
  • NeurIPS 2019 differentiable convex optimization layers page
  • PMLR page for learning-based moving horizon estimation