Optimization
optimization, convex optimization, KKT, duality, gradient methods
1 Why This Module Matters
Optimization is the language of fitting the model, choosing the design, allocating the resource, and controlling the system.
Once a problem becomes "pick variables to minimize cost while respecting structure," you are in optimization, whether the surrounding application is machine learning, signal processing, robotics, operations research, or scientific computing.
This module is where several earlier strands finally meet:
- linear algebra gives the geometry of spaces, projections, and operators
- probability and statistics explain what objective functions are trying to estimate or regularize
- calculus explains local change, gradients, and curvature
- algorithms turn optimality conditions into actual solvers
If this module is skipped, later material starts to feel like disconnected recipes: gradient descent becomes a trick, Lagrange multipliers become a symbol game, duality becomes mysterious, and ML training objectives look more arbitrary than they really are.
2 First Pass Through This Module
- Convex Sets and Separation
- Convex Functions and Subgradients
- Constrained Optimization, KKT, and Lagrangians
- Unconstrained First-Order Methods
- Duality and Certificates
3 Prerequisites
This module is best read after you are already comfortable with derivatives, partial derivatives, gradients, and basic curvature intuition from the site’s calculus modules.
4 Core Concepts
- Convex Sets and Separation: the geometric foundation for feasible regions, hyperplane certificates, and later duality arguments.
- Convex Functions and Subgradients: the analytic layer that explains shape, global lower bounds, and nonsmooth first-order reasoning.
- Constrained Optimization, KKT, and Lagrangians: the first bridge from informal "optimize under rules" thinking to precise optimality conditions and dual interpretations.
- Unconstrained First-Order Methods: the algorithmic layer behind gradient descent, step sizes, convergence, and large-scale optimization.
- Duality and Certificates: the certification layer that explains lower bounds, strong duality, and why multipliers are more than algebraic artifacts.
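To make the first-order layer concrete, here is a minimal gradient-descent sketch on a small convex quadratic; the matrix, step size, and iteration count are illustrative choices, not taken from the module pages:

```python
import numpy as np

# Minimize f(x) = 0.5 x^T A x - b^T x, a strongly convex quadratic.
# The unique minimizer solves A x = b, so the answer can be checked exactly.
A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite (illustrative)
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b

x = np.zeros(2)
step = 1.0 / np.linalg.norm(A, 2)  # 1/L, where L = largest eigenvalue of A
for _ in range(200):
    x = x - step * grad(x)

x_star = np.linalg.solve(A, b)
print(np.allclose(x, x_star, atol=1e-6))  # True
```

The step size 1/L is the classic safe choice for a gradient with Lipschitz constant L; larger steps can diverge, smaller ones just converge more slowly.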
5 Proof Patterns In This Module
- First-order stationarity: at an optimum, feasible perturbations cannot improve the objective to first order.
- Geometry of active constraints: optimality becomes a balance between the objective gradient and constraint normals.
- Certificate thinking: primal feasibility plus dual information can prove optimality or suboptimality.
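These patterns can be seen on a one-line problem. The following sketch (a made-up example, not from the module pages) checks that every dual feasible multiplier certifies a lower bound for minimizing x² subject to x ≥ 1, and that the bound is tight at the KKT multiplier:

```python
# Certificate thinking on a one-variable problem:
#   minimize x^2  subject to  x >= 1      (optimal value p* = 1, at x = 1)
# Lagrangian: L(x, lam) = x^2 + lam * (1 - x), with lam >= 0.
# Dual function: g(lam) = min_x L(x, lam) = lam - lam**2 / 4  (minimizer x = lam/2).

def dual(lam):
    assert lam >= 0
    return lam - lam**2 / 4

p_star = 1.0  # primal optimum

# Every dual feasible point certifies a lower bound on p*.
for lam in [0.0, 1.0, 2.0, 3.0]:
    assert dual(lam) <= p_star + 1e-12

# At lam = 2 the bound is tight: strong duality holds, and lam = 2 is
# exactly the KKT multiplier of the active constraint x >= 1.
print(dual(2.0))  # 1.0
```

Note how the multiplier does real work here: it converts a guess into a proof that no feasible point can do better than the certified bound.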
6 Applications
6.1 Machine Learning And Statistical Estimation
Loss minimization, regularization, constrained training, adversarial formulations, sparse estimation, and calibration all turn into optimization problems. KKT and duality appear in support vector machines, lasso-style methods, differentiable optimization layers, and modern implicit-bias analysis.
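As one concrete instance of how subgradients drive sparse estimation, the one-dimensional lasso has a closed-form solution, soft-thresholding, which drops out of the subgradient optimality condition. This is a standard textbook example, not code from the methods cited above:

```python
def soft_threshold(z, t):
    """Closed-form minimizer of 0.5*(x - z)**2 + t*abs(x), for t >= 0.

    Derived from subgradient optimality: 0 must lie in (x - z) + t * d|x|,
    where d|x| is the subdifferential of the absolute value.
    """
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0  # the subgradient of |x| at 0 spans [-1, 1], absorbing small z

print(soft_threshold(3.0, 1.0))   # 2.0
print(soft_threshold(0.5, 1.0))   # 0.0  (exact sparsity, not just a small value)
print(soft_threshold(-2.5, 1.0))  # -1.5
```

The middle case is the whole point of lasso-style methods: the nonsmooth penalty sets small coefficients exactly to zero, which smooth penalties never do.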
6.2 Control, Design, And Scientific Computing
Trajectory planning, resource allocation, filter design, inverse problems, and engineering tradeoff studies all depend on writing the right objective and constraints, then extracting interpretable multipliers, sensitivities, and certificates from the solution.
7 Go Deeper By Topic
Follow the five-page first pass in order:
- Convex Sets and Separation
- Convex Functions and Subgradients
- Constrained Optimization, KKT, and Lagrangians
- Unconstrained First-Order Methods
- Duality and Certificates
If you want adjacent context right away, the best current bridges are:
- Optimization for Machine Learning for the training-side view
- Orthogonality and Least Squares for a familiar optimization problem you already know geometrically
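If you want to see the least-squares bridge in action, the sketch below (illustrative random data, standard NumPy calls) verifies that the least-squares residual is orthogonal to the column space of A, which is exactly the first-order optimality condition for this problem:

```python
import numpy as np

# Least squares, min_x ||A x - b||^2, is an optimization problem you already
# know geometrically: the optimum projects b onto the column space of A, and
# the gradient condition A^T (A x - b) = 0 says the residual is orthogonal
# to every column of A.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

x, *_ = np.linalg.lstsq(A, b, rcond=None)
residual = A @ x - b
print(bool(np.allclose(A.T @ residual, np.zeros(3), atol=1e-10)))  # True
```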
8 Optional Deep Dives After First Pass
The strongest official deeper references currently connected to this module are:
- Stanford EE364a: Convex Optimization I - current official course page and the clearest self-study backbone for optimization in engineering and ML. Checked 2026-04-24.
- Convex Optimization by Boyd and Vandenberghe - the canonical text, hosted on the authors’ official Stanford page. Checked 2026-04-24.
- MIT 6.253: Convex Analysis and Optimization - stronger mathematical second pass for readers who want more analysis depth. Checked 2026-04-24.
- Berkeley EE227A: Convex Optimization - useful bridge for scalable methods and algorithmic perspective. Checked 2026-04-24.
9 Study Order
The intended first pass is the five-step sequence above.
You are ready to move deeper into the module when you can:
- explain why convex feasible regions are special
- explain why convex objectives turn local first-order information into global information
- distinguish objective, variables, and constraints cleanly
- explain why unconstrained and constrained optima obey different first-order logic
- interpret a multiplier as more than a symbol attached to a constraint
- explain why a dual feasible point can certify a lower bound
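The second checklist item can be tested numerically: for a differentiable convex function, the tangent line at any point is a global lower bound, so local first-order information really does constrain the function everywhere. A minimal check with f(x) = exp(x) as an illustrative convex function:

```python
import numpy as np

# For differentiable convex f, the tangent at any x0 is a *global* lower bound:
#   f(y) >= f(x0) + f'(x0) * (y - x0)   for all y.
f = np.exp
x0 = 0.5
g0 = np.exp(x0)  # f'(x0), since (exp)' = exp

ys = np.linspace(-5.0, 5.0, 1001)
tangent = f(x0) + g0 * (ys - x0)
print(bool(np.all(f(ys) >= tangent - 1e-12)))  # True
```

For a nonconvex function the same tangent would cross the graph somewhere, which is exactly why local gradient information alone cannot certify anything global without convexity.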
10 Sources and Further Reading
- Stanford EE364a: Convex Optimization I - first pass - the best current official course backbone for the applied and theoretical arc of convex optimization. Checked 2026-04-24.
- Convex Optimization by Boyd and Vandenberghe - first pass - official online text with the cleanest entry into convex sets, functions, optimality conditions, and duality. Checked 2026-04-24.
- MIT 6.253: Convex Analysis and Optimization - second pass - useful when you want more mathematical depth behind convexity and optimality. Checked 2026-04-24.
- Berkeley EE227A: Convex Optimization - second pass - good algorithmic complement once the core ideas are in place. Checked 2026-04-24.
- Differentiable Convex Optimization Layers - paper bridge - shows how KKT systems and convex programs reappear inside modern deep learning pipelines. Checked 2026-04-24.
Sources checked online on 2026-04-24:
- Stanford EE364a course page
- Boyd and Vandenberghe convex optimization book page
- MIT 6.253 course page
- Berkeley EE227A course page
- NeurIPS 2019 differentiable convex optimization layers proceedings page