Learning, Identification, and RL Bridges
system identification, reinforcement learning, model-based RL, learning-based control, planning
1 Application Snapshot
Once a reader sees state, feedback, estimation, planning, and constraints, the next question is usually:
where exactly does learning enter?
In modern systems work, learning can enter in at least three different places:
- before control, by learning a model from data
- inside control, by learning residuals, uncertainty, or planning components
- as the control policy itself, as in many RL pipelines
This page is the shortest bridge for sorting those possibilities cleanly instead of lumping them all together as “AI control.”
2 Problem Setting
Classical control often starts from a model that is already written down:
\[ x_{t+1} = f(x_t, u_t). \]
But real systems rarely hand you a perfect model for free.
Instead, you may have:
- logged input-output data
- a rough physics model with missing effects
- simulation data but limited real-world data
- a reward or task specification without an accurate dynamics model
That opens three different routes:
- System identification: learn a model, then design the controller.
- Learning-based control: use learned components inside the control loop.
- Reinforcement learning: learn behavior from interaction, often through rewards over trajectories.
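To make the first route concrete: a standard prediction-error fit chooses model parameters that minimize one-step prediction error over the logged data, for example
\[ \min_{A, B} \sum_t \| x_{t+1} - A x_t - B u_t \|^2 \]
in the linear case. Below is a minimal sketch of that fit, assuming a linear model class; the trajectory arrays are random placeholders standing in for real logs.

```python
import numpy as np

# Minimal sketch of the identification route: fit linear dynamics
# x_{t+1} ~ A x_t + B u_t to logged trajectory data by least squares.
# The data below is random filler standing in for real logs.
rng = np.random.default_rng(0)
T, n, m = 200, 3, 1
X = rng.standard_normal((T, n))       # states x_0 .. x_{T-1}
U = rng.standard_normal((T, m))       # inputs u_0 .. u_{T-1}
X_next = rng.standard_normal((T, n))  # successors x_1 .. x_T

# Solve min_{A,B} sum_t ||x_{t+1} - A x_t - B u_t||^2 by stacking
# z_t = [x_t, u_t] and regressing X_next on Z.
Z = np.hstack([X, U])                           # shape (T, n + m)
Theta, *_ = np.linalg.lstsq(Z, X_next, rcond=None)
A_hat = Theta[:n].T                             # estimated (n, n) dynamics
B_hat = Theta[n:].T                             # estimated (n, m) input map

# A_hat and B_hat can now feed a model-based design such as LQR or MPC.
```

The point of the sketch is the shape of the workflow: data in, \(\hat{A}, \hat{B}\) out, controller design afterward.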
3 Why This Math Appears
This language reuses several math layers already on the site:
- Control and Dynamics: state-space structure still organizes what the system is doing
- Stochastic Control and Dynamic Programming: Bellman-style sequential decision ideas reappear in many RL setups
- Signal Processing and Estimation: learning from data and operating under uncertainty both depend on filtering, noise models, and hidden-state reasoning
- Optimization: fitting models, improving policies, and replanning all become optimization problems
- Learning Theory: once models are learned from data, sample efficiency, uncertainty, and distribution shift start to matter
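One concrete anchor for the Bellman-style layer above, in its simplest deterministic, discounted form (a standard textbook statement, included here only for orientation):
\[ V(x) = \min_{u} \big[ c(x, u) + \gamma \, V(f(x, u)) \big], \]
where \(c\) is a per-step cost and \(\gamma \in (0, 1]\) a discount factor. Much of RL can be read as approximating \(V\), the minimizing \(u\), or both from sampled trajectories rather than from a known \(f\).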
So this page is not a detour away from control. It is the place where control starts touching modern data-driven practice directly.
4 Math Objects In Use
- trajectory data
- learned dynamics model
- learned residual or disturbance model
- policy \(\pi\)
- reward or cost over trajectories
- value function or planner
- uncertainty about model quality
The fastest organizing principle is:
- system identification learns the model
- learning-based control learns part of the pipeline
- reinforcement learning learns behavior from interaction
5 A Small Worked Walkthrough
Imagine a quadrotor whose nominal physics model is known, but whose aerodynamic drag and battery effects are not calibrated well.
There are at least three distinct ways learning could enter:
- System identification: collect flight data, fit a better dynamics model, then run LQR or MPC on that learned model.
- Learning-based control: keep the nominal model, but learn a residual correction term that improves prediction inside the planner (sketched just below this list).
- Reinforcement learning: optimize a policy directly from rollouts using a trajectory reward, possibly without ever producing an explicit identified model (a minimal version appears at the end of this section).
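A minimal sketch of the residual route, assuming scalar dynamics and least-squares fitting; the nominal model, the drag-like effect, the features, and the data are all hypothetical stand-ins chosen to show the pattern, not to model a real quadrotor.

```python
import numpy as np

# Minimal sketch of the learning-based-control route: keep a nominal
# model f_nom and fit a residual r(x, u) so the planner can predict with
# f_nom(x, u) + r(x, u). Everything here is a hypothetical stand-in.

def f_nom(x, u):
    """Nominal (known-physics) one-step prediction."""
    return 0.9 * x + 0.1 * u

rng = np.random.default_rng(1)
X = rng.standard_normal(300)
U = rng.standard_normal(300)
# "Real" successors include an uncalibrated drag-like effect plus noise.
X_next = f_nom(X, U) - 0.05 * X**2 + 0.01 * rng.standard_normal(300)

# Regress the nominal model's prediction errors on simple features.
errors = X_next - f_nom(X, U)
Phi = np.stack([X, U, X**2], axis=1)           # feature matrix (300, 3)
w, *_ = np.linalg.lstsq(Phi, errors, rcond=None)

def f_corrected(x, u):
    """Nominal model plus learned residual, as a planner would use it."""
    return f_nom(x, u) + np.stack([x, u, x**2], axis=-1) @ w
```

The design choice here is that the nominal physics stays in charge of the bulk of the prediction; learning only has to explain what the physics misses.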
These routes can look similar from far away, but they answer different questions:
- what is the model?
- how should I use the model?
- do I even need a model explicitly, or only good behavior?
That is why “learning in control” is not one idea. It is a family of different insertion points.
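To make the no-explicit-model answer concrete, here is a minimal sketch of the third route: random search over a one-parameter linear policy, scored only by rollout reward. The toy environment, step sizes, and iteration counts are assumptions for illustration; the learner never sees the dynamics, only returns.

```python
import numpy as np

# Minimal sketch of the reinforcement-learning route: improve a linear
# policy u = -k * x by random search on rollout reward. The environment
# is a hypothetical toy stand-in; the learner only sees returns.

rng = np.random.default_rng(2)

def rollout_reward(k, steps=50):
    """Run one rollout of the policy u = -k * x and return its reward."""
    x, total = 1.0, 0.0
    for _ in range(steps):
        u = -k * x
        x = 1.1 * x + u + 0.01 * rng.standard_normal()  # hidden dynamics
        total -= x**2 + 0.1 * u**2  # quadratic cost as negative reward
    return total

# Hill-climb on (noisy) rollout reward; no explicit model is ever fit.
k, best = 0.0, rollout_reward(0.0)
for _ in range(200):
    candidate = k + 0.1 * rng.standard_normal()
    r = rollout_reward(candidate)
    if r > best:
        k, best = candidate, r
```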
6 Implementation or Computation Note
Three practical questions help sort almost every method quickly:
- What is being learned? A model, a residual, a value function, a policy, or an uncertainty estimate?
- Where does the data come from? Passive logs, simulator rollouts, online interaction, or a mix?
- How are safety and constraints handled while learning? Through hard feasibility checks, soft penalties, shielding, or only post hoc evaluation?
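As one concrete answer to the third question, a hard feasibility check (often called a shield) can be as simple as projecting every proposed input onto the feasible set before it reaches the plant. The learned policy and actuator bounds below are hypothetical placeholders.

```python
import numpy as np

# Minimal sketch of a hard feasibility check ("shield") around a learned
# policy: project every proposed input onto the safe input set before it
# reaches the plant. Both the policy and the bounds are hypothetical.

U_MIN, U_MAX = -1.0, 1.0  # actuator limits the shield enforces

def learned_policy(x):
    """Stand-in for any learned controller; may propose unsafe inputs."""
    return 3.0 * x

def shielded_policy(x):
    """Clip the learned input to the feasible set before applying it."""
    return float(np.clip(learned_policy(x), U_MIN, U_MAX))
```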
7 Failure Modes
- calling every data-driven control method “reinforcement learning”
- treating a learned model as trustworthy without checking where the data came from
- ignoring distribution shift between training rollouts and deployment conditions
- assuming high reward in simulation automatically means safe real-world behavior
- forgetting that system identification plus MPC is often a very different workflow from end-to-end policy learning
8 Paper Bridge
- 6.435 / System Identification - First pass: official MIT anchor for the model-learning side of data-driven control. Checked 2026-04-25.
- AA228 / Decision Making Under Uncertainty - Paper bridge: useful once planning, uncertainty, and RL-style decision making begin to overlap. Checked 2026-04-25.
9 Sources and Further Reading
- 6.435 / System Identification - First pass: official MIT course hub for learning dynamics from data. Checked 2026-04-25.
- 6.435 lecture notes - First pass: direct official notes for the identification viewpoint. Checked 2026-04-25.
- 6.435 syllabus - First pass: concise framing of what identification is trying to solve. Checked 2026-04-25.
- AA203 / Optimal and Learning-Based Control - Second pass: official Stanford anchor for modern learning-based control language. Checked 2026-04-25.
- EE365 / Stochastic Control - Second pass: official Stanford course bridge from sequential decision theory toward RL-style viewpoints. Checked 2026-04-25.
- EE365 lecture slides - Second pass: useful when you want the Bellman and stochastic-control side to feel more concrete. Checked 2026-04-25.
- AA228 / Decision Making Under Uncertainty - Bridge outward: a good Stanford bridge once control, planning, and RL are all in play. Checked 2026-04-25.