Learning, Identification, and RL Bridges

A bridge page showing how control changes when models come from data, why system identification differs from reinforcement learning, and where modern RL reconnects with planning and control.
Modified: April 26, 2026

Keywords: system identification, reinforcement learning, model-based RL, learning-based control, planning

1 Application Snapshot

Once a reader sees state, feedback, estimation, planning, and constraints, the next question is usually:

where exactly does learning enter?

In modern systems work, learning can enter in at least three different places:

  • before control, by learning a model from data
  • inside control, by learning residuals, uncertainty, or planning components
  • as the control policy itself, as in many RL pipelines

This page is the shortest bridge for sorting those possibilities cleanly instead of lumping them all together as “AI control.”

2 Problem Setting

Classical control often starts from a model that is already written down:

\[ x_{t+1} = f(x_t, u_t). \]

But real systems rarely hand you a perfect model for free.

Instead, you may have:

  • logged input-output data
  • a rough physics model with missing effects
  • simulation data but limited real-world data
  • a reward or task specification without an accurate dynamics model

That opens three different routes:

  1. System identification: learn a model, then design the controller.

  2. Learning-based control: use learned components inside the control loop.

  3. Reinforcement learning: learn behavior from interaction, often through rewards over trajectories.
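
To make route 1 concrete, here is a minimal system-identification sketch: fit a linear model \(x_{t+1} \approx A x_t + B u_t\) to logged trajectory data by least squares. Everything below is illustrative; the data is simulated, and the linear model class is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged data from an unknown linear system (for illustration only).
T, n, m = 200, 2, 1
A_true = np.array([[1.0, 0.1], [0.0, 0.95]])
B_true = np.array([[0.0], [0.1]])

X = np.zeros((T + 1, n))
U = rng.normal(size=(T, m))
for t in range(T):
    X[t + 1] = A_true @ X[t] + B_true @ U[t] + 0.01 * rng.normal(size=n)

# Identification step: solve min ||Phi Theta - X_next||^2 with Theta = [A^T; B^T].
Phi = np.hstack([X[:-1], U])                        # regressors, shape (T, n + m)
Theta, *_ = np.linalg.lstsq(Phi, X[1:], rcond=None)
A_hat, B_hat = Theta[:n].T, Theta[n:].T             # learned model, ready for LQR/MPC
```

Once \(\hat{A}\) and \(\hat{B}\) are in hand, the controller design itself is classical: the learning step ends where LQR or MPC begins.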

3 Why This Math Appears

This language reuses several math layers already on the site:

  • Control and Dynamics: state-space structure still organizes what the system is doing
  • Stochastic Control and Dynamic Programming: Bellman-style sequential decision ideas reappear in many RL setups
  • Signal Processing and Estimation: learning from data and operating under uncertainty both depend on filtering, noise models, and hidden-state reasoning
  • Optimization: fitting models, improving policies, and replanning all become optimization problems
  • Learning Theory: once models are learned from data, sample efficiency, uncertainty, and distribution shift start to matter

So this page is not a detour away from control. It is the place where control starts touching modern data-driven practice directly.

4 Math Objects In Use

  • trajectory data
  • learned dynamics model
  • learned residual or disturbance model
  • policy \(\pi\)
  • reward or cost over trajectories
  • value function or planner
  • uncertainty about model quality

The fastest organizing principle is:

  • system identification learns the model
  • learning-based control learns part of the pipeline
  • reinforcement learning learns behavior from interaction
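
As a rough sketch of how these objects sit together in code, here are hypothetical container types; the names and shapes are assumptions for illustration, not a fixed API.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class Trajectory:
    """Logged or rolled-out trajectory data."""
    states: np.ndarray    # shape (T + 1, n)
    inputs: np.ndarray    # shape (T, m)
    rewards: np.ndarray   # shape (T,)

# Learned dynamics model: f_hat(x, u) -> predicted next state.
Dynamics = Callable[[np.ndarray, np.ndarray], np.ndarray]

# Policy: pi(x) -> u. In RL pipelines this is the object being learned directly.
Policy = Callable[[np.ndarray], np.ndarray]

# Value function: V(x) -> expected return, used by planners and many RL methods.
Value = Callable[[np.ndarray], float]
```

In these terms: system identification produces a Dynamics, learning-based control fills in pieces of a pipeline built from these objects, and reinforcement learning targets the Policy (and often a Value) directly.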

5 A Small Worked Walkthrough

Imagine a quadrotor whose nominal physics model is known, but whose aerodynamic drag and battery effects are not calibrated well.

There are at least three distinct ways learning could enter:

  1. System identification: collect flight data, fit a better dynamics model, then run LQR or MPC on that learned model.

  2. Learning-based control: keep the nominal model, but learn a residual correction term that improves prediction inside the planner.

  3. Reinforcement learning: optimize a policy directly from rollouts using a trajectory reward, possibly without ever producing an explicit identified model.
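
To make route 2 concrete, a minimal residual-learning sketch: keep the nominal model and fit a correction to its one-step prediction errors from logged data. The nominal model, the linear features, and the shapes below are placeholder assumptions, not quadrotor physics.

```python
import numpy as np

def f_nominal(x, u):
    """Placeholder nominal model (stands in for the known quadrotor physics)."""
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    return A @ x + B @ u

def fit_residual(X, U):
    """Fit r(x, u) ~= x_next - f_nominal(x, u) by least squares on linear features."""
    pred = np.array([f_nominal(x, u) for x, u in zip(X[:-1], U)])
    targets = X[1:] - pred                         # one-step prediction errors
    Phi = np.hstack([X[:-1], U])                   # linear features (an assumption)
    W, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return lambda x, u: np.concatenate([x, u]) @ W # x, u as 1-D arrays

# The planner then predicts with the corrected model:
#   x_next ~= f_nominal(x, u) + residual(x, u)
```

Note what stays fixed here: the nominal model and the planner are untouched; learning only patches the prediction error.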

These routes can look similar from far away, but they answer different questions:

  • what is the model?
  • how should I use the model?
  • do I even need a model explicitly, or only good behavior?

That is why “learning in control” is not one idea. It is a family of different insertion points.
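
To make route 3 concrete as well, a minimal policy-gradient sketch: REINFORCE with a linear-Gaussian policy on a toy double integrator. The dynamics, policy class, and step sizes are illustrative assumptions, and no explicit model is ever fit.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.1                                   # fixed exploration noise (assumption)

def rollout(theta, T=30):
    """One episode on a toy double integrator; reward penalizes state and input."""
    x = np.array([1.0, 0.0])
    states, actions, rewards = [], [], []
    for _ in range(T):
        u = theta @ x + sigma * rng.normal()  # linear-Gaussian policy
        states.append(x); actions.append(u)
        rewards.append(-(x @ x + 0.01 * u * u))
        x = np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u])
    return states, actions, rewards

theta = np.zeros(2)
for _ in range(500):
    S, A, R = rollout(theta)
    G = sum(R)                                # total return (no baseline, high variance)
    # REINFORCE: grad_theta log pi(u | x) = (u - theta @ x) / sigma^2 * x
    grad = sum(((a - theta @ s) / sigma**2) * s for s, a in zip(S, A))
    theta += 1e-4 * grad * G / len(S)
```

The contrast with routes 1 and 2 is the point: the only learned object here is the policy parameter vector theta, and the dynamics enter only through rollouts.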

6 Implementation or Computation Note

Three practical questions help sort almost every method quickly:

  1. What is being learned? A model, a residual, a value function, a policy, or an uncertainty estimate?

  2. Where does the data come from? Passive logs, simulator rollouts, online interaction, or a mix?

  3. How are safety and constraints handled while learning? Through hard feasibility checks, soft penalties, shielding, or only post hoc evaluation?

Use the math-layer pages listed earlier (Signal Processing and Estimation, Optimization, Stochastic Control and Dynamic Programming, Learning Theory) as the strongest follow-on support.

7 Failure Modes

  • calling every data-driven control method “reinforcement learning”
  • treating a learned model as trustworthy without checking where the data came from
  • ignoring distribution shift between training rollouts and deployment conditions
  • assuming high reward in simulation automatically means safe real-world behavior
  • forgetting that system identification plus MPC is often a very different workflow from end-to-end policy learning

8 Paper Bridge

9 Sources and Further Reading
