Robotics and Embodied Active Inference

The body as computational manifold: embodied active inference in robotics.

Series: Active Inference Applied | Part: 7 of 10

In 1950, W. Grey Walter built a robotic tortoise named Elsie. She had two vacuum tubes for a brain, two motors, and a photocell that made her seek light. She also had a tendency to get stuck. When her battery ran low, she'd crawl back to her charging station. When she encountered her reflection in a mirror, she'd dance with herself. She looked, to human observers, remarkably alive.

Grey Walter called this "the richness of the uncertain." Elsie wasn't executing a sophisticated program. She was coupled to her environment in a way that produced coherent behavior through continuous sensorimotor loops. She minimized surprise not by planning, but by existing.

This is the insight that active inference brings to robotics: intelligence isn't something you install in a machine and then deploy into the world. It's something that emerges from the right kind of coupling between a body and its environment. And embodiment—the fact that you have a physical form with particular constraints and affordances—fundamentally changes the inference problem.


Why Embodiment Changes Everything

In standard active inference implementations, you build a generative model, define beliefs over states, and minimize expected free energy to select actions. The agent exists as software, interacting with environments through abstract state spaces.

But robots have bodies.

This matters because:

Morphological computation. Your body itself performs computation. A passive dynamic walker—a robot that walks downhill without motors or control systems—solves complex dynamical equations through the physics of its limbs. Its gait emerges from mechanical constraints, not neural commands. The body computes stability.

Proprioception as precision. A robot doesn't just have exteroceptive sensors (cameras, lidar). It has proprioception—knowledge of where its joints are, how much torque is in its actuators, whether it's balanced. This isn't auxiliary information. It's foundational. You can't minimize prediction error about the world without also minimizing prediction error about your own configuration.

Action as belief propagation. In disembodied active inference, actions are discrete choices—move left, pick up object, wait. In embodied systems, action is continuous motor control. Your muscles are always doing something. There's no neutral state. Action becomes a continuous stream of belief updates propagating from cortex to spinal cord to effectors, each level minimizing its own prediction errors.

The real-time constraint. Simulated agents can pause to think. Robots fall over if they stop computing. The control loop runs at 100 Hz or faster. There's no time for Monte Carlo tree search. Inference must be fast, approximate, and good enough.

This is what researchers like Pablo Lanillos, Manuel Baltieri, and Cristian Pezzato discovered when they started implementing active inference on actual robots: embodiment isn't just a detail to handle. It's the core of the problem.


The Body as Markov Blanket

Remember Markov blankets—the statistical boundaries that define systems. In active inference, your Markov blanket separates internal states (beliefs, neural activity) from external states (the world) via sensory and active states.

For a robot, the Markov blanket is literal.

Your sensors are the sensory states—the photoreceptors in your camera, the strain gauges in your joints, the IMU reporting your tilt. Your actuators are the active states—motor torques, gripper positions, wheel velocities. Everything inside (your control algorithms, state estimates, model parameters) is separated from everything outside (objects, terrain, other agents) by this physical interface.

But here's where embodiment gets interesting: the blanket itself has dynamics.

When you reach for a cup, your arm's inertia matters. The cup's weight matters. Contact forces propagate through mechanical linkages in ways that take time. Your sensory states don't instantly reflect your active states. There's a delay, and it's not just latency—it's dynamical coupling through physics.

This means the generative model must include the body.

In practice, this looks like:

  • Forward models predicting sensory consequences of motor commands (if I apply 10 N·m of torque to this joint, where will my hand be in 100 ms?)
  • Inverse models computing motor commands needed to achieve sensory goals (to move my hand there, what torques do I need?)
  • Body schema representing the structure and constraints of the robot's morphology (my arm has 7 DOF, these joint limits, this moment of inertia)

The robot isn't just modeling the world. It's modeling itself modeling the world. Self-awareness isn't a philosophical luxury—it's a computational necessity.
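
To make that concrete, here is a minimal sketch in Python (using NumPy) of a forward model for a two-joint planar arm: given joint angles, velocities, and a candidate torque, it predicts where the hand will be one control step from now. The link lengths, inertias, and time step are illustrative assumptions, not parameters from any robot discussed here.

```python
import numpy as np

# Illustrative body schema for a planar two-joint arm (made-up numbers).
LINK_LENGTHS = np.array([0.30, 0.25])    # metres
JOINT_INERTIA = np.array([0.05, 0.03])   # kg·m², crude diagonal approximation
DT = 0.01                                # 100 Hz control step

def forward_kinematics(q):
    """Body schema: map joint angles to hand position in the plane."""
    x = LINK_LENGTHS[0] * np.cos(q[0]) + LINK_LENGTHS[1] * np.cos(q[0] + q[1])
    y = LINK_LENGTHS[0] * np.sin(q[0]) + LINK_LENGTHS[1] * np.sin(q[0] + q[1])
    return np.array([x, y])

def forward_model(q, dq, torque):
    """Predict joint state and hand position one control step ahead.

    A deliberately crude Euler step that ignores gravity and Coriolis
    terms: enough to show the role of a forward model, not a faithful
    dynamics simulation.
    """
    ddq = torque / JOINT_INERTIA
    dq_next = dq + ddq * DT
    q_next = q + dq_next * DT
    return q_next, dq_next, forward_kinematics(q_next)

# "If I apply this torque, where will my hand be one step from now?"
q, dq = np.array([0.4, 0.8]), np.zeros(2)
_, _, predicted_hand = forward_model(q, dq, torque=np.array([0.2, -0.1]))
print(predicted_hand)
```

An inverse model runs the same mapping in the other direction; under active inference that inversion is often implicit in the reflex-like error minimization described in the motor control section below.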


Proprioceptive Inference: Knowing Where You Are

Here's a problem: you're a humanoid robot trying to walk. You have accelerometers, gyroscopes, joint encoders. But sensors lie.

Joint encoders report angles, but joints flex under load. IMUs drift. Vision processing has 50 ms of lag. You need to know where your foot is now, but every sensor tells you something slightly different about slightly different things at slightly different times.

This is the proprioceptive inference problem, and active inference solves it through predictive processing.

Your generative model includes:

  • Hidden states: Joint angles, velocities, contact forces, center of mass position
  • Observations: Sensor readings (noisy, delayed, incomplete)
  • Beliefs: Posterior distributions over hidden states given observations

You minimize variational free energy to infer your current configuration. But critically, you also use precision weighting—you trust sensors differently in different contexts.

When your foot is in the air, you trust the IMU (gravity tells you which way is down) and joint encoders (kinematics tells you where your limbs are). When your foot contacts the ground, you suddenly trust force sensors more (contact forces constrain possible configurations) and trust the IMU less (impact transients and ground reaction forces add noise).

This is exactly what biological systems do. Humans unconsciously upweight vestibular signals when balance is threatened and downweight them during stable sitting. This isn't hardcoded switching—it's learned precision.
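
As a toy illustration, here is what context-dependent precision weighting amounts to numerically: two Gaussian estimates of the same quantity, fused with different precisions in swing versus stance. The single-variable setup and all numbers are made up for clarity; the walking work described next uses far richer models.

```python
import numpy as np

def fuse(estimates, precisions):
    """Precision-weighted fusion of independent Gaussian estimates.

    Posterior mean is the precision-weighted average; posterior
    precision is the sum of precisions. This is the arithmetic behind
    "trusting some sensors more than others".
    """
    estimates = np.asarray(estimates, dtype=float)
    precisions = np.asarray(precisions, dtype=float)
    post_precision = precisions.sum()
    post_mean = (precisions * estimates).sum() / post_precision
    return post_mean, post_precision

# Hypothetical estimates of ankle pitch (radians) from two sources.
imu_estimate, force_estimate = 0.10, 0.02

# Swing phase: the foot is in the air, the IMU is clean, contact forces
# say nothing, so the IMU-based estimate gets the higher precision.
swing = fuse([imu_estimate, force_estimate], precisions=[50.0, 1.0])

# Stance phase: impact transients corrupt the IMU while contact
# constrains the pose, so the force-based estimate dominates instead.
stance = fuse([imu_estimate, force_estimate], precisions=[5.0, 80.0])

print(swing, stance)   # the posterior leans toward whichever stream is trusted
```

The point isn't the arithmetic; it's that the precision weights themselves can be inferred from prediction errors, which is what "learned precision" means.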

Pablo Lanillos's group at the Donders Institute implemented this on a humanoid robot. The robot learned which sensors to trust in which phases of walking, without explicit programming. Precision weights emerged from minimizing prediction error. The robot developed a sense of its own body through interaction.


Motor Control as Inference to Causes

In classical robotics, you plan a trajectory (a sequence of joint angles over time) and then execute it with PID controllers that minimize tracking error. This works well for factory robots doing the same motion repeatedly in controlled environments.

It fails catastrophically for anything uncertain.

Active inference reframes motor control: actions are inferences about the causes of desired sensory states.

You don't command "move arm to (x, y, z)." You predict that your hand will be at (x, y, z), and motor commands are whatever makes that prediction come true. The prediction creates attraction in proprioceptive space, and muscle activations flow from minimizing the discrepancy between predicted and actual proprioception.

This is called active inference for motor control, and it's deeply weird until you see it work.

Instead of computing inverse kinematics (given desired end-effector position, what joint angles?), you:

  1. Set a desired proprioceptive state (hand at target)
  2. Predict sensory consequences of that state
  3. Compare predictions to current sensations
  4. Use prediction errors to update motor commands (reflex arcs minimize error fast)
  5. Simultaneously update beliefs about body configuration (perceptual inference)

Action and perception are the same process—both minimize prediction error, just on different timescales and through different paths (via muscles vs via beliefs).
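
Here is a deliberately tiny sketch of that loop for a single joint, assuming a toy plant where torque sets velocity and purely illustrative gains: the desired angle acts as a prior, and the same proprioceptive prediction error both updates the belief and drives the actuator.

```python
import numpy as np

# One-joint active inference controller: a sketch, not a tuned system.
DT = 0.001         # 1 kHz reflex loop
K_BELIEF = 20.0    # how strongly prediction errors update the belief
K_ACTION = 8.0     # how strongly they drive the actuator

def step(belief_q, true_q, desired_q):
    """One cycle of the coupled perception/action loop for one joint."""
    sensed_q = true_q + np.random.normal(0.0, 0.005)   # noisy encoder

    sensory_error = sensed_q - belief_q    # sensation vs belief
    goal_error = desired_q - belief_q      # prior (goal) vs belief

    # Perceptual inference: the belief is pulled toward both the data
    # and the desired (predicted) state.
    belief_q += DT * K_BELIEF * (sensory_error + goal_error)

    # Active inference: the actuator works to make the proprioceptive
    # prediction come true (a crude stand-in for a spinal reflex arc).
    torque = K_ACTION * (belief_q - sensed_q)
    true_q += DT * torque      # toy plant: torque sets joint velocity
    return belief_q, true_q

belief, actual = 0.0, 0.0
for _ in range(5000):
    belief, actual = step(belief, actual, desired_q=0.6)
print(round(actual, 3))        # the joint has been drawn toward 0.6 rad
```

Notice there is no trajectory and no inverse kinematics here: the desired angle enters only as a prediction, and the motor command exists only to cancel the error that prediction creates.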

Cristian Pezzato's work at TU Delft showed this on a robotic manipulator. The robot reached for objects without explicit trajectory planning. It had a generative model of its arm dynamics and a belief about where its hand should be. Reflexive controllers at the joint level minimized proprioceptive prediction errors. No inverse kinematics. No motion planner. Just prediction error minimization cascading through hierarchical loops.

It worked even when objects moved during reaching. The target location was a continuously updated belief, not a fixed setpoint. Adaptation was automatic.


Multi-Modal Sensor Fusion: Vision Meets Proprioception

Here's where embodiment really shines: robots have multiple sensor modalities that provide redundant, complementary information.

You see the cup at (x, y, z) in camera space. You feel your hand at joint angles (θ1, θ2, ... θ7). You know from kinematics where those joint angles place your hand in space. Do vision and proprioception agree?

If yes: high confidence in both. If no: something is wrong. Either the vision is mistaken (shadows, occlusion, segmentation error) or proprioception is off (calibration drift, joint flex) or the world isn't what you thought (the cup moved, someone grabbed your arm).

Active inference handles this through multi-modal generative models where different sensory streams predict the same hidden states.

The visual system predicts: "hand at (x, y, z)"
The proprioceptive system predicts: "hand at (x', y', z')"

The discrepancy is surprise. Minimizing surprise means:

  • Update beliefs about hand location (perceptual inference)
  • Update beliefs about sensor reliability (precision learning)
  • Act to bring sensations into alignment (active inference)

This isn't ad-hoc sensor fusion. It's principled inference under a generative model that explains how different sensors relate to common causes.
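
A minimal sketch of what that buys you: vision and forward kinematics each give a Gaussian estimate of hand position; fusing them is precision-weighted averaging, and the precision-weighted disagreement between them serves as a crude proxy for the surprise that triggers precision learning or corrective action. All numbers here are hypothetical.

```python
import numpy as np

def fuse_modalities(mu_vision, prec_vision, mu_proprio, prec_proprio):
    """Combine two Gaussian estimates of the same hidden cause (hand
    position) and report how surprising their disagreement is."""
    mu_vision = np.asarray(mu_vision, dtype=float)
    mu_proprio = np.asarray(mu_proprio, dtype=float)

    post_prec = prec_vision + prec_proprio
    post_mu = (prec_vision * mu_vision + prec_proprio * mu_proprio) / post_prec

    # Precision-weighted squared discrepancy between the two streams: a
    # simple stand-in for the surprise that drives precision learning or
    # corrective action when the modalities disagree.
    residual = mu_vision - mu_proprio
    discrepancy_prec = (prec_vision * prec_proprio) / post_prec
    surprise = 0.5 * discrepancy_prec * float(residual @ residual)
    return post_mu, surprise

# Hypothetical hand-position estimates in metres.
hand_from_vision = [0.42, 0.10, 0.31]       # camera + segmentation
hand_from_kinematics = [0.40, 0.11, 0.30]   # joint encoders + body schema

belief, surprise = fuse_modalities(hand_from_vision, 30.0,
                                   hand_from_kinematics, 60.0)
print(belief, round(surprise, 3))   # small surprise: the streams agree
```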

Manuel Baltieri and others showed that robots with active inference controllers naturally develop cross-modal expectations. A robot that learns to grasp objects comes to predict what tactile sensations should follow visual contact. When it sees its fingers touch a surface, it expects pressure. When pressure doesn't arrive (the object was a hologram, or the gripper malfunctioned), surprise spikes and behavior adapts.

This is the computational structure of bodily self-awareness. You don't just have sensors. You have a coherent model that explains their joint behavior.


Learning While Moving: Online Model Adaptation

Factory robots operate in controlled environments. Mobile robots in the real world encounter endless novelty. The floor is sometimes carpet, sometimes tile, sometimes mud. Objects have unexpected weights. Joints wear down and calibration drifts.

How do you maintain performance when the world keeps changing?

Active inference provides a solution: learn the generative model online, continuously, while acting.

Traditional approaches separate learning (offline, in simulation or during training phases) from deployment (online, with fixed models). This works until reality diverges from your model, at which point performance degrades and you need to retrain.

Active inference agents treat learning as inference over model parameters. You maintain beliefs not just over states (where am I?) but over model structure (how does my body respond to commands? how do objects behave when pushed?).

Mathematically, this is hierarchical inference where:

  • Fast dynamics: Infer states given the current model
  • Slow dynamics: Infer model parameters given accumulated prediction errors

When prediction errors persist (the robot consistently underestimates friction on this surface), parameter beliefs update. The model adapts. Future predictions improve. No retraining phase needed.
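
A sketch of the two timescales, with a single made-up parameter (a friction coefficient) and a toy observation model: the fast loop predicts slip from the current belief, and the slow loop nudges that belief down the prediction-error gradient. Plain gradient descent stands in here for the variational parameter inference used in the actual work.

```python
import numpy as np

# The true friction on a new surface versus the robot's initial belief.
TRUE_FRICTION = 0.8
believed_friction = 0.4
LEARNING_RATE = 0.05    # slow timescale: parameters drift, they don't jump

def predicted_slip(push_force, friction):
    """Fast timescale: a toy observation model where slip shrinks as
    friction grows. Purely illustrative."""
    return push_force / (1.0 + 5.0 * friction)

rng = np.random.default_rng(0)
for _ in range(200):
    force = rng.uniform(1.0, 3.0)
    observed = predicted_slip(force, TRUE_FRICTION) + rng.normal(0.0, 0.01)
    predicted = predicted_slip(force, believed_friction)

    # Prediction error on states (fast timescale) ...
    error = observed - predicted

    # ... feeds a parameter update (slow timescale): gradient descent on
    # the squared prediction error with respect to the friction belief.
    d_pred_d_mu = -5.0 * force / (1.0 + 5.0 * believed_friction) ** 2
    d_loss_d_mu = -error * d_pred_d_mu
    believed_friction -= LEARNING_RATE * d_loss_d_mu

print(round(believed_friction, 2))   # drifts toward the true value of 0.8
```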

This was demonstrated by Alexander Tschantz and colleagues in simulated agents, and by Lanillos's group on physical robots. A robot learning to balance on a beam continuously updated its body schema as it experienced different terrains. Parameters encoding limb inertia, friction coefficients, and actuator response curves drifted toward values that minimized long-run prediction error.

The robot wasn't executing a fixed control policy. It was maintaining a coherent explanation of sensorimotor flow, and that explanation evolved with experience.

This is learning as you move. Not learning, then moving—learning through moving.


The Social Robot Problem: Modeling Other Agents

Most robotics research focuses on interaction with static environments (navigate this room, pick up that object). But the most important environments contain other agents.

How do you minimize surprise in a world that includes humans?

Active inference extends naturally to multi-agent scenarios through nested generative models. You model other agents as active inference agents themselves—as systems with their own beliefs, preferences, and generative models.

This isn't theory of mind as folk psychology. It's theory of mind as Bayesian inference over another agent's latent states.

When you see a human reaching for a door, you infer:

  • Their goal (get through the door)
  • Their belief about door state (closed)
  • Their policy (reach, grasp, pull)
  • Their predictions about your behavior (will I hold it open?)

Your model of them includes their model of you. Social interaction becomes joint inference over coupled generative models.

For robots, this means:

  • Predicting human motion (where will they step?)
  • Inferring human intentions (do they want to collaborate?)
  • Planning legible actions (how do I signal my intent?)
  • Updating beliefs about human preferences from feedback (did they accept my help or seem annoyed?)

This is what researchers call social active inference, and it's crucial for robots that share space with humans. A delivery robot navigating a sidewalk isn't just obstacle-avoiding—it's participating in a distributed dance where everyone predicts everyone else and adjusts accordingly.
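
As a toy version of that inference, here is a posterior over a pedestrian's candidate goals given their observed walking direction. The goal locations, noise model, and discrete goal set are invented for illustration; this simple Bayes step stands in for the full nested generative models described above.

```python
import numpy as np

# Candidate goals a pedestrian might be heading for (made-up map points).
GOALS = {
    "doorway":  np.array([4.0, 0.0]),
    "bench":    np.array([0.0, 5.0]),
    "crossing": np.array([-3.0, 3.0]),
}
HEADING_NOISE = 0.4   # radians: how sloppily people walk toward their goal

def goal_posterior(position, heading, prior=None):
    """P(goal | observed heading): the likelihood falls off with the angle
    between where they're walking and where each candidate goal lies."""
    prior = prior or {name: 1.0 / len(GOALS) for name in GOALS}
    scores = {}
    for name, location in GOALS.items():
        to_goal = location - position
        bearing = np.arctan2(to_goal[1], to_goal[0])
        angle_error = np.angle(np.exp(1j * (heading - bearing)))  # wrap to [-pi, pi]
        likelihood = np.exp(-0.5 * (angle_error / HEADING_NOISE) ** 2)
        scores[name] = prior[name] * likelihood
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

# A pedestrian at the origin walking roughly along +x: probably the doorway.
print(goal_posterior(np.array([0.0, 0.0]), heading=0.1))
```

A legible robot runs the same computation in reverse: it chooses motions that make its own goal easy for the human to infer.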

Collaborative robots (cobots) in manufacturing use active inference to anticipate worker needs. Instead of waiting for explicit commands, the robot infers task structure from partial demonstrations and offers help at appropriate moments. The human's actions become observations that update the robot's beliefs about shared goals.

The robot isn't obeying. It's inferring.


Failure Modes: When Embodiment Breaks Inference

Active inference isn't magic. Embodied robots fail in characteristic ways that reveal the limits of the framework.

Model misspecification. If your generative model is deeply wrong about body dynamics or environmental structure, no amount of inference will save you. You'll confidently predict the wrong things and act inappropriately. Garbage model, garbage inference.

Example: A legged robot with an incorrect model of ground friction will confidently plan motions that cause slipping. The prediction errors will be huge, but if the model doesn't include friction as a learnable parameter, it can't adapt.

Solution: Expressive models with learnable parameters. But this increases computational cost and data requirements.

Computational intractability. Full Bayesian inference over high-dimensional continuous state spaces is intractable. Real implementations use approximations—variational inference, particle filters, moment matching. These approximations introduce errors that accumulate.

Example: A manipulator with 7 joints, 100 tactile sensors, and visual input has a state space too large for exact inference. Approximations mean the robot's beliefs are systematically biased.

Solution: Hierarchical architectures that factorize the problem. Solve low-dimensional subproblems exactly, coordinate through sparse messages. This is how biological systems scale.

Sensory poverty. Active inference requires rich sensory feedback. If you can't observe the consequences of your actions, you can't minimize prediction error.

Example: A robot gripper without tactile sensors can't tell if it's grasping firmly or crushing the object. It acts blindly.

Solution: Add sensors. But sensors add cost, weight, and computational load. There are diminishing returns.

The real-time trap. Control loops run fast (100 Hz to 1 kHz). Complex inference takes time. There's a fundamental tension between model complexity and real-time performance.

Solution: Use simple models at fast timescales (reflexive controllers) and complex models at slow timescales (deliberative planning). This mirrors the nervous system's hierarchy: the spinal cord and brainstem run fast loops, the cortex runs slow loops.
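
Structurally, that split can be as simple as the following sketch: a cheap reflexive update runs every tick against a hard deadline, while an expensive deliberative step (planning, model learning) runs only every N ticks and merely moves the setpoint the fast loop tracks. The rates and update rules are placeholders.

```python
import time

FAST_HZ = 500             # reflexive loop: cheap, hard real-time deadline
SLOW_EVERY_N_TICKS = 50   # deliberative loop: roughly 10 Hz, no hard deadline

def reflex_update(state, setpoint):
    """Cheap, constant-time correction toward the current setpoint."""
    return state + 0.1 * (setpoint - state)

def deliberate(state):
    """Stand-in for expensive inference (planning, model learning). In a
    real controller this would typically run in its own thread or process."""
    return state + 1.0    # choose a new setpoint somewhere ahead of us

state, setpoint = 0.0, 1.0
for tick in range(500):
    state = reflex_update(state, setpoint)     # every tick
    if tick % SLOW_EVERY_N_TICKS == 0:
        setpoint = deliberate(state)           # occasionally
    time.sleep(1.0 / FAST_HZ)                  # pace the fast loop
print(round(state, 2), round(setpoint, 2))
```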


Why This Matters Beyond Robots

The robotics work isn't just about building better machines. It's revealing something fundamental about embodied cognition.

Every insight from robotic active inference applies to biological agents:

You are a body with sensors and actuators. Your eyes, ears, skin, muscles—these are the interface through which you couple to the world. Perception isn't passive reception; it's active inference conditioned on embodiment.

Your brain doesn't just model the world. It models you in the world. Body schema, proprioception, motor imagery—these aren't peripheral features. They're central to the inference problem. You can't predict sensory consequences of actions without modeling your own body.

Learning happens online, continuously. You don't stop, retrain, restart. You update beliefs about yourself and your environment while moving, acting, adapting. Experience is training data.

Social cognition is nested inference. Understanding others means running a model of their model. This isn't metaphor—it's the computational structure of theory of mind.

This is why embodied active inference in robotics matters to anyone thinking about minds. It forces the theory to be precise, implementable, and robust to reality.

When Karl Friston talks about the free energy principle as applying to "any system that maintains its structure over time," robotics is the test. Can you build a system that persists through sensorimotor coupling? Can it adapt to novelty? Can it learn from its own prediction errors?

The answer is increasingly yes. And the systems that succeed look more and more like biological organisms—hierarchical, embodied, predictive, and strange.


From Theory to Steel: Implementations That Work

Let's get concrete. What does embodied active inference look like in practice?

iCub humanoid (Italian Institute of Technology): A child-sized robot with 53 motors, cameras, microphones, tactile sensors across its skin, and proprioceptive feedback from every joint. Researchers implemented active inference for reaching and grasping. The robot learned hand-eye coordination without inverse kinematics. It developed expectations about what surfaces should feel like based on visual appearance. When objects violated expectations (a visually solid block that compressed under pressure), the robot showed surprise-like responses—pausing, exploring more cautiously.

PR2 robot (TU Delft): A mobile manipulator with two 7-DOF arms. Cristian Pezzato's group implemented hierarchical active inference for bimanual manipulation. The robot coordinated both arms to lift awkward objects without explicit coordination code. Each arm minimized its own prediction errors while implicitly predicting the other arm's contributions through shared proprioceptive signals (the object's tilt, weight distribution). Coordination emerged from coupled inference, not from a central coordinator.

Wheeled robots (Donders Institute): Simple differential-drive robots with bump sensors and odometry. Pablo Lanillos's team used these to test the most basic question: can active inference navigate? The robots built spatial maps through simultaneous localization and mapping (SLAM), but treated the map as beliefs updated through active inference. They performed goal-directed navigation in dynamic environments where obstacles moved. The robots adapted online, updating both map beliefs and body models (turning radius, wheel slip) as they moved.

Soft robots (Bristol Robotics Lab): Pneumatic actuators made of silicone, with compliance and flexibility unlike rigid robots. These systems have complex, nonlinear dynamics that are hard to model analytically. Active inference controllers learned dynamics models through interaction. The robots developed body awareness of their own deformable structure—knowing how air pressure in one chamber affected shape in another. They performed gentle manipulation tasks (handling fragile objects, navigating cluttered spaces) that rigid robots struggle with.

The common thread: these robots didn't follow pre-programmed routines. They maintained coherent sensorimotor loops by minimizing prediction error, and they adapted those loops through experience.

They were, in a real sense, learning to exist.


Where Robotics Meets 4E Cognition

This brings us full circle to 4E cognition—the idea that minds are embodied, embedded, enacted, and extended.

Robotics proves the computational necessity of these ideas.

Embodied: Your particular body—its sensors, actuators, morphology, constraints—fundamentally shapes the inference problem. There's no such thing as disembodied intelligence. Even "pure" reasoning is grounded in sensorimotor metaphors learned through bodily interaction.

Embedded: You can't model the world in the abstract. Your model must explain the world as it appears from your particular niche, through your particular sensors, given your particular action repertoire. Models are always situated.

Enacted: Perception isn't passive. You bring forth your world through action. A robot with active inference doesn't just sense—it moves to resolve uncertainty. The world reveals itself through interaction.

Extended: Cognition doesn't stop at the skull (or chassis). The robot-environment boundary is the Markov blanket, and that blanket can shift. When a robot uses a tool, the tool becomes part of the body schema. Proprioception extends into the tool. This isn't metaphor—it's measurable in the precision weights applied to tactile feedback at the tool tip.

Robotics makes 4E cognition testable. You can't handwave about embodiment when you need to specify exactly how joint angles relate to gripper position. You can't be vague about enaction when the robot falls over if motor commands don't align with beliefs.

The theory has to work in steel and silicon. And increasingly, it does.


This is Part 7 of the Active Inference Applied series, exploring how the free energy principle becomes working code. We've moved from implementations to generative models to planning to the computational substrate of message passing, and now to embodiment. Next, we'll look at how active inference scales to complex, hierarchical tasks through temporal abstraction and compositional structure.

Previous: Active Inference Agents vs Reinforcement Learning: A Comparison

Next: Hierarchical Active Inference: Scaling to Complex Tasks


Further Reading

  • Lanillos, P., et al. (2021). "Active Inference in Robotics and Artificial Agents." Neural Computation.
  • Pezzato, C., Ferrari, R., & Corbato, C. H. (2020). "A Novel Adaptive Controller for Robot Manipulators Based on Active Inference." IEEE Robotics and Automation Letters.
  • Baltieri, M., & Buckley, C. L. (2019). "PID Control as a Process of Active Inference with Linear Generative Models." Entropy.
  • Oliver, G., Lanillos, P., & Cheng, G. (2022). "An Empirical Study of Active Inference on a Humanoid Robot." IEEE Transactions on Cognitive and Developmental Systems.
  • Tschantz, A., et al. (2020). "Learning Action-Oriented Models Through Active Inference." PLOS Computational Biology.
  • Pio-Lopez, L., et al. (2016). "Active Inference and Robot Control: A Case Study." Journal of the Royal Society Interface.