Steering Vision-Language-Action Models for Safe Robotic Execution

Core idea

The project explored whether semantic directions in a VLA model’s latent activation space can steer robot behavior at inference time. The goal was to avoid costly fine-tuning while leaving the rest of the task behavior intact.

Methods

The implementation used contrastive activation vectors, conditional activation steering, and PyTorch forward hooks on mid-to-late transformer layers. The approach was evaluated with a Franka Panda arm using the Pi 0.5 VLA model.
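The three ingredients named above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the project's actual code: the helper names (contrastive_vector, make_steering_hook), the mean-difference construction, the projection-threshold gate, the values of alpha and threshold, and the nn.Identity layer standing in for a transformer block are all hypothetical choices made for the example.

```python
import torch
import torch.nn as nn


def contrastive_vector(pos_acts: torch.Tensor, neg_acts: torch.Tensor) -> torch.Tensor:
    """Unit-norm mean difference between two sets of activations, each (n, d).

    Assumption: the steering direction is a simple difference of means.
    """
    v = pos_acts.mean(dim=0) - neg_acts.mean(dim=0)
    return v / v.norm()


def make_steering_hook(behavior_vec, condition_vec, alpha=4.0, threshold=0.0):
    """Build a forward hook that adds alpha * behavior_vec to a layer's output,
    but only for batch items whose mean-pooled hidden state projects onto
    condition_vec above the threshold (a simplified conditional gate)."""

    def hook(module, inputs, output):
        # Transformer blocks often return a tuple; the hidden state comes first.
        hidden = output[0] if isinstance(output, tuple) else output  # (batch, seq, d)
        b = behavior_vec.to(dtype=hidden.dtype, device=hidden.device)
        c = condition_vec.to(dtype=hidden.dtype, device=hidden.device)
        score = hidden.mean(dim=1) @ c                       # (batch,)
        gate = (score > threshold).to(hidden.dtype).view(-1, 1, 1)
        steered = hidden + alpha * gate * b                  # broadcast over seq
        if isinstance(output, tuple):
            return (steered,) + tuple(output[1:])
        return steered  # a non-None return value replaces the layer output

    return hook


# Toy demonstration with synthetic activations (not real robot data).
torch.manual_seed(0)
d = 8
pos = torch.randn(16, d) + 2.0   # stand-in for activations on "desired" examples
neg = torch.randn(16, d) - 2.0   # stand-in for activations on "undesired" examples
v = contrastive_vector(pos, neg)

layer = nn.Identity()            # stand-in for one mid-to-late transformer block
handle = layer.register_forward_hook(make_steering_hook(v, v, alpha=4.0))

# Batch of two: the first item satisfies the condition, the second does not.
x = torch.stack([torch.ones(3, d), -torch.ones(3, d)])
y = layer(x)
handle.remove()                  # steering is fully reversible
```

In a real model the hook would be registered on the chosen transformer blocks, and the two activation sets would be collected from contrasting prompts or scenes. The module path to those blocks is model-specific and is not given in this summary. Because the hook is attached and removed at inference time, no weights are changed.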

Why it matters

The work highlights a practical path toward interpretable, lightweight behavior modification for robotic policies, especially where retraining is expensive or safety constraints change after deployment.