We introduce RoboGenesis, a simulation-based workflow and data engine that links environment construction, configured workflow generation, domain randomization, and success-filtered export to produce laboratory demonstrations that existing robot corpora rarely cover. We use this engine to synthesize LabEmbodied-Data, a corpus of multi-camera observations, language instructions, robot states, action trajectories, and structured annotations under a shared cross-embodiment schema.

Building on this corpus together with broad real-robot pre-training data, we present LabVLA, a VLA pipeline for connecting written laboratory protocols to embodied robot execution in simulated scientific workspaces. LabVLA pairs protocol-conditioned data synthesis with FAST action-token pre-training and flow-matching post-training under a shared cross-embodiment schema.
Specifically, LabVLA adapts a Qwen3-VL backbone to map visual observations, robot state, and language instructions into continuous action chunks through a DiT action expert. The model is trained in two stages: FAST action tokens first align the visual-language prefix with action semantics during VLM pre-training, and flow matching then predicts continuous robot actions during post-training. A knowledge-insulation design reduces interference between language-grounded VLM representations and the continuous-action expert during post-training.
An end-to-end multi-arm data engine spanning tasks, workflows, randomization, assets, and scene generation.
A suite of atomic manipulation tasks can be demonstrated independently or composed into complete lab workflows.
Atomic skills compose into long-horizon lab procedures across dual-arm manipulation, mobile navigation, and multi-step liquid handling.
We randomize scene appearance, camera viewpoint, object layout, obstacles, and tabletop conditions to improve robustness.
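The randomization axes listed above can be pictured as a seeded sampler over scene parameters. The sketch below is illustrative only: the field names, value ranges, and texture list are hypothetical and are not taken from RoboGenesis.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneSample:
    # All fields and ranges are hypothetical placeholders.
    light_intensity: float   # scene appearance: relative brightness multiplier
    camera_yaw_deg: float    # camera viewpoint: yaw offset from the nominal pose
    object_xy_m: tuple       # object layout: tabletop offset in metres
    num_obstacles: int       # obstacles: distractor objects added to the scene
    table_texture: str       # tabletop conditions: surface appearance


def sample_scene(rng: random.Random) -> SceneSample:
    """Draw one randomized scene configuration from the assumed ranges."""
    return SceneSample(
        light_intensity=rng.uniform(0.6, 1.4),
        camera_yaw_deg=rng.uniform(-15.0, 15.0),
        object_xy_m=(rng.uniform(-0.10, 0.10), rng.uniform(-0.10, 0.10)),
        num_obstacles=rng.randint(0, 3),
        table_texture=rng.choice(["steel", "epoxy", "laminate"]),
    )


# One seed per episode keeps every randomized scene reproducible.
scenes = [sample_scene(random.Random(seed)) for seed in range(4)]
```

Seeding each episode separately means a failed or filtered rollout can be regenerated exactly from its seed.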
Generate reusable objects, containers, tools, and scene props for composing lab environments.












The same viewpoint shows how room structure, furniture, equipment, assets, materials, safety cues, and task execution elements are added step by step.
LabVLA adapts a Qwen3-VL backbone with FAST action-token pre-training, then adds a flow-matching DiT action expert. A stop-gradient between the two keeps language grounding intact.

We first tokenize continuous actions with FAST and train the VLM under next-token supervision, so the prefix learns to predict action tokens before the DiT is attached. The DiT is not instantiated in this stage.
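As a rough illustration of turning a continuous action chunk into discrete next-token targets, the sketch below applies a discrete cosine transform and integer quantization. It assumes the commonly described FAST recipe of frequency transform followed by quantization, omits the byte-pair compression step, and uses a hypothetical `scale`. It is not the tokenizer used by LabVLA.

```python
import math


def dct2(chunk):
    """Unnormalized DCT-II of a 1-D action sequence."""
    n = len(chunk)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(chunk))
        for k in range(n)
    ]


def tokenize_actions(chunk, scale=10.0):
    """Quantize DCT coefficients to integers (compression step omitted)."""
    return [round(scale * c) for c in dct2(chunk)]


def next_token_pairs(tokens):
    """Shift by one so each position is trained to predict the following token."""
    return list(zip(tokens[:-1], tokens[1:]))


# A constant command on one joint concentrates all energy in the first coefficient.
tokens = tokenize_actions([0.5, 0.5, 0.5, 0.5])
pairs = next_token_pairs(tokens)
```

In the actual model the prediction would also be conditioned on images, robot state, and the instruction. Only the action-token shift is shown here.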
The second stage loads the pre-trained VLM checkpoint, attaches the DiT action expert, and trains it with a flow-matching objective that maps Gaussian noise to a clean action chunk through a deterministic vector field. At sampling time, this vector field reaches a usable trajectory in only N=10 Euler steps, well below the hundreds of steps that diffusion policies often need and fast enough for closed-loop laboratory control.
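The N=10 Euler sampling loop can be sketched as follows. The velocity function here is a toy stand-in that points straight at a known target, whereas the real vector field is the learned DiT conditioned on the VLM prefix. The time convention (noise at t=0, clean actions at t=1) is also an assumption.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for the learned vector field (valid for t < 1)."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_action_chunk(target, num_steps=10, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # the largest t visited is 1 - dt, so 1 - t never reaches zero
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


chunk = sample_action_chunk(target=[0.2, -0.1, 0.4], num_steps=10)
```

Because the toy field follows a straight line, Euler integration lands on the target regardless of step count. A learned field is only approximately straight, so the number of steps trades accuracy against latency.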
Knowledge insulation is a training-time mechanism that blocks flow-matching gradients from reaching the VLM prefix while the FAST and annotation-token losses remain active. The prefix can therefore still learn from cross-entropy supervision without receiving velocity-space gradients from the action expert.
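The effect of the stop-gradient can be seen in a toy scalar model with hand-derived gradients. Everything here is a simplification for illustration: the prefix is a single weight `w`, the expert is a single weight `v`, and squared errors stand in for both the cross-entropy token losses and the flow-matching loss. In an autodiff framework the same effect would come from a stop-gradient or detach on the prefix features passed to the expert.

```python
def insulated_gradients(w, v, x, token_target, velocity_target, insulate=True):
    """Return (total_loss, grad_w, grad_v) for a toy prefix/expert pair.

    Prefix feature: h = w * x. Expert output: v * h.
    """
    h = w * x
    token_err = h - token_target         # stand-in for FAST/annotation token loss
    flow_err = v * h - velocity_target   # stand-in for the flow-matching loss
    total_loss = token_err ** 2 + flow_err ** 2

    grad_w_token = 2.0 * token_err * x      # token loss always trains the prefix
    grad_w_flow = 2.0 * flow_err * v * x    # path cut by the stop-gradient
    grad_v = 2.0 * flow_err * h             # expert always trains on the flow loss

    grad_w = grad_w_token if insulate else grad_w_token + grad_w_flow
    return total_loss, grad_w, grad_v


_, gw_insulated, gv = insulated_gradients(0.5, 3.0, 2.0, 0.0, 1.0, insulate=True)
_, gw_coupled, _ = insulated_gradients(0.5, 3.0, 2.0, 0.0, 1.0, insulate=False)
```

With insulation the prefix gradient comes only from the token loss, while the expert gradient is unchanged. Without it, the flow loss dominates the prefix update in this example.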
Six laboratory operations under in-distribution (ID) and out-of-distribution (OOD) settings, compared against representative VLA baselines on LabUtopia.
| Method | Size | Pick Up | Press Button | Open Door | Pour Liquid | Heat Beaker | Transport Beaker | Avg. |
|---|---|---|---|---|---|---|---|---|
| In-Distribution | | | | | | | | |
| SmolVLA | <1B | 15.8 | 97.5 | 16.7 | 0.8 | 96.7 | 85.8 | 52.2 |
| X-VLA | <1B | 27.5 | 98.3 | 65.0 | 45.0 | 25.8 | 83.3 | 57.5 |
| GR00T N1.5 | 3B | 40.8 | 99.2 | 6.7 | 0 | 99.2 | 69.2 | 52.5 |
| π0 | 3B | 21.7 | 92.5 | 51.6 | 37.5 | 90.0 | 86.7 | 63.3 |
| π0.5 | 3B | 38.0 | 60.0 | 55.8 | 29.2 | 40.8 | 90.0 | 52.3 |
| π0-FAST | 3B | 16.7 | 37.5 | 17.5 | 5.8 | 3.3 | 20.8 | 16.9 |
| InternVLA-A1 | 3B | 25.8 | 93.3 | 38.3 | 2.5 | 82.5 | 67.5 | 51.7 |
| Wall-oss-flow | 4B | 11.7 | 54.2 | 0.83 | 0 | 0 | 29.2 | 16.0 |
| LabVLA | 4B | 49.4 | 100 | 65.0 | 43.3 | 83.3 | 85.8 | 71.1 |
| Out-of-Distribution | | | | | | | | |
| SmolVLA | <1B | 11.7 | 99.2 | 18.3 | 1.67 | 98.3 | 89.2 | 53.1 |
| X-VLA | <1B | 27.5 | 99.2 | 59.2 | 25.0 | 39.2 | 67.5 | 52.9 |
| GR00T N1.5 | 3B | 33.3 | 92.5 | 8.3 | 0 | 99.2 | 66.7 | 50.0 |
| π0 | 3B | 19.2 | 89.1 | 53.3 | 38.3 | 90.8 | 88.3 | 63.2 |
| π0.5 | 3B | 30.0 | 68.3 | 59.2 | 29.2 | 40.0 | 85.8 | 52.1 |
| π0-FAST | 3B | 14.2 | 45.0 | 15.8 | 7.5 | 11.7 | 24.2 | 19.7 |
| InternVLA-A1 | 3B | 19.2 | 95.8 | 63.3 | 0.83 | 84.2 | 57.5 | 53.5 |
| Wall-oss-flow | 4B | 7.5 | 61.7 | 0 | 0 | 0 | 26.7 | 16.0 |
| LabVLA | 4B | 48.3 | 98.3 | 65.8 | 34.2 | 87.5 | 85.8 | 70.0 |
A study beyond LabUtopia: an external X-VLA baseline also benefits from fine-tuning on LabEmbodied-Data, suggesting that the supervision is not tied to the LabVLA architecture.
| Method | Size | Pick Up | Open Door | Pour Liquid | Heat Beaker | Transport Beaker | Avg. | Δ |
|---|---|---|---|---|---|---|---|---|
| In-Distribution | | | | | | | | |
| X-VLA | <1B | 27.5 | 65.0 | 45.0 | 25.8 | 83.3 | 49.3 | — |
| X-VLA + LabEmbodied | <1B | 26.7 | 69.2 | 59.2 | 68.3 | 98.3 | 64.3 | +15.0 |
| Out-of-Distribution | | | | | | | | |
| X-VLA | <1B | 27.5 | 59.2 | 25.0 | 39.2 | 67.5 | 43.7 | — |
| X-VLA + LabEmbodied | <1B | 31.7 | 63.3 | 65.0 | 65.0 | 90.0 | 63.0 | +19.3 |
Five non-saturated LabUtopia tasks (Press Button excluded as near-saturated for all baselines). Δ is the change in five-task average from adding LabEmbodied-Data.
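The Δ column can be reproduced directly from the per-task scores in the table above. The short check below uses only those reported numbers; the dictionary layout and function name are illustrative.

```python
def five_task_average(scores):
    """Mean success rate over the five non-saturated tasks."""
    return sum(scores) / len(scores)


# Order: Pick Up, Open Door, Pour Liquid, Heat Beaker, Transport Beaker
rows = {
    "ID": {
        "base": [27.5, 65.0, 45.0, 25.8, 83.3],
        "tuned": [26.7, 69.2, 59.2, 68.3, 98.3],
    },
    "OOD": {
        "base": [27.5, 59.2, 25.0, 39.2, 67.5],
        "tuned": [31.7, 63.3, 65.0, 65.0, 90.0],
    },
}

deltas = {
    split: round(five_task_average(r["tuned"]) - five_task_average(r["base"]), 1)
    for split, r in rows.items()
}
```

The gain is larger out of distribution (+19.3) than in distribution (+15.0), driven mainly by Pour Liquid, Heat Beaker, and Transport Beaker.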

| Task | Setting | LabVLA (Ours) | DreamZero | π0.5 |
|---|---|---|---|---|
| Shake Liquid | In-domain · Clean | 92 | 90 | 92 |
| | In-domain · Cluttered | 86 | 84 | 80 |
| | Out-of-domain · Clean | 84 | 84 | 82 |
| | Out-of-domain · Cluttered | 80 | 80 | 78 |
| Pour Liquid | In-domain · Clean | 86 | 88 | 82 |
| | In-domain · Cluttered | 78 | 80 | 74 |
| | Out-of-domain · Clean | 76 | 72 | 74 |
| | Out-of-domain · Cluttered | 72 | 70 | 68 |
| Magnetic Stir | In-domain · Clean | 88 | 86 | 88 |
| | In-domain · Cluttered | 80 | 84 | 80 |
| | Out-of-domain · Clean | 80 | 78 | 82 |
| | Out-of-domain · Cluttered | 74 | 80 | 76 |
| Stopper Plug / Unplug | In-domain · Clean | 80 | 84 | 78 |
| | In-domain · Cluttered | 76 | 76 | 72 |
| | Out-of-domain · Clean | 80 | 78 | 70 |
| | Out-of-domain · Cluttered | 70 | 72 | 64 |
| Average | In-domain · Clean | 86.5 | 87.0 | 85.0 |
| | In-domain · Cluttered | 80.0 | 81.0 | 76.5 |
| | Out-of-domain · Clean | 80.0 | 78.0 | 77.0 |
| | Out-of-domain · Cluttered | 74.0 | 75.5 | 71.5 |
Rather than a single aggregate score, laboratory manipulation is better viewed through four levels of competence modeled on real laboratory roles. We position LabVLA at Level 2 (Technician), with RoboGenesis infrastructure that begins to support Level 3.
Level 1 (Apprentice) covers single-step interactions with laboratory objects: grasping labware, pressing a button, opening a door, or placing a container.
Level 2 (Technician) requires following a written multistep protocol through physical state changes such as pouring, heating, stirring, shaking, or transporting a vessel, where a failed earlier step cascades through the rest of the procedure.
Level 3 (Specialist) adds operation of precision instruments (pipettes, centrifuges, thermal cyclers, microscopes) in longer workflows with measurement logging and safety constraints.
Level 4 (Scientist) modifies the procedure in response to observations or measurements: adjusting concentrations, branching to alternative protocols, or deciding when an experimental objective has been met.
The institutions behind LabVLA.
This work is jointly conducted by the following institutions