Thanks for releasing this excellent project and making both the code and checkpoints publicly available!
In your paper “What Can RL Bring to VLA Generalization? An Empirical Study”, you compare PPO, GRPO, and DPO for fine-tuning OpenVLA. The repository already includes scripts for PPO and GRPO (including GRPO(s)), which is very helpful.
However, I couldn’t find the script or configuration for reproducing the DPO fine-tuning setup, even though DPO results are included in Figure 3b of the paper.
Would it be possible to release the code (or config/command) used for DPO training?
Also, would it be possible to share the pretrained checkpoints for GRPO and DPO if they are available?
Thanks again for your great work!