Thanks for releasing this excellent project and making both the code and checkpoints publicly available!
In your paper “What Can RL Bring to VLA Generalization? An Empirical Study”, you compare PPO, GRPO, and DPO for fine-tuning OpenVLA. The repository already includes scripts for PPO and GRPO (including GRPO(s)), which is very helpful.
However, I couldn’t find the script or configuration for reproducing the DPO fine-tuning setup, even though DPO results are included in Figure 3b of the paper.
Would it be possible to release the code (or config/command) used for DPO training?
Also, would it be possible to share the pretrained checkpoints for GRPO and DPO if they are available?
Thanks again for your great work!