Request for DPO training script and GRPO/DPO checkpoint release #5

@YuxiangXie2003

Thanks for releasing this excellent project and making both the code and checkpoints publicly available!

In your paper “What Can RL Bring to VLA Generalization? An Empirical Study”, you compare PPO, GRPO, and DPO for fine-tuning OpenVLA. The repository already includes scripts for PPO and GRPO (including GRPO(s)), which is very helpful.

However, I couldn’t find the script or configuration for reproducing the DPO fine-tuning setup, even though DPO results are included in Figure 3b of the paper.

Would it be possible to release the code (or config/command) used for DPO training?
Also, would it be possible to share the pretrained checkpoints for GRPO and DPO if they are available?

Thanks again for your great work!
