Authors: Ying Feng*, Hongjie Fang*, Yinong He*, Jingjing Chen, Chenxi Wang, Zihao He, Ruonan Liu, Cewu Lu
Please follow the installation guide to install the rise and vq-vae conda environments and the dependencies, as well as the real robot environments. Also, remember to adjust the constant parameters in dataset/constants.py and utils/constants.py according to your own environment.
Make sure that TRANS_MIN/MAX and WORKSPACE_MIN/MAX are correctly set in the camera coordinate frame, or you may obtain meaningless output. We recommend expanding TRANS_MIN/MAX by 0.15 - 0.3 meters on both sides of the actual translation range to accommodate spatial data augmentation. You can follow command_train.sh for data visualization and parameter checks.
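The recommended expansion of the translation bounds can be sketched as follows; the bound values below are hypothetical placeholders, not the ones from dataset/constants.py, so substitute the range measured in your own setup:

```python
import numpy as np

# Hypothetical workspace translation bounds in the camera frame (meters);
# replace these with the actual range measured in your own environment.
actual_trans_min = np.array([-0.35, -0.25, 0.30])
actual_trans_max = np.array([0.35, 0.25, 0.80])

# Expand each side by a margin (recommended 0.15 - 0.3 m) so that
# spatially augmented samples still fall inside TRANS_MIN/MAX.
MARGIN = 0.2
TRANS_MIN = actual_trans_min - MARGIN
TRANS_MAX = actual_trans_max + MARGIN
```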
Please calibrate the camera(s) with the robot before data collection and evaluation to ensure correct spatial transformations between camera(s) and the robot. Please refer to calibration guide for more details.
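A correct calibration lets you map camera-frame observations into the robot frame with a single homogeneous transform. As a generic sketch (not this codebase's API; the function name and the identity-plus-translation example are illustrative), applying a 4x4 extrinsic matrix to a point cloud looks like:

```python
import numpy as np

def camera_to_base(points_cam, extrinsic):
    """Transform Nx3 points from the camera frame to the target frame
    using a 4x4 homogeneous extrinsic matrix (camera -> target)."""
    points_h = np.concatenate(
        [points_cam, np.ones((points_cam.shape[0], 1))], axis=1)  # Nx4
    return (extrinsic @ points_h.T).T[:, :3]

# Illustrative extrinsic: identity rotation plus a pure translation.
E = np.eye(4)
E[:3, 3] = [1.0, 2.0, 3.0]
transformed = camera_to_base(np.zeros((1, 3)), E)
```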
Please follow the data collection guide to collect data. We provide sample data for each task on Google Drive and Baidu Netdisk (code: 643b). You may need to adjust dataset/pretrain.py to accommodate different data formats. The sample data are organized as follows:
collect_cups
|-- calib/
| |-- [calib timestamp 1]/
| | |-- extrinsics.npy # extrinsics (camera to marker)
| | |-- intrinsics.npy # intrinsics
| | `-- tcp.npy # tcp pose of calibration
| `-- [calib timestamp 2]/ # similar calib structure
`-- train/
|-- [episode identifier 1]/
| |-- metadata.json # metadata
| |-- timestamp.txt # calib timestamp
| |-- cam_[serial_number 1]/
| | |-- color # RGB
| | | |-- [timestamp 1].png
| | | |-- [timestamp 2].png
| | | |-- ...
| | | `-- [timestamp T].png
| | |-- depth # depth
| | | |-- [timestamp 1].png
| | | |-- [timestamp 2].png
| | | |-- ...
| | | `-- [timestamp T].png
| | `-- lowdim
| | | |-- tcp.npz # tcp
| | | | |-- [timestamp 1]
| | | | |-- [timestamp 2]
| | | | |-- ...
| | | | `-- [timestamp T]
| | | `-- pos.npz # finger pose
| | | | |-- [timestamp 1]
| | | | |-- [timestamp 2]
| | | | |-- ...
| | | | `-- [timestamp T]
| `-- cam_[serial_number 2]/ # similar camera structure
`-- [episode identifier 2]/ # similar episode structure
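Given this layout, paired RGB/depth frames for one camera of one episode can be enumerated by timestamp. The sketch below uses only pathlib and assumes the directory and file names shown above; it is an illustration, not a loader from this codebase:

```python
from pathlib import Path

def list_frames(episode_dir, serial_number):
    """Collect sorted (timestamp, color_path, depth_path) triples for one
    camera of one episode, assuming the layout shown above."""
    cam_dir = Path(episode_dir) / f"cam_{serial_number}"
    frames = []
    for color_path in sorted((cam_dir / "color").glob("*.png")):
        timestamp = color_path.stem
        depth_path = cam_dir / "depth" / f"{timestamp}.png"
        # Keep only timestamps that have both a color and a depth frame.
        if depth_path.exists():
            frames.append((timestamp, color_path, depth_path))
    return frames
```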
Please follow the training guide when working with this codebase.
The guide provides a step-by-step example for Task 1: Pull Tissue, including:
- Preprocessing the dataset
- Training the VQ-VAE
- Training VQ-Rise
This covers the full workflow from raw data to model training.
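At the core of the VQ-VAE stage is a codebook lookup that maps each continuous hand state to its nearest learned code. The NumPy sketch below illustrates that lookup only; the shapes, names, and toy codebook are assumptions for illustration, not this codebase's API:

```python
import numpy as np

def quantize(hand_states, codebook):
    """Map each D-dim hand state to its nearest codebook entry.

    hand_states: (N, D) array; codebook: (K, D) array.
    Returns (indices, quantized) where quantized = codebook[indices].
    """
    # Squared Euclidean distance between every state and every code: (N, K).
    dists = ((hand_states[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]
```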
Here we provide sample real-world evaluation code based on our hardware setup (a Flexiv Rizon 4 robotic arm, an OyMotion RoHand dexterous hand, and an Intel RealSense camera). For other hardware settings, please follow the deployment guide to modify the evaluation script.
Modify the arguments in scripts/command_eval_rise_vae.sh, then

conda activate rise
bash command_eval_rise_vae.sh

Here are the explanations of the evaluation arguments:
- --ckpt [ckpt_path]: the checkpoint to be evaluated.
- --calib [calib_dir]: the calibration directory.
- --num_inference_step [Ni]: how often to perform a policy inference, measured in steps.
- --max_steps [Nstep]: the maximum number of steps in an evaluation.
- --num_action [Nstep]: the number of steps predicted at each inference.
- --vae_codebook: the path to the VQ-VAE codebook.
- --robot_ip: the IP address of the robot arm.
- --robot_serial_number: the serial number of the robot arm.
- --com_port: the communication port of the dexterous hand.
- --camera_ids: the camera IDs.
- --vis: enable Open3D visualization after every inference. This visualization is blocking and will pause the evaluation process until the window is closed.
- --ensemble_mode [mode]: the temporal ensemble mode.
  - [mode] = "new": use the newest predicted action in this step.
  - [mode] = "old": use the oldest predicted action in this step.
  - [mode] = "avg": use the average of the predicted actions in this step.
  - [mode] = "act": use a weighted average of the predicted actions in this step, with weights set following ACT.
  - [mode] = "hato": use a weighted average of the predicted actions in this step, with weights set following HATO.
- The other arguments remain the same as in the training script.
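For the "act" ensemble mode, ACT describes an exponential weighting scheme over all actions predicted for the current step, with the oldest prediction weighted highest. The sketch below follows that scheme; the function name and the temperature value m are illustrative assumptions, not necessarily what this codebase uses:

```python
import numpy as np

def ensemble_act(predictions, m=0.01):
    """ACT-style temporal ensembling: exponentially weighted average of all
    actions predicted for the current step. Weights are w_i = exp(-m * i),
    with i = 0 for the oldest prediction, so older predictions dominate."""
    predictions = np.asarray(predictions)  # (num_predictions, action_dim)
    weights = np.exp(-m * np.arange(len(predictions)))
    weights /= weights.sum()
    return (weights[:, None] * predictions).sum(axis=0)
```

With m = 0 this reduces to the "avg" mode; larger m moves it toward the "old" mode.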
- Our codebase is built upon RISE.
- Our diffusion module is adapted from Diffusion Policy. This part is under MIT License.
- Our transformer module is adapted from ACT, which used DETR in their implementations. The DETR part is under APACHE 2.0 License.
- Our Minkowski ResNet observation encoder is adapted from the examples of the MinkowskiEngine repository. This part is under MIT License.
- Our temporal ensemble implementation is inspired by the recent HATO project.
@article{feng2025learning,
title = {Learning Dexterous Manipulation with Quantized Hand State},
author = {Feng, Ying and Fang, Hongjie and He, Yinong and Chen, Jingjing and Wang, Chenxi and He, Zihao and Liu, Ruonan and Lu, Cewu},
journal = {arXiv preprint arXiv:2509.17450},
year = {2025}
}
@inproceedings{wang2024rise,
title = {RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective},
author = {Wang, Chenxi and Fang, Hongjie and Fang, Hao-Shu and Lu, Cewu},
booktitle = {2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year = {2024},
pages = {2870-2877},
doi = {10.1109/IROS58592.2024.10801678}
}

DQ-RISE is licensed under CC BY-NC-SA 4.0.