| Paper | Model | Dataset | Data Pipeline | Training | Evaluation |
- 🌐 4-round data pipeline — turns raw Wikipedia seeds into a verified multi-hop QA dataset through generation, self-verification, and external judging
- 💸 Zero API spend — no money spent on paid web-search APIs or search-enabled LLMs; the entire pipeline runs on Selenium + DeepSeek Chat and free search aggregators such as DDGS
- 🤖 GRPO training with web search — trains search-augmented LLMs via verl-tool with a parallel multi-backend DDGS retrieval server
- 📊 12-benchmark evaluation suite — covers single-hop (NQ, TriviaQA, PopQA), multi-hop (HotpotQA, 2WikiMHQA, MuSiQue, Bamboogle), and hard reasoning (FRAMES, GAIA, MoNaCo, WebWalkerQA, WebShaper) benchmarks
- 🗂️ Data Pipeline
- 🌱 Round 1 — Seed Creation
- ❓ Round 2 — QA Generation
- ✅ Round 3 — Self-Verification
- 🏅 Round 4 — External Verification
- 🏋️ Model Training
- 📐 Data Processing
- 🔎 Evaluation
ORBIT QA pairs are built through four sequential rounds. See data/README.md for the full overview.
| Round | What it does | Key script |
|---|---|---|
| 1 | Wikipedia category pages → seed titles | create_seeds.py |
| 2 | Seeds → multi-hop reasoning questions + answers | deepseek_generate_qa.py |
| 3 | Self-verification via DeepSeek Chat + Selenium | deepseek_self_verify.py |
| 4 | External verification on scraped documents via vLLM | external_verification.py |
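The four rounds above can be pictured as a single record flowing through a pipeline. The sketch below is a hypothetical simplification — the record fields, function names, and verdict logic are illustrative placeholders, not the real schema used by the scripts:

```python
from dataclasses import dataclass

# Hypothetical sketch of how one record moves through the four rounds.
# Field names and verdict logic are illustrative, not the real schema.

@dataclass
class Record:
    seed_title: str
    question: str = ""
    answer: str = ""
    self_verified: bool = False
    externally_verified: bool = False

def round1_seed(category_titles):    # create_seeds.py
    return [Record(seed_title=t) for t in category_titles]

def round2_generate(rec):            # deepseek_generate_qa.py
    rec.question = f"Multi-hop question derived from '{rec.seed_title}'?"
    rec.answer = "placeholder answer"
    return rec

def round3_self_verify(rec):         # deepseek_self_verify.py
    rec.self_verified = bool(rec.question and rec.answer)
    return rec

def round4_external_verify(rec):     # external_verification.py
    # Here the external judge simply agrees with self-verification.
    rec.externally_verified = rec.self_verified
    return rec

records = [round4_external_verify(round3_self_verify(round2_generate(r)))
           for r in round1_seed(["The Emoji Movie"])]
kept = [r for r in records if r.externally_verified]
```

Only records that survive all four rounds end up in the final dataset.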
Dataset Example — TV Shows & Movies domain (click to expand)
Question: What was the exact runtime (minutes) of the 2017 animated feature set inside a smartphone's messaging application, directed by a filmmaker previously known for sequels to popular children's franchises, featuring a protagonist whose facial expression malfunctions, with voice casting that includes a lead actor from a critically acclaimed sitcom, and produced by a studio that later won an Oscar for Spider-Man animation?
Answer: 86 minutes
Verification / Summary of Supporting Facts:
| | Clue | Supporting Evidence |
|---|---|---|
| ✅ | Animated Set | The Emoji Movie (2017), set inside a smartphone messaging app world called Textopolis |
| ✅ | Filmmaker | Tony Leondis, previously directed sequels: Lilo & Stitch 2, Kung Fu Panda: Secrets of the Masters |
| ✅ | Protagonist | Gene, the main character in The Emoji Movie, struggles with malfunctioning facial expressions |
| ✅ | Voice Cast | T.J. Miller, lead actor from the HBO sitcom Silicon Valley |
| ✅ | Studio | Sony Pictures Animation, which later won an Oscar for Spider-Man: Into the Spider-Verse |
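The verification table above implies a simple acceptance rule: a QA pair survives only if every clue in the question is backed by supporting evidence. A minimal sketch of that rule (the clue/evidence dictionary format is hypothetical, taken from the example above):

```python
# Hypothetical sketch of the acceptance rule implied by the example:
# keep a QA pair only when every clue has non-empty supporting evidence.

clues = {
    "Animated Set": "The Emoji Movie (2017), set inside Textopolis",
    "Filmmaker": "Tony Leondis, directed Lilo & Stitch 2",
    "Protagonist": "Gene, malfunctioning facial expressions",
    "Voice Cast": "T.J. Miller, lead actor in Silicon Valley",
    "Studio": "Sony Pictures Animation, Oscar for Into the Spider-Verse",
}

def accept(qa_clues: dict) -> bool:
    # Reject the pair if any clue lacks supporting evidence.
    return all(evidence.strip() for evidence in qa_clues.values())

assert accept(clues)                         # all five clues supported
assert not accept({**clues, "Studio": ""})   # one unsupported clue -> reject
```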
Installing uv itself:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
source .venv/bin/activate
```

Prior to training and evaluating search agents, you should prepare the training, eval, and test datasets. See models/data_process for the full data processing scripts.
```bash
export HF_HOME=/u3/n3thakur/projects/cache
export HF_TOKEN=hf_...
```

Training data — ORBIT + NQ + HotpotQA mixed at a 1:1:1 ratio:
```bash
python models/data_process/prepare_train_data.py \
    --datasets 'nq,hotpotqa,orbit-ai/orbit-20k' \
    --local_dir models/train/data/mix-nq-hotpotqa-orbit-ratio-1-1-1 \
    --ratio 1:1:1
```

Eval data — all 12 validation benchmarks:
```bash
python models/data_process/prepare_eval_data.py \
    --data_sources nq,triviaqa,popqa,hotpotqa,2wikimultihopqa,musique,bamboogle,frames,gaia,monaco,webwalkerqa,webshaper \
    --local_dir models/eval/data/all-12-val-datasets
```

Test data — 8 Wikipedia test benchmarks:
```bash
python models/data_process/prepare_test_data.py \
    --local_dir models/eval/data/all-8-wikipedia-test-datasets
```

To train search agents using ORBIT, please see models/train.
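The 1:1:1 training-data mix described above can be sketched as a round-robin interleave over the three sources. This is a hypothetical simplification — `prepare_train_data.py` may instead sample by ratio; the sketch only illustrates what an equal mix means:

```python
from itertools import chain, zip_longest

# Hypothetical sketch of 1:1:1 mixing: interleave examples so each of the
# three sources contributes equally. Example IDs below are placeholders.

def mix_equal(*sources):
    """Round-robin over the sources, dropping zip_longest's None padding."""
    return [x for x in chain.from_iterable(zip_longest(*sources)) if x is not None]

nq = ["nq-0", "nq-1"]
hotpotqa = ["hp-0", "hp-1"]
orbit = ["orbit-0", "orbit-1"]

mixed = mix_equal(nq, hotpotqa, orbit)
# -> ["nq-0", "hp-0", "orbit-0", "nq-1", "hp-1", "orbit-1"]
```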
```bash
export HF_TOKEN="hf_..."
export WANDB_API_KEY="..."
bash models/train/ddgs_web_search.sh  # run ddgs web server
bash models/train/run_grpo.sh         # train search agent with verl-tool
```

To evaluate search agents trained using ORBIT, please see models/eval.
```bash
bash models/eval/retrieval_server_bge.sh  # run BGE index using faiss (conda env required)
bash models/eval/run_eval.sh              # evaluate search agent on Wikipedia datasets
```

```
orbit/
├── data/
│   ├── round-1-seed-creation/           # Wikipedia pages → seed JSONL
│   ├── round-2-qa-generation/           # Seeds → inverted QA pairs (DeepSeek)
│   ├── round-3-self-verification/       # QA pairs → self-verified answers (DeepSeek)
│   ├── round-4-external-verification/   # Verified pairs → externally judged (vLLM)
│   └── outputs/                         # Intermediate + final JSONL files
├── models/
│   ├── train/                           # GRPO training + DDGS retrieval server
│   ├── eval/                            # BGE retrieval index + evaluation
│   └── data_process/                    # Prepare train/eval parquet files
├── pyproject.toml
└── README.md
```
We thank the following open-source projects:
- verl-tool for the GRPO framework used to train search agents.
- vLLM for LLM inference.
- DDGS for the web-search aggregator used during GRPO training of search agents.
- Search-R1 for retrieval server design inspiration and initial experiments.
If you have any questions or suggestions, please contact us at:
- Nandan Thakur: [email protected]
- Zijian Chen: [email protected]
- Xueguang Ma: [email protected]
If you find this data generation repository helpful, please cite our preprint, *ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget*:
```bibtex
@misc{thakur2026orbit,
  title={ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget},
  author={Nandan Thakur and Zijian Chen and Xueguang Ma and Jimmy Lin},
  year={2026},
  eprint={2604.01195},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2604.01195},
}
```