Evaluating Relational Reasoning in LLMs with REL

arXiv Hugging Face Project Website

Figure 1: REL benchmarks for chemistry, biology, and algebra.

Figure 2: Example questions from REL.

Authors: Lukas Fesser*, Yasha Ektefaie*, Ada Fang*, Sham M. Kakade, Marinka Zitnik

* indicates equal contribution

Setup

This repo is configured around uv and the local helper script in setup_uv_env.sh.

curl -LsSf https://astral.sh/uv/install.sh | sh
source setup_uv_env.sh
source "$UV_PROJECT_ENVIRONMENT/bin/activate"

If you want the full environment notes, including cache locations and troubleshooting, see UV_SETUP.md.

Running the benchmark

The questions are provided in REL/, or you can download them from Hugging Face with hf download ada-f/rel --repo-type dataset --local-dir . (the trailing dot is the target directory). Run your LLM on the questions, then evaluate the responses with the domain evaluators. Examples of how to run frontier LLMs (Claude, Gemini, GPT-5) are provided in chem_benchmark/llm_runner.py.
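If you are wiring up your own model instead of using chem_benchmark/llm_runner.py, the loop can be as small as the sketch below. It is not the repo's runner: the function name run_model_on_questions, the prompt_key default of "question", and the "response" output field are all assumptions made for illustration, and the actual question schema in REL/ may differ. Check docs/EVALUATION.md for the answer format the evaluators expect.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def run_model_on_questions(
    questions: Iterable[dict],
    ask_model: Callable[[str], str],
    out_path: str,
    prompt_key: str = "question",  # assumed field name, not confirmed by the repo
) -> int:
    """Send each question to a model and write one JSON line per response.

    Hypothetical helper: `ask_model` is any callable that maps a prompt
    string to a response string (for example, a thin wrapper around an API).
    Returns the number of questions processed.
    """
    out_file = Path(out_path)
    out_file.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with out_file.open("w", encoding="utf-8") as handle:
        for record in questions:
            response = ask_model(record[prompt_key])
            # Keep the original fields so responses can be matched to questions later.
            handle.write(json.dumps({**record, "response": response}) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Toy records and a stand-in model, only to show the call shape.
    demo_questions = [{"id": "demo-1", "question": "What is 1 + 1?"}]
    written = run_model_on_questions(
        demo_questions, lambda prompt: "2", "responses/demo.jsonl"
    )
    print(f"wrote {written} response(s)")
```

Keeping the input fields alongside each response means the output file can be passed to a scoring step without re-joining against the question files.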

Evaluate responses from your own pipeline

If you already have model responses and just want scoring, use the domain evaluators directly:

  • Chemistry: chem_benchmark.evaluation
  • Biology: bio_benchmark.evaluation
  • Algebra: algebra_benchmark.evaluation

The expected answer formats and minimal examples are in docs/EVALUATION.md.
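For pipelines that score several domains, one way to pick the right evaluator is to map a domain name to the module paths listed above and import it on demand. This is a sketch only: EVALUATOR_MODULES, evaluator_module_name, and load_evaluator are hypothetical helpers, and it deliberately calls nothing inside the modules, since the functions they expose are documented in docs/EVALUATION.md rather than here.

```python
import importlib
from types import ModuleType

# Module paths taken from the list above; the dictionary keys are an assumed naming.
EVALUATOR_MODULES = {
    "chemistry": "chem_benchmark.evaluation",
    "biology": "bio_benchmark.evaluation",
    "algebra": "algebra_benchmark.evaluation",
}


def evaluator_module_name(domain: str) -> str:
    """Return the evaluator module path for a domain name (case-insensitive)."""
    key = domain.strip().lower()
    if key not in EVALUATOR_MODULES:
        known = ", ".join(sorted(EVALUATOR_MODULES))
        raise ValueError(f"unknown domain {domain!r}; expected one of: {known}")
    return EVALUATOR_MODULES[key]


def load_evaluator(domain: str) -> ModuleType:
    """Import the evaluator module for a domain.

    Requires running inside the repo environment so the packages are importable.
    """
    return importlib.import_module(evaluator_module_name(domain))
```

From the activated environment, load_evaluator("biology") would return the bio_benchmark.evaluation module, and the functions to call on it are the ones described in docs/EVALUATION.md.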

Citation

@article{fesser2026rel,
  title         = {Evaluating Relational Reasoning in LLMs with REL},
  author        = {Lukas Fesser and Yasha Ektefaie and Ada Fang and Sham M. Kakade and Marinka Zitnik},
  year          = {2026},
  journal       = {arXiv preprint arXiv:2604.12176},
  eprint        = {2604.12176},
  archivePrefix = {arXiv},
  url           = {https://arxiv.org/abs/2604.12176}
}
