*Figure 1: REL benchmarks for chemistry, biology, and algebra.*

*Figure 2: Example questions from REL.*
Authors: Lukas Fesser*, Yasha Ektefaie*, Ada Fang*, Sham M. Kakade, Marinka Zitnik (* indicates equal contribution)
This repo is configured around `uv` and the local helper script `setup_uv_env.sh`:

```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
source setup_uv_env.sh
source "$UV_PROJECT_ENVIRONMENT/bin/activate"
```

If you want the full environment notes, including cache locations and troubleshooting, see UV_SETUP.md.
The questions are provided in `REL/`, or you can download them from Hugging Face:

```sh
hf download ada-f/rel --repo-type dataset --local-dir .
```
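You can also load the questions programmatically. Below is a minimal sketch using the Hugging Face `datasets` library; the split and column names it prints are whatever the dataset card defines, so inspect the output rather than assuming a layout:

```python
from datasets import load_dataset

# Download and load the REL questions from the Hugging Face Hub.
# Inspect the returned DatasetDict to see the available splits and fields.
rel = load_dataset("ada-f/rel")
print(rel)
```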
Run your LLM on the questions and evaluate the responses with the domain evaluators. Examples of how to run frontier LLMs (Claude, Gemini, GPT-5) are provided in `chem_benchmark/llm_runner.py`.
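As an illustration only (this is not the interface of `chem_benchmark/llm_runner.py`), here is a minimal sketch of querying a model over a JSONL file of questions. The file path and the `id`/`question` field names are assumptions for the example, and it uses an OpenAI-compatible client:

```python
import json
from openai import OpenAI  # assumes the openai package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical path and field names; see chem_benchmark/llm_runner.py
# for the repo's actual runner.
responses = []
with open("REL/chemistry/questions.jsonl") as f:
    for line in f:
        record = json.loads(line)
        completion = client.chat.completions.create(
            model="gpt-5",
            messages=[{"role": "user", "content": record["question"]}],
        )
        responses.append({
            "id": record["id"],
            "response": completion.choices[0].message.content,
        })

# Save responses for scoring with the domain evaluators.
with open("responses.jsonl", "w") as f:
    for r in responses:
        f.write(json.dumps(r) + "\n")
```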
If you already have model responses and just want scoring, use the domain evaluators directly (a usage sketch follows this list):

- Chemistry: `chem_benchmark.evaluation`
- Biology: `bio_benchmark.evaluation`
- Algebra: `algebra_benchmark.evaluation`
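A minimal sketch of invoking an evaluator from Python, assuming each module exposes a scoring entry point; the `evaluate` function name and its argument here are hypothetical, and docs/EVALUATION.md documents the actual answer formats and commands:

```python
# Hypothetical interface: the real entry point may differ; see docs/EVALUATION.md.
from chem_benchmark import evaluation

# Assumed signature: score saved {"id", "response"} records against the
# reference answers and return per-question scores.
scores = evaluation.evaluate(responses_path="responses.jsonl")
print(scores)
```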
The expected answer formats and minimal examples are in docs/EVALUATION.md.
- docs/EVALUATION.md: scoring details and example commands
- docs/DEVELOPMENT.md: tests and how to generate new benchmark questions
- docs/DATASETS.md: unified dataset format and task layout
```bibtex
@article{fesser2026rel,
  title         = {Evaluating Relational Reasoning in LLMs with REL},
  author        = {Lukas Fesser and Yasha Ektefaie and Ada Fang and Sham M. Kakade and Marinka Zitnik},
  year          = {2026},
  journal       = {arXiv preprint arXiv:2604.12176},
  eprint        = {2604.12176},
  archivePrefix = {arXiv},
  url           = {https://arxiv.org/abs/2604.12176}
}
```