Demo Notebooks

We offer several notebooks designed to demonstrate how to use the API and to provide baseline evaluation results. You can use these as inspiration to develop and propose new and original evaluation experiments.

The available notebooks are:

run_experiments_demo: (DEPRECATED: this notebook is functional, but we encourage you to use experiment_set instead of "orphan" experiments when designing an evaluation, unless you only intend to test or debug.) This notebook runs individual experiments on various large language models (LLMs) with a specified set of metrics. It aggregates and presents the results in a consolidated table, showing the mean and standard deviation over the corpus questions.
run_set_repeat: This notebook conducts a set of related experiments (Experiment Set) on several LLMs using a specified set of metrics and a repetition parameter. It aggregates and displays the results in a consolidated table, highlighting the mean and standard deviation of the experiment repetitions to illustrate model variability. This notebook also includes graphs depicting score dispersion for each model across all metrics.
run_set_raglimit: This notebook conducts a series of related experiments (Experiment Set) on several retrieval-augmented generation (RAG) models with a given set of metrics and performs a grid search on the limit parameter (the maximum number of chunks retrieved in a RAG setting). It compiles and presents the results in a consolidated table, showing the mean results over the corpus questions.
run_set_ragmetrics: [WIP] This notebook conducts a series of related experiments (Experiment Set) on several retrieval-augmented generation (RAG) models with specialized RAG metrics. These RAG metrics use the "retriever context" (i.e. the retrieved chunks) to compute the scores.
run_dataset_experiences_demo: This notebook runs individual experiments on a given dataset with a specified set of metrics. It aggregates and presents the results in a consolidated table and offers graphical visualisations.
albert-evals-raw: Simple evaluation of the raw Albert models.
albert-evals-RAG: RAG evaluation of the Albert models.
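
Several of the notebooks above report a mean and standard deviation per model and metric. As a rough illustration of that kind of aggregation (not the project's actual API), the sketch below groups scores by model and metric and summarizes them over repetitions. The record layout, the model names, the metric name `judge_score`, and all score values are hypothetical and made up for the example.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: one score per (model, metric, repetition).
# Names and values are invented for illustration only.
records = [
    {"model": "model-a", "metric": "judge_score", "repeat": 0, "score": 0.70},
    {"model": "model-a", "metric": "judge_score", "repeat": 1, "score": 0.80},
    {"model": "model-a", "metric": "judge_score", "repeat": 2, "score": 0.90},
    {"model": "model-b", "metric": "judge_score", "repeat": 0, "score": 0.40},
    {"model": "model-b", "metric": "judge_score", "repeat": 1, "score": 0.60},
]


def summarize(rows):
    """Return {(model, metric): (mean, sample std)} over repetitions."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["model"], row["metric"])].append(row["score"])

    table = {}
    for key, scores in grouped.items():
        # The sample standard deviation needs at least two values.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        table[key] = (mean(scores), spread)
    return table


summary = summarize(records)
for (model, metric), (avg, spread) in sorted(summary.items()):
    print(f"{model:<10} {metric:<12} {avg:.2f} +/- {spread:.2f}")
```

A larger standard deviation across repetitions suggests that a model's outputs vary more from run to run, which is the variability the repetition-based notebook is meant to surface.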

Files in this directory:

RAG-adhoc.ipynb
Readme.md
albert-evals-RAG.ipynb
albert-evals-raw.ipynb
create_marker_dataset.ipynb
run_dataset_experiences_demo.ipynb
run_experiments_demo.ipynb
run_ocr.ipynb
run_set_raglimit_demo.ipynb
run_set_ragmetrics_demo.ipynb
run_set_repeat_demo.ipynb
run_toolings.ipynb
