MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM
Paper | Data | Logos_train_data | Logos-3B | Logos-7B
Our evaluation code is based on VIC.
- Download the evaluation data into the evaluation code directory, then switch to that directory.
- Run one of the following commands for inference:
For reasoning MLLMs:
python -m Vic.benchmark_test -p mirage.tsv -i original
For vanilla MLLMs:
python -m Vic.benchmark_test -p mirage.tsv -i cot
- Evaluation, specifically for accuracy:
python -m Vic.benchmark_eval -b mirage -p output_inference_results.tsv

We implemented our training code based on OpenRLHF. We will upload our version soon.
The training data has been released in Logos_train_data.
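
For convenience, the inference and evaluation commands listed earlier can be chained from a small Python wrapper. This is only a sketch, not part of the released code: the function names (`inference_command`, `evaluation_command`, `run_pipeline`) and the `"reasoning"`/`"vanilla"` labels are hypothetical, and it assumes that `output_inference_results.tsv` stands for whatever results file your inference run produced. Only the flags shown in the commands above are used.

```python
import subprocess
import sys

# Prompt modes taken from the commands in this README:
# "original" for reasoning MLLMs, "cot" for vanilla MLLMs.
# The dictionary keys are hypothetical labels chosen for this sketch.
PROMPT_MODES = {"reasoning": "original", "vanilla": "cot"}


def inference_command(model_kind, data_file="mirage.tsv"):
    """Build the Vic.benchmark_test command for the given kind of model."""
    if model_kind not in PROMPT_MODES:
        raise ValueError(f"model_kind must be one of {sorted(PROMPT_MODES)}")
    return [sys.executable, "-m", "Vic.benchmark_test",
            "-p", data_file, "-i", PROMPT_MODES[model_kind]]


def evaluation_command(results_file="output_inference_results.tsv"):
    """Build the Vic.benchmark_eval command for accuracy on MIRAGE."""
    # Assumption: results_file is the TSV written by the inference step.
    return [sys.executable, "-m", "Vic.benchmark_eval",
            "-b", "mirage", "-p", results_file]


def run_pipeline(model_kind, results_file="output_inference_results.tsv",
                 dry_run=False):
    """Run inference, then evaluation; with dry_run=True only print them."""
    commands = [inference_command(model_kind),
                evaluation_command(results_file)]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline if a step exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_pipeline("vanilla", dry_run=True)` from inside the evaluation code directory prints both commands without executing them, which is a quick way to check the arguments before starting a long inference run.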