hq-King/SDEval

SDEval

🛠️ Setup

    1. Create a new conda environment, activate it, and install the dependencies with the following commands:
    conda create --name SDEval python=3.10
    conda activate SDEval
    pip install -r requirements.txt
    pip install --upgrade transformers
    2. Download the MLLMGuard, VLSBench, MMBench, and MM-Vet datasets.
    3. The dynamic framework code is in sdeval. Generate any variants you want by running:
      bash scripts/data.sh
    4. Then run the evaluation:
      bash scripts/eval.sh

We introduce SDEval, a dynamic safety evaluation framework for multimodal large language models (MLLMs).

Motivation

After reviewing existing benchmarks, we identify the following main challenges in achieving reliable safety evaluation:

  1. Data leakage. Most safety benchmarks build their datasets by integrating open-source datasets, which are likely to be included in MLLM training sets. As a result, MLLM scores on these benchmarks may be inflated and can mislead the community about actual model safety.
  2. Static datasets with fixed complexity. Existing MLLM safety benchmarks are manually constructed and rarely updated. Their fixed complexity cannot keep pace with the rapid progress of MLLMs. Gauging the limits of MLLM performance precisely requires a dynamic, automated evaluation framework with adjustable complexity.
  3. Evolving attack methods. As new attack methods emerge, MLLM safety benchmarks should be updated accordingly so that they continue to test model safety.

To tackle these challenges, we propose SDEval, a novel, general, and flexible framework for dynamic safety evaluation of MLLMs. To dynamically create new evaluation suites with flexible complexity, we divide the dynamic strategies into three parts:

  1. Text Dynamics, which tests whether MLLMs can grasp the critical safety information in a prompt when it is expressed in different ways. We generate the new texts using methods such as character perturbation, linguistic mix, and chain-of-thought injection.
  2. Image Dynamics, which tests whether MLLMs can consistently focus on the safety-related subjects in an image without being distracted by other factors. We modify the original images with techniques such as diffusion-based generation and editing.
  3. Text-Image Dynamics, which tests whether MLLMs understand the safety of image-text pairs as a whole and whether they can cope with common jailbreaking inputs. Here we focus on the combined impact of images and text on safety, and on how their interaction influences safety.

By integrating text and image dynamics into a single framework, SDEval can significantly increase data complexity and difficulty, as shown in the pipeline figure. SDEval is general and flexible: it can co-exist and co-evolve with existing benchmarks, and it can also be used for dynamic evaluation of capability. From a capability-safety balance perspective, SDEval reveals that most models are less stable in safety than in capability, which indicates an urgent need for further improvements in model safety.
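As an illustration of the character perturbation idea mentioned under Text Dynamics, here is a minimal sketch. It is not the code in sdeval: the function name `perturb_characters`, the `rate` and `seed` parameters, and the choice of swapping adjacent inner letters are all assumptions made for this example.

```python
import random


def perturb_characters(text: str, rate: float = 0.2, seed: int = 0) -> str:
    """Swap two adjacent inner letters in a random fraction of the words.

    Hypothetical helper for illustration only; the perturbation used by
    SDEval itself may differ.
    """
    rng = random.Random(seed)  # seeded so the same variant can be regenerated
    perturbed = []
    for word in text.split(" "):
        # Only touch purely alphabetic words that are long enough to keep
        # their first and last letters in place.
        if len(word) >= 4 and word.isalpha() and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)
            chars = list(word)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            word = "".join(chars)
        perturbed.append(word)
    return " ".join(perturbed)


if __name__ == "__main__":
    prompt = "please describe how the device in the picture works"
    print(perturb_characters(prompt, rate=0.5, seed=1))
```

The variant keeps the length and the set of characters of the original prompt, so a reader can still recover the meaning while the surface form changes. Raising `rate` is one simple way to adjust difficulty in this sketch.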

Examples of dynamically generated data for MLLMGuard. After verification, the newly generated dynamic data remains semantically consistent with the original data.

Results

Overall, these results show that current MLLMs do not cope well with dynamic safety evaluation. This suggests that data leakage occurs in current model training and that current MLLMs still cannot handle safety issues reliably. Ensuring that model safety and performance develop in a balanced manner under the AI 45° Law roadmap remains a major challenge.

Safety-Capability Balance

The AI 45° theory hypothesizes that the development of AGI should balance model performance and safety, so that the safety and capability of AI advance together along a 45° line. In the short term, deviation is allowed, but in the long term the trajectory should be neither below 45°, as it is in the current state, nor above 45°, which would hinder development and industrial application. We weighted the safety and capability scores of the selected datasets after dynamic evaluation according to dataset size and drew a capability-safety scatter plot from the results. Figure (a) presents the weighted ranking: Claude-4-Sonnet outperforms all other models on safety while also performing well on capability, and Gemini-2.5-Pro achieves an excellent balance between safety and capability. As Figure (b) shows, most MLLMs are less robust in safety, so they lose more safety performance when the dynamic strategies are applied. This highlights the need to strengthen model safety in future development.
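The size weighting described above can be read as a weighted mean of per-dataset scores. The sketch below shows that reading only; the function name `weighted_score`, the dataset names, the sizes, and the scores are hypothetical placeholders, not results or code from SDEval.

```python
def weighted_score(scores: dict[str, float], sizes: dict[str, int]) -> float:
    """Average per-dataset scores, weighting each dataset by its sample count."""
    total = sum(sizes.values())
    if total == 0:
        raise ValueError("dataset sizes must sum to a positive number")
    return sum(scores[name] * sizes[name] for name in sizes) / total


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the arithmetic.
    sizes = {"dataset_a": 100, "dataset_b": 300}
    safety = {"dataset_a": 0.5, "dataset_b": 1.0}
    print(weighted_score(safety, sizes))  # (0.5*100 + 1.0*300) / 400
```

Applying the same function to the safety scores and to the capability scores gives one point per model for a capability-safety scatter plot.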
