- "headers": [
- "Model",
- "Average โฌ๏ธ",
- "PromptsEN",
- "ResponsesEN",
- "PromptsDE",
- "PromptsFR",
- "PromptsIT",
- "PromptsES",
- "#Params (B)",
- "round_params"
- "data": [
- [
- "<a target="_blank" href="https://huggingface.co/ibm-granite/granite-guardian-3.1-8b" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">ibm-granite/granite-guardian-3.1-8b</a>",
- 86.38,
- 90.09,
- 86.22,
- 85.41,
- 85.35,
- 84.84,
- 86.36,
- 8.17,
- 8
- [
- "<a target="_blank" href="https://huggingface.co/ibm-granite/granite-guardian-3.0-8b" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">ibm-granite/granite-guardian-3.0-8b</a>",
- 85.51,
- 90.37,
- 84.25,
- 84.71,
- 84.83,
- 83.07,
- 85.82,
- 8.17,
- 8
- [
- "<a target="_blank" href="https://huggingface.co/ibm-granite/granite-guardian-3.2-5b" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">ibm-granite/granite-guardian-3.2-5b</a>",
- 84.78,
- 87.97,
- 85.53,
- 83.8,
- 84.14,
- 83.09,
- 84.13,
- 5.78,
- 5
- [
- "<a target="_blank" href="https://huggingface.co/ibm-granite/granite-guardian-3.1-2b" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">ibm-granite/granite-guardian-3.1-2b</a>",
- 84.33,
- 87.31,
- 85.51,
- 82.86,
- 84.02,
- 82.27,
- 84.02,
- 2.63,
- 2
- [
- "<a target="_blank" href="https://huggingface.co/nvidia/Aegis-AI-Content-Safety-LlamaGuard-Defensive-1.0" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">nvidia/Aegis-AI-Content-Safety-LlamaGuard-Defensive-1.0</a>",
- 82.2,
- 85,
- 78.72,
- 81.86,
- 82.5,
- 81.95,
- 83.17,
- 6.74,
- 6
- [
- "<a target="_blank" href="https://huggingface.co/ibm-granite/granite-guardian-3.2-3b-a800m" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">ibm-granite/granite-guardian-3.2-3b-a800m</a>",
- 81.85,
- 86.48,
- 85.31,
- 78.48,
- 79.59,
- 79.32,
- 81.92,
- 3.3,
- 3
- [
- "<a target="_blank" href="https://huggingface.co/nvidia/Aegis-AI-Content-Safety-LlamaGuard-Permissive-1.0" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">nvidia/Aegis-AI-Content-Safety-LlamaGuard-Permissive-1.0</a>",
- 80.3,
- 84.13,
- 77.91,
- 79.1,
- 80,
- 79.41,
- 81.23,
- 6.74,
- 6
- [
- "<a target="_blank" href="https://huggingface.co/ibm-granite/granite-guardian-3.0-2b" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">ibm-granite/granite-guardian-3.0-2b</a>",
- 79.17,
- 85.6,
- 79.77,
- 77.35,
- 77.32,
- 76.17,
- 78.8,
- 2.63,
- 2
- [
- "<a target="_blank" href="https://huggingface.co/meta-llama/Llama-Guard-3-8B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">meta-llama/Llama-Guard-3-8B</a>",
- 78.27,
- 82.45,
- 77.44,
- 77.97,
- 77.18,
- 76.67,
- 77.94,
- 8.03,
- 8
- [
- "<a target="_blank" href="https://huggingface.co/meta-llama/Meta-Llama-Guard-2-8B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">meta-llama/Meta-Llama-Guard-2-8B</a>",
- 75.95,
- 82.76,
- 77.62,
- 72.8,
- 73.45,
- 73.19,
- 75.87,
- 8.03,
- 8
- [
- "<a target="_blank" href="https://huggingface.co/OpenSafetyLab/MD-Judge-v0.1" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">OpenSafetyLab/MD-Judge-v0.1</a>",
- 74.32,
- 86.11,
- 85.87,
- 68.28,
- 67.18,
- 66.44,
- 72.06,
- 7.24,
- 7
- [
- "<a target="_blank" href="https://huggingface.co/meta-llama/LlamaGuard-7b" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">meta-llama/LlamaGuard-7b</a>",
- 72.74,
- 81.56,
- 67.47,
- 71.81,
- 71.43,
- 70.8,
- 73.4,
- 6.74,
- 6
- [
- "<a target="_blank" href="https://huggingface.co/google/shieldgemma-2b" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">google/shieldgemma-2b</a>",
- 71.22,
- 77.23,
- 64.86,
- 71.52,
- 71.63,
- 69.23,
- 72.86,
- 2.61,
- 2
- [
- "<a target="_blank" href="https://huggingface.co/google/shieldgemma-9b" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">google/shieldgemma-9b</a>",
- 70.85,
- 77.41,
- 61.27,
- 72.17,
- 71.85,
- 69.9,
- 72.52,
- 9.24,
- 9
- [
- "<a target="_blank" href="https://huggingface.co/meta-llama/Llama-Guard-3-1B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">meta-llama/Llama-Guard-3-1B</a>",
- 70.78,
- 78,
- 77.29,
- 67.64,
- 66.42,
- 65.91,
- 69.45,
- 1.5,
- 1
- [
- "metadata": null
GuardBench Leaderboard
Welcome to the GuardBench's Leaderboard, an independent benchmark designed to evaluate guardrail models.
The leaderboard reports results for the following datasets:
- PromptsEN: 30k+ English prompts compiled from multiple sources
- ResponsesEN: 33k+ English single-turn conversations from multiple sources where the AI-generated response may be safe or unsafe
- PromptsDE: 30k+ German prompts
- PromptsFR: 30k+ French prompts
- PromptsIT: 30k+ Italian prompts
- PromptsES: 30k+ Spanish prompts
Evaluation results are shown in terms of F1.
For a fine-grained evaluation, please see our publications referenced below.
Guardrail Models
Guardrail models are Large Language Models fine-tuned for safety classification, employed to detect unsafe content in human-AI interactions.
By complementing other safety measures, such as safety alignment, they aim to prevent generative AI systems from providing harmful information to users.
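As a concrete illustration, a guardrail model can be wrapped as a safe/unsafe filter around a generative system, screening both the user's prompt and the model's response. The sketch below is a hypothetical minimal pattern: `generate` and `is_unsafe` are placeholder callables, not any specific model's real API or prompt format.

```python
def guardrail_filter(user_prompt, generate, is_unsafe):
    """Screen a human-AI interaction with a guardrail classifier.

    `generate(prompt) -> str` stands in for a generative model and
    `is_unsafe(text) -> bool` for a guardrail model; both are
    illustrative placeholders, not real APIs.
    """
    # Check the incoming prompt before generation.
    if is_unsafe(user_prompt):
        return "Sorry, I can't help with that."
    response = generate(user_prompt)
    # Check the generated response before returning it.
    if is_unsafe(response):
        return "Sorry, I can't help with that."
    return response

# Toy usage with stub callables standing in for real models.
reply = guardrail_filter(
    "What's the capital of France?",
    generate=lambda p: "Paris.",
    is_unsafe=lambda text: "bomb" in text,
)
```

The same pattern extends to multi-turn conversations by running the check on each new message.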
GuardBench
GuardBench is a recently proposed large-scale benchmark for evaluating the effectiveness of guardrail models, comprising 40 safety evaluation datasets.
You can find more information in the paper we presented at EMNLP 2024.
Python
GuardBench comes with a Python library that provides evaluation functionality on top of the benchmark.
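The library's actual interface is documented in its repository; purely as an illustration of the kind of evaluation such a library automates (run a classifier over each dataset, compare against ground truth, report F1), a minimal harness might look like the sketch below. All names here are assumptions, not the real guardbench API.

```python
def f1(y_true, y_pred):
    """Binary F1 from boolean labels (True = unsafe)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def evaluate(moderate, datasets):
    """Score a guardrail classifier on several datasets.

    `moderate(text) -> bool` is a placeholder guardrail model;
    `datasets` maps a dataset name to labeled samples. Neither
    mirrors the real guardbench interface.
    """
    return {
        name: f1([s["unsafe"] for s in samples],
                 [moderate(s["text"]) for s in samples])
        for name, samples in datasets.items()
    }

# Toy run with a keyword "classifier" and a two-sample dataset.
scores = evaluate(
    moderate=lambda text: "attack" in text,
    datasets={"PromptsEN": [
        {"text": "how to attack a server", "unsafe": True},
        {"text": "how to bake bread", "unsafe": False},
    ]},
)
```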
Evaluation Metric
Evaluation results are shown in terms of F1.
We do not employ the Area Under the Precision-Recall Curve (AUPRC) as we found it overemphasizes models' Precision at the expense of Recall, thus hiding significant performance details.
We rely on Scikit-Learn to compute metric scores.
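Both metrics are available in Scikit-Learn. The toy example below (labels and scores are made up) shows the distinction: `f1_score` operates on hard safe/unsafe decisions, whereas AUPRC (`average_precision_score`) requires continuous scores.

```python
from sklearn.metrics import average_precision_score, f1_score

# Toy data: 1 = unsafe. Any resemblance to real model output is coincidental.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                     # hard decisions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.95, 0.3]   # unsafe probabilities

print(f1_score(y_true, y_pred))                  # F1 over hard decisions -> 0.75
print(average_precision_score(y_true, y_score))  # AUPRC over scores
```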
Fine-Grained Results
Coming soon.
Reproducibility
Coming soon.
Citation
Copy the following snippet to cite these results.
@inproceedings{guardbench,
title = "{G}uard{B}ench: A Large-Scale Benchmark for Guardrail Models",
author = "Bassani, Elias and
Sanchez, Ignacio",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.1022",
doi = "10.18653/v1/2024.emnlp-main.1022",
pages = "18393--18409",
}