## Tags
- Part of: [[OpenAI]] [[DeepMind]] [[Anthropic]] [[AI safety]]
- Related:
- Includes:
- Additional:
## Definitions
- Form of evaluation that elicits model vulnerabilities that might lead to undesirable behaviors. The goal of red-teaming [[Large language model|language models]] is to craft a prompt that would trigger the model to generate text that is likely to cause harm
## Main resources
- [Red-Teaming Large Language Models](https://huggingface.co/blog/red-teaming)
- <iframe src="https://huggingface.co/blog/red-teaming" allow="fullscreen" allowfullscreen="" style="height:100%;width:100%; aspect-ratio: 16 / 5; "></iframe>