## Tags - Part of: [[OpenAI]] [[DeepMind]] [[Anthropic]] [[AI safety]] - Related: - Includes: - Additional: ## Definitions - Form of evaluation that elicits model vulnerabilities that might lead to undesirable behaviors. The goal of red-teaming [[Large language model|language models]] is to craft a prompt that would trigger the model to generate text that is likely to cause harm ## Main resources - [Red-Teaming Large Language Models](https://huggingface.co/blog/red-teaming) - <iframe src="https://huggingface.co/blog/red-teaming" allow="fullscreen" allowfullscreen="" style="height:100%;width:100%; aspect-ratio: 16 / 5; "></iframe>