LLM Guardrails Fall to a Simple "Many-Shot Jailbreaking" Attack, Anthropic Warns

By simply padding a prompt with enough faked dialogues in which an AI assistant appears to comply with harmful requests, many LLMs can be fooled into producing harmful content of their own.
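For readers curious about the structure of the attack, here is a minimal sketch, assuming nothing beyond what the teaser describes: a long run of fabricated user/assistant exchanges in which the "assistant" appears to comply, followed by the real target question. The helper name and all strings are hypothetical placeholders, not real harmful content or Anthropic's actual test data.

```python
# Sketch of a many-shot prompt's shape. All strings are placeholders;
# the point is the structure, not the content.

# Fabricated exchanges, each pairing a disallowed question with a
# made-up "compliant" answer, as if the model had already agreed.
faux_dialogues = [
    ("[disallowed question 1]", "[fabricated compliant answer 1]"),
    ("[disallowed question 2]", "[fabricated compliant answer 2]"),
    # ...the attack scales with count: long context windows let an
    # attacker pack in hundreds of these "shots".
]

def build_many_shot_prompt(dialogues, target_question):
    """Concatenate the faux exchanges, then append the real target question."""
    parts = []
    for question, answer in dialogues:
        parts.append(f"User: {question}\nAssistant: {answer}")
    parts.append(f"User: {target_question}\nAssistant:")
    return "\n\n".join(parts)

prompt = build_many_shot_prompt(faux_dialogues, "[actual target question]")
```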

This is a companion discussion topic for the original entry at https://www.hackster.io/news/llm-guardrails-fall-to-a-simple-many-shot-jailbreaking-attack-anthropic-warns-f6eb7b37f4cc