Image-embedded prompts bypass text-only safety classifiers in Gemini Pro Vision.
We discovered that Gemini Pro Vision routes text embedded in images through a different safety classification pipeline than direct text input. By rendering adversarial prompts as text inside images, we achieved a 67% jailbreak success rate on prompts that are blocked 100% of the time when submitted as plain text. Google has acknowledged the issue and deployed a fix.
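The encoding step is straightforward: the prompt text is rasterized into an image so the model receives it through the vision pipeline rather than the text pipeline. A minimal sketch using Pillow is shown below; the function name, layout constants, and use of the default bitmap font are illustrative choices, not details from the original disclosure, and the actual attack harness (model calls, prompt set, success scoring) is omitted.

```python
from PIL import Image, ImageDraw


def embed_prompt_as_image(prompt: str, width: int = 800) -> Image.Image:
    """Render a text prompt as black-on-white pixels.

    Hypothetical helper: wraps the prompt at a fixed column width and
    draws it with Pillow's default bitmap font, producing an image that
    a vision-language model would OCR internally.
    """
    # Naive fixed-width wrapping: one visual line per 60 characters.
    lines = [prompt[i:i + 60] for i in range(0, len(prompt), 60)] or [""]
    img = Image.new("RGB", (width, 20 * len(lines) + 20), "white")
    draw = ImageDraw.Draw(img)
    for n, line in enumerate(lines):
        draw.text((10, 10 + 20 * n), line, fill="black")
    return img


# Placeholder text stands in for an adversarial prompt.
img = embed_prompt_as_image("Example instruction text rendered into pixels.")
img.save("prompt.png")
```

The resulting image would then be passed to the model as an image part alongside a benign text prompt, so the text-only classifier never sees the embedded instructions.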