Attackers craft deceptive prompts designed to override an AI assistant's instructions and make it ignore safety guardrails, revealing hidden information or performing unintended actions. This works because large language models can be tricked into treating user input as commands rather than filtering it through their intended behavior rules.
Prompt injection is a cyberattack technique where malicious instructions are embedded in content that an AI assistant reads, causing the AI to perform unintended actions such as leaking private data or sending unauthorized messages on your behalf.
As AI assistants become integrated into email, calendars, and personal workflows, understanding prompt injection risks helps users and security tools identify when an AI has been hijacked and prevent sensitive information from being silently exfiltrated.
Peri can explain this concept, give practical examples, help you decide whether it applies to your situation, or recommend a journey if appropriate.
Explore related journeys or tell Peri what you're working through.