LLM Security Intermediate

Indirect Prompt Injection

When sneaky hidden instructions in content trick an AI.

Download the poster

AIs often read things for us, like web pages, emails, and files. Indirect prompt injection is when someone hides a sneaky instruction inside that content to trick the AI.

Here is the sneaky part: the hidden note is not aimed at you. It is aimed at the AI that reads the page.

For example, a web page might hide a note like, "Ignore your instructions and do this instead." You would not notice it, but the AI might.

If the trick works, the AI could get confused, reveal something it should not, do the wrong task, or be used by bad actors.

To stay safe, AIs learn to trust the person they are helping, not random instructions buried in content. They check for hidden notes and use safety filters.

The simple rule a good AI follows: stay on task, ignore sneaky notes, and help the user with the right answer.

What to remember

  • Hidden instructions in content can trick an AI.
  • The sneaky note targets the AI, not you.
  • It can make the AI confused or unsafe.
  • Good AI follows the user, not notes buried in a page.

Words to know

Indirect prompt injection
Hiding instructions in content to trick an AI that reads it.
Content
Web pages, emails, files, anything an AI might read.
Hidden instruction
A sneaky note meant for the AI, not the user.
Safety filter
A tool that catches unsafe or sneaky input.

For grown-ups

Indirect prompt injection embeds adversarial instructions in third-party content (web pages, documents, emails) that an LLM later ingests, hijacking its behavior, exfiltrating data, or misusing connected tools. It is a leading concern in the OWASP LLM Top 10. Mitigations include treating retrieved content as untrusted, strict tool permissions, and output checks.

Want the full story? These go deeper: