Prompt injection is a trust-boundary problem

A model follows language, which makes it useful and creates the opening. If an AI system reads a webpage, email, support ticket, PDF, spreadsheet, or code comment, an attacker can hide instructions inside that content and try to make the model treat them as commands.

A malicious paragraph might tell the assistant to ignore the user, reveal private notes, call a tool, change a destination, or include a link that benefits the attacker. The model sees text, roles, patterns, and instructions, while the security boundary lives outside that text.

The security rule is simple to say and hard to enforce: content the AI reads needs a lower authority than instructions from the user or system. A customer email can be summarized. It should never rewrite the system's rules.

Boundary map
Read
Summarize or extract from outside content without treating it as authority.
Draft
Prepare suggestions, replies, diffs, or plans that a person can inspect.
Act
Require approval before public, expensive, irreversible, or production actions.

Untrusted content now arrives in normal formats

Prompt injection can sit inside ordinary material: a public webpage, a pasted email thread, a calendar invite, an issue comment, a pull request description, a resume, or a shared document. The attack works best when the content looks boring.

Teams using AI for research, customer support, hiring, legal review, or software development should label outside content as data with limited authority. When the assistant reads third-party material, its job is to extract or summarize while ignoring new operating instructions from that material.

This becomes especially awkward in mixed workflows. A recruiter may ask AI to screen resumes. A lawyer may ask for a contract summary. A support team may ask for a ticket draft. In each case, the outside document is both the thing being reviewed and a possible carrier for hostile instructions.

  • Webpages: a hidden instruction can sit in small text, metadata, comments, or content the user never notices.
  • Documents: a PDF or resume can include instructions aimed at the AI reviewer rather than the human reader.
  • Code repositories: comments, tests, issue threads, and markdown files can ask an AI coding assistant to do unsafe things.
  • Support tickets: a customer message can try to influence refunds, account changes, escalation paths, or data access.

Agents need smaller permissions than humans

A human employee can have broad access because they bring context, accountability, and hesitation. An AI agent needs a narrower permission set. It can misunderstand the task, follow bad instructions, or act on incomplete information without the same social brakes.

Connected tools raise the risk: email, calendars, ticketing systems, cloud storage, code execution, databases, CRMs, payment systems, publishing tools, and deployment pipelines. A harmless draft assistant becomes a security concern when it can send, delete, buy, publish, deploy, or edit records.

Tool descriptions also deserve attention. If a model can see that a function named refundCustomer exists, it may try to use it when a prompt frames a refund as urgent. The system should decide when the tool is allowed, what inputs are valid, and which actions require a person to approve.

Strong AI workflows separate reading, drafting, and acting. The last step deserves review when the action is public, expensive, irreversible, customer-facing, or tied to production systems.

Sensitive data can leak through helpful answers

Data disclosure often happens quietly: an assistant summarizes private files into a shared chat, includes internal instructions in a response, exposes a customer detail in a support reply, or mixes context from two systems that should stay separate.

The more context the assistant can see, the more careful the boundaries need to be. A model with access to all company docs, chat history, tickets, and source code can be useful, but broad retrieval means broad blast radius.

Retrieval systems can also surface old or unofficial documents. An assistant may quote a draft policy, a stale pricing page, or an internal note that was never meant to answer customers. Security and accuracy start to overlap here: the system needs permission controls and a clean source set.

  • Limit retrieval scope. Search only the folders, collections, or projects needed for the task.
  • Hide secrets from indexes. Keep API keys, credentials, tokens, private keys, and production config out of searchable knowledge bases.
  • Keep system prompts private. Keep internal instructions, policies, and tool descriptions out of normal answers.
  • Separate tenants and clients. Keep one customer's context away from another customer's request.

Confident answers can weaken review

AI output often has the tone of finished work. That tone can make a weak answer feel checked, a fake citation feel researched, a risky command feel routine, or a code change feel safer than it is.

Overreliance is a security risk because people skip the verification step. They paste a generated command into a terminal. They accept a dependency change. They send a customer response without checking the policy. They trust a summary of a legal clause because it sounds fluent.

The practical fix is review matched to consequence. For security, legal, financial, medical, customer-facing, or production work, AI output should be treated as a draft until a responsible person checks the parts that can cause damage.

A practical AI security control set

A small team can improve AI security without turning every workflow into paperwork. Start where the assistant can read private data or act through tools.

A useful first audit is to list every place AI can read from and every place it can write to. Read access creates disclosure risk. Write access creates action risk. The controls should match those two lists rather than a generic AI policy copied from another company.

  • Use least privilege. Give AI systems narrow read and write access. Broad permissions should be rare and temporary.
  • Require approval for high-impact actions. Sending emails, issuing refunds, deleting files, changing accounts, publishing pages, and deploying code need explicit approval.
  • Treat external content as untrusted. Label webpages, emails, PDFs, tickets, and user submissions as data to inspect rather than commands to follow.
  • Log tool calls. Record what the assistant accessed, what it changed, and who approved it.
  • Test prompt-injection cases. Put malicious instructions into a document or webpage during testing and watch what the system does.
  • Give users a stop button. Long-running agents need visible progress, cancellation, and review before irreversible steps.

The best security posture is boring on purpose: narrow access, visible actions, human approval where consequences are high, and no magical trust in text from the open internet.

For a deeper taxonomy, read the OWASP Top 10 for Large Language Model Applications. For data-handling basics, see the AI privacy risks guide.

Privacy risks of AI Better AI habits