How do you manage prompt injection and data leakage risks with LLMs?

June 3, 2025

Input Sanitization:

All user inputs are sanitized using strict allowlists and escaped to remove special characters or tokens that could manipulate prompt context.
Prompt Templates:

We use well-structured prompt templates with static instruction blocks and clearly bounded user input areas. This ensures user content cannot override system behavior.
System/User Segregation:

Clear separation of system-level instructions from user-level content using delimiters or structured JSON formats to reduce the injection surface.
Content Filtering:

We scan both inputs and outputs using keyword-based and AI-assisted filters to block or redact sensitive terms, jailbreak attempts, or malicious payloads.

Minimal Context Principle:

Only relevant, non-sensitive data is passed to LLMs. PII and business-sensitive inputs are explicitly excluded.
Access Control:

Access to prompt construction logic, logs, and model responses is restricted and audited, especially in multi-tenant or enterprise setups.
API Key Security:

LLM API keys are stored in encrypted secret managers (e.g. Vault, AWS Secrets Manager) and rotated regularly.
Output Monitoring:

Responses from the LLM are post-processed and filtered before delivery to end users to catch hallucinations or confidential data leaks.
- Rule-based & keyword filters catch profanity, sensitive terms, and format issues.
- AI moderation tools (e.g. OpenAI Moderation) scan for toxicity or unsafe content.
Hosted vs On-Prem Decisioning:

For high-risk data contexts, we prefer on-prem or private-hosted models, avoiding external LLM APIs entirely.