How do you manage prompt injection and data leakage risks with LLMs?
Prompt Injection Mitigation
-
Input Sanitization:
All user inputs are sanitized using strict allowlists and escaped to remove special characters or tokens that could manipulate prompt context.
-
Prompt Templates:
We use well-structured prompt templates with static instruction blocks and clearly bounded user input areas. This ensures user content cannot override system behavior.
-
System/User Segregation:
Clear separation of system-level instructions from user-level content using delimiters or structured JSON formats to reduce the injection surface.
-
Content Filtering:
We scan both inputs and outputs using keyword-based and AI-assisted filters to block or redact sensitive terms, jailbreak attempts, or malicious payloads.
Data Leakage Protection
-
Minimal Context Principle:
Only relevant, non-sensitive data is passed to LLMs. PII and business-sensitive inputs are explicitly excluded.
-
Access Control:
Access to prompt construction logic, logs, and model responses is restricted and audited, especially in multi-tenant or enterprise setups.
-
API Key Security:
LLM API keys are stored in encrypted secret managers (e.g. Vault, AWS Secrets Manager) and rotated regularly.
-
Output Monitoring:
Responses from the LLM are post-processed and filtered before delivery to end users to catch hallucinations or confidential data leaks.
-
Rule-based & keyword filters catch profanity, sensitive terms, and format issues.
-
AI moderation tools (e.g. OpenAI Moderation) scan for toxicity or unsafe content.
-
-
Hosted vs On-Prem Decisioning:
For high-risk data contexts, we prefer on-prem or private-hosted models, avoiding external LLM APIs entirely.