🧨 Vulnerable LLM/AI Labs & Simulations
1. Gandalf by Lakera 🧙
What it is: An online prompt injection game.
You play against an AI guarding a password; your job is to trick it.
Goal: Teaches direct + indirect prompt injection techniques.
2. Prompt Injection Playground
Online tool to experiment with prompt injection types:
Based on OpenAI, Claude, and others.
3. LLM-Vuln-Lab
Open-source vulnerable LLM sandbox.
Simulates insecure chatbot deployments with poor prompt sanitization and access to system commands or APIs.
Deploy locally via Docker.
4. LMQL Prompt Sandbox
What it is: A research tool to simulate LLM prompt context manipulation and injections.
Great for building adversarial scenarios and testing defenses.
5. OpenPromptGame
Multi-level prompt injection puzzle designed for adversarial learning.
Similar to CTF-style game with escalating difficulty.
Evaluate model resistance to jailbreaks
Open-source library for sanitizing prompts/responses
Prevents indirect prompt injections (esp. in RAG systems)
AI model trained for red-teaming LLMs
Adversarial LLM evaluation framework (prompt injection, data leaks)
For building custom red-team tests against GPT APIs
🧪 Vulnerable AI Models & Architectures
🔹 Mini-GPT, Vicuna, Mistral (Locally Hosted)
Run these models locally (via Docker or LM Studio) and test:
🔹 LangChain & LlamaIndex Demo Apps
Create vulnerable AI apps using LangChain or LlamaIndex:
RAG apps vulnerable to indirect injection
Output-to-code pipelines (codegen abuse)
API-accessing LLM agents (e.g., autoGPT-style)
Learn how attackers can exploit vector stores and tool calling.
🔹 HoneyPrompt Project
Research tool to detect if models are susceptible to secret extraction or payload injection.
Can help test model “memory leaks.”
🔹 AI Village at DEF CON (Labs & Recordings)
Open-source LLM red teaming scenarios released during DEF CON 2023.
AI-specific threat modeling framework (like MITRE ATT&CK but for ML/LLMs).
Includes datasets, tactics, techniques, and even simulation environments.
📚 Research, Guides, and Learning Materials
Industry-standard list of LLM-specific vulnerabilities (2024)
Stanford’s “Red Teaming LLMs”
Deep dive into real-world jailbreaks and attack mitigations
OpenAI’s GPT Red Teaming Report
Insights into how GPTs are stress-tested
AI Hacking Handbook (in progress)
Upcoming book focused entirely on LLM security
🧭 Learning Path for LLM/AI Pentesting
Phase
Focus
Tools/Resources
🔹 Prompt Injection (direct/indirect)
Gandalf, Garak, PromptBench
LlamaIndex, vector store poisoning
🔹 Data Exfil & Model Leaks
HoneyPrompt, model inversion concepts
LLM-Guard, Rebuff.ai, OWASP LLM Top 10
11. AdvPromptLab
What it is: A structured lab for testing adversarial prompts, jailbreaks, and instruction tuning attacks.
Includes both direct/indirect injection simulations and eval scoring.
12. PromptInjection.ai (Red Team Simulator)
Run multi-turn prompt injection attacks on different open models.
Simulates real-world scenarios (e.g., HR bots, code review tools).
Visualize model behavior under adversarial stress.
13. LLM Attacks by Hugging Face
Hugging Face has published a growing list of red-teaming attack patterns:
Great for testing your own models or evaluating hosted ones.
14. RAG Vulnerability Playground
Simulate insecure Retrieval-Augmented Generation (RAG) systems:
Inject into vector stores
Poison document retrieval
Hijack embedding relevance
Build using LangChain, Pinecone/FAISS, and local LLMs.
🧪 LLM/AI Attack Datasets for Research and Practice
Benchmark of LLM jailbreak and extraction prompts
Evaluate how models respond to unethical or malicious tasks
Prompts used to test AI models' ability to avoid harmful completions
Prompt Injection Corpus (PINC)
Community-contributed collection of attack prompts
LLM Jailbreak Dataset (Red Teaming Alliance)
Real-world jailbreaks across GPT, Claude, and Mistral models
➡️ Many are accessible via PapersWithCode or Hugging Face Datasets.
Automated prompt injection and jailbreak fuzzing framework
LLM performance + trust evaluation; includes red team scoring
Ruler (by Robust Intelligence)
Detects hallucinations, prompt leakage, policy violations
Create automated evaluation suites for GPT and other models
Defense evaluation suite for LLM alignment and safety testing
Red-team test cases for LLMs with progressive complexity
🧠 Red Team Training and Research Projects
🔹 DEF CON AI Red Teaming Datasets
Released post-DEF CON 31 from OpenAI, Anthropic, and Scale AI.
10,000+ jailbreak attempts with metadata.
Ideal for testing defensive tools.
🔹 Stanford CRFM Jailbreak Taxonomy
Deeply researched prompt injection + jailbreaking taxonomy.
Categorizes bypasses like:
🧱 Building Custom Vulnerable AI Apps
Here’s what you can build and attack:
Improperly protected system prompts; test role hijacking
Code generation assistant
Output-to-action flow (e.g., generating shell commands)
API calling, browsing, tool use = abuse surface
Vector store poisoning, semantic injection
Chain of models; break via role escalation or context leaking
➡️ Want a prebuilt lab? I can create a Docker Compose setup with:
Local LLM (Mistral or Vicuna)
LangChain app with poorly sanitized input
Document ingestion (PDF/Markdown poisoning)
Optional tool-use access (e.g., shell or web)
🔐 Defensive Techniques & Countermeasure Testing
Function-calling + schema validation
Claude safety rules, GPT moderation API
LangChain filters, text embeddings classifiers
API gateway (e.g., Kong, FastAPI throttle)
Detect tampered or injected content
📚 Next-Level Resources to Follow
OWASP Top 10 for LLMs (2024)
Comprehensive threat modeling
Deep dives into real AI failures
Anthropic & OpenAI Red Team Reports
Real-world testing tactics and findings
AI Hacking Handbook (coming soon)
First hands-on book on AI security
AI Incident Database (AIAAIC)
Logs actual AI security and safety failures
Last updated