Vulnerable LLM/AI Labs

🧨 Vulnerable LLM/AI Labs & Simulations

1. Gandalf by Lakera 🧙

What it is: An online prompt injection game.
You play against an AI guarding a password; your job is to trick it.
Goal: Teaches direct + indirect prompt injection techniques.
https://gandalf.lakera.ai

2. Prompt Injection Playground

Online tool to experiment with prompt injection types:
- Jailbreaks
- Prompt leakage
- Context overriding
Based on OpenAI, Claude, and others.
https://injection.nocorp.me

3. LLM-Vuln-Lab

Open-source vulnerable LLM sandbox.
Simulates insecure chatbot deployments with poor prompt sanitization and access to system commands or APIs.
Deploy locally via Docker.
GitHub: https://github.com/WazeHell/llm-vuln-lab

4. LMQL Prompt Sandbox

What it is: A research tool to simulate LLM prompt context manipulation and injections.
Great for building adversarial scenarios and testing defenses.
GitHub: https://github.com/eth-sri/lmql

5. OpenPromptGame

Multi-level prompt injection puzzle designed for adversarial learning.
Similar to CTF-style game with escalating difficulty.
https://openpromptgame.com

🔧 Tools for LLM/AI Pentesting

Tool

Use Case

PromptBench

Evaluate model resistance to jailbreaks

LLM-Guard

Open-source library for sanitizing prompts/responses

Rebuff.ai

Prevents indirect prompt injections (esp. in RAG systems)

SecGPT

AI model trained for red-teaming LLMs

Garak

Adversarial LLM evaluation framework (prompt injection, data leaks)

OpenAI’s evals framework

For building custom red-team tests against GPT APIs

🧪 Vulnerable AI Models & Architectures

🔹 Mini-GPT, Vicuna, Mistral (Locally Hosted)

Run these models locally (via Docker or LM Studio) and test:
- Prompt injection
- Output manipulation
- System prompt leakage
Platforms: https://lmstudio.ai, Hugging Face Spaces

🔹 LangChain & LlamaIndex Demo Apps

Create vulnerable AI apps using LangChain or LlamaIndex:
- RAG apps vulnerable to indirect injection
- Output-to-code pipelines (codegen abuse)
- API-accessing LLM agents (e.g., autoGPT-style)
Learn how attackers can exploit vector stores and tool calling.

🔹 HoneyPrompt Project

Research tool to detect if models are susceptible to secret extraction or payload injection.
Can help test model “memory leaks.”
GitHub: https://github.com/ramonvc/honey-prompt

🌐 Online Platforms & CTFs

🔹 AI Village at DEF CON (Labs & Recordings)

Open-source LLM red teaming scenarios released during DEF CON 2023.
See: https://huggingface.co/llm-attacks

🔹 MITRE ATLAS

AI-specific threat modeling framework (like MITRE ATT&CK but for ML/LLMs).
Includes datasets, tactics, techniques, and even simulation environments.
https://atlas.mitre.org

📚 Research, Guides, and Learning Materials

Resource

Description

OWASP Top 10 for LLMs

Industry-standard list of LLM-specific vulnerabilities (2024)

Stanford’s “Red Teaming LLMs”

Deep dive into real-world jailbreaks and attack mitigations

OpenAI’s GPT Red Teaming Report

Insights into how GPTs are stress-tested

AI Hacking Handbook (in progress)

Upcoming book focused entirely on LLM security

🧭 Learning Path for LLM/AI Pentesting

Phase

Focus

Tools/Resources

🔹 Prompt Injection (direct/indirect)

Gandalf, Garak, PromptBench

🔹 LLM Agent Abuse

LangChain + tool calls

🔹 RAG Exploits

LlamaIndex, vector store poisoning

🔹 Data Exfil & Model Leaks

HoneyPrompt, model inversion concepts

🔹 Defense & Hardening

LLM-Guard, Rebuff.ai, OWASP LLM Top 10

🔹 Threat Modeling

MITRE ATLAS, NIST AI RMF

11. AdvPromptLab

What it is: A structured lab for testing adversarial prompts, jailbreaks, and instruction tuning attacks.
Includes both direct/indirect injection simulations and eval scoring.
https://github.com/baichuan-inc/AdvPromptLab

12. PromptInjection.ai (Red Team Simulator)

Run multi-turn prompt injection attacks on different open models.
Simulates real-world scenarios (e.g., HR bots, code review tools).
Visualize model behavior under adversarial stress.
https://promptinjection.ai

13. LLM Attacks by Hugging Face

Hugging Face has published a growing list of red-teaming attack patterns:
- System prompt extraction
- Role hijacking
- Escaping guardrails
Great for testing your own models or evaluating hosted ones.
https://huggingface.co/llm-attacks

14. RAG Vulnerability Playground

Simulate insecure Retrieval-Augmented Generation (RAG) systems:
- Inject into vector stores
- Poison document retrieval
- Hijack embedding relevance
Build using LangChain, Pinecone/FAISS, and local LLMs.

🧪 LLM/AI Attack Datasets for Research and Practice

Dataset

Use Case

AdvBench

Benchmark of LLM jailbreak and extraction prompts

HarmBench

Evaluate how models respond to unethical or malicious tasks

RealToxicityPrompts

Prompts used to test AI models' ability to avoid harmful completions

Prompt Injection Corpus (PINC)

Community-contributed collection of attack prompts

LLM Jailbreak Dataset (Red Teaming Alliance)

Real-world jailbreaks across GPT, Claude, and Mistral models

➡️ Many are accessible via PapersWithCode or Hugging Face Datasets.

⚙️ LLM-Specific Evaluation & Attack Tools (More Advanced)

Tool

Description

Garak

Automated prompt injection and jailbreak fuzzing framework

TruLens

LLM performance + trust evaluation; includes red team scoring

Ruler (by Robust Intelligence)

Detects hallucinations, prompt leakage, policy violations

Evals (by OpenAI)

Create automated evaluation suites for GPT and other models

DAGGER

Defense evaluation suite for LLM alignment and safety testing

FoolMeTwice

Red-team test cases for LLMs with progressive complexity

🧠 Red Team Training and Research Projects

🔹 DEF CON AI Red Teaming Datasets

Released post-DEF CON 31 from OpenAI, Anthropic, and Scale AI.
10,000+ jailbreak attempts with metadata.
Ideal for testing defensive tools.
https://github.com/scale-ai/ai-red-teaming-dataset

🔹 Stanford CRFM Jailbreak Taxonomy

Deeply researched prompt injection + jailbreaking taxonomy.
Categorizes bypasses like:
- Semantic distractions
- Sentence obfuscation
- Token-level exploits
Paper: https://arxiv.org/abs/2307.15043

🧱 Building Custom Vulnerable AI Apps

Here’s what you can build and attack:

App Type

What to Include

Chatbot

Improperly protected system prompts; test role hijacking

Code generation assistant

Output-to-action flow (e.g., generating shell commands)

AI agent (AutoGPT-style)

API calling, browsing, tool use = abuse surface

RAG-based QA bot

Vector store poisoning, semantic injection

Multi-agent simulation

Chain of models; break via role escalation or context leaking

➡️ Want a prebuilt lab? I can create a Docker Compose setup with:

Local LLM (Mistral or Vicuna)
LangChain app with poorly sanitized input
Document ingestion (PDF/Markdown poisoning)
Optional tool-use access (e.g., shell or web)

🔐 Defensive Techniques & Countermeasure Testing

Technique

Defense Tool

Prompt sanitization

LLM-Guard, Rebuff.ai

Role enforcement

Function-calling + schema validation

Output filtering

Claude safety rules, GPT moderation API

Vector store filtering

LangChain filters, text embeddings classifiers

Rate limiting

API gateway (e.g., Kong, FastAPI throttle)

Content fingerprinting

Detect tampered or injected content

📚 Next-Level Resources to Follow

Resource

Why It’s Useful

OWASP Top 10 for LLMs (2024)

Comprehensive threat modeling

Robust Intelligence Blog

Deep dives into real AI failures

Anthropic & OpenAI Red Team Reports

Real-world testing tactics and findings

AI Hacking Handbook (coming soon)

First hands-on book on AI security

AI Incident Database (AIAAIC)

Logs actual AI security and safety failures

PreviousLLM penetration Testing NextPractice and improve skills

Last updated 3 months ago

Was this helpful?