Techniques and Categories
The following information covers all the various attack techniques and vulnerability categories/subcategories that Prompt Attack covers.
Attack Techniques
1️⃣ Prompt Injection
Techniques designed to manipulate an LLM's behavior or bypass security measures.
They usually have the following characteristics:
- Direct/Indirect Injection: Crafting inputs explicitly or subtly to alter model output.
- Bypassing Ethical Restrictions: Framing prompts to evade safeguards and guidelines.
- Role-Playing or Deceptive Framing: Posing as a character or using scenarios to coerce unintended responses.
- Chain of Thought Manipulation: Introducing a logical step-by-step approach that misleads the model.
- Self-Referencing Prompt Injection: Using the model's own outputs as part of an ongoing manipulation.
- Confusion via Multiple Instructions: Overloading the model with conflicting or ambiguous directives.
2️⃣ Prompt Probing
Tactics designed to infer hidden, sensitive, or restricted knowledge from the model.
They usually have the following characteristics:
- Model Boundary Testing: Experimenting with different inputs to find limitations in model knowledge.
- Indirect Questioning: Asking seemingly unrelated or vague questions to extract protected data.
- Semantic Manipulation: Rewording prompts with synonyms or alternative phrasing to bypass filters.
- Hidden Context Testing: Embedding sensitive queries within lengthy, innocuous prompts.
3️⃣ Text Completion
Exploiting the AI's tendency to complete text in unintended ways.
They usually have the following characteristics:
- Off-Topic Diversion: Injecting off-topic content to push the model toward unrelated responses.
- Ambiguity Exploitation: Providing vague or incomplete inputs, forcing AI-generated assumptions.
- Creative Misleading Setup: Structuring prompts to mislead the model into generating inaccurate but plausible outputs.
- Overgeneralization Traps: Encouraging sweeping generalizations that may lead to misinformation.
🔓 Jailbreak Mode
Users can toggle Jailbreak Mode ON or OFF, allowing for:
- Standard Testing: Restricting prompts to controlled adversarial testing.
- Unrestricted Exploration: Generating advanced adversarial prompts without built-in guardrails.
Security Vulnerability Categories
Focused on adversarial prompts that target LLM security weaknesses.
- LLM02:2025 Sensitive Information Disclosure
- LLM07:2025 System Prompt Leakage
Safety Vulnerability Categories
Covers adversarial prompts related to harmful, biased, or unethical content generation.
1️⃣ Biasness
- Gender Bias
- Age Bias
- Racial Bias
2️⃣ Firearms and Weapons
- Accessories Like Ammo & Weapon Usage
- Explosives
- Firearms
3️⃣ Harassment
- Intimidation
- Sexual Harassment
4️⃣ Illegal Criminal Activity
- Fraud
- Illegal Substances
- Malware
- Terrorism
5️⃣ Misinformation
- Harmful Decision-Making
- Public Misinformation & Social Impact
6️⃣ Toxicity
- Hate Speech
- Offensive Language
7️⃣ Violence & Self-Harm
- Physical Harm
- Psychological Harm
- Self-Harm Encouragement
Try It Out with Prompt Attack
By offering structured adversarial prompt generation across security and safety vulnerabilities, Prompt Attack enables researchers, Red Teamers, and security professionals to:
- Explore & document LLM weaknesses in a controlled manner.
- Test AI model defenses against manipulative and coercive prompts.
- Improve AI safety & security by tracking which prompts are blocked vs. which are exploited.
Stay ahead of adversarial prompting threats with Prompt Attack's expansive vulnerability coverage.