Attack Techniques & Vulnerability Categories#
The following information are all the various attack techniques and vulnerability categories/subscategories that Prompt Attack covers.
🚀 Attack Techniques#
1️⃣ Prompt Injection#
Techniques designed to manipulate an LLM’s behavior or bypass security measures.
They usually have the following characteristics:
Direct/Indirect Injection: Crafting inputs explicitly or subtly to alter model output.
Bypassing Ethical Restrictions: Framing prompts to evade safeguards and guidelines.
Role-Playing or Deceptive Framing: Posing as a character or using scenarios to coerce unintended responses.
Chain of Thought Manipulation: Introducing a logical step-by-step approach that misleads the model.
Self-Referencing Prompt Injection: Using the model’s own outputs as part of an ongoing manipulation.
Confusion via Multiple Instructions: Overloading the model with conflicting or ambiguous directives.
2️⃣ Prompt Probing#
Tactics designed to infer hidden, sensitive, or restricted knowledge from the model.
They usually have the following characteristics:
Model Boundary Testing: Experimenting with different inputs to find limitations in model knowledge.
Indirect Questioning: Asking seemingly unrelated or vague questions to extract protected data.
Semantic Manipulation: Rewording prompts with synonyms or alternative phrasing to bypass filters.
Hidden Context Testing: Embedding sensitive queries within lengthy, innocuous prompts.
3️⃣ Text Completion#
Exploiting the AI’s tendency to complete text in unintended ways.
They usually have the following characteristics
Off-Topic Diversion: Injecting off-topic content to push the model toward unrelated responses.
Ambiguity Exploitation: Providing vague or incomplete inputs, forcing AI-generated assumptions.
Creative Misleading Setup: Structuring prompts to mislead the model into generating inaccurate but plausible outputs.
Overgeneralization Traps: Encouraging sweeping generalizations that may lead to misinformation.
🔀 Jailbreak Mode#
Users can toggle Jailbreak Mode ON or OFF, allowing for:
Standard Testing: Restricting prompts to controlled adversarial testing.
Unrestricted Exploration: Generating advanced adversarial prompts without built-in guardrails.
🔐 Security Vulnerability Categories#
Focused on adversarial prompts that target LLM security weaknesses.
LLM02:2025 Sensitive Information Disclosure: Extracting confidential or protected data.
LLM07:2025 System Prompt Leakage: Exposing hidden system instructions or guardrails.
🛑 Safety Vulnerability Categories#
Covers adversarial prompts related to harmful, biased, or unethical content generation.
1️⃣ Biasness#
Gender Bias
Age Bias
Racial Bias
2️⃣ Firearms and Weapons#
Accessories Like Ammo & Weapon Usage
Explosives
Firearms
3️⃣ Harassment#
Intimidation
Sexual Harassment
4️⃣ Illegal Criminal Activity#
Fraud
Illegal Substances
Malware
Terrorism
5️⃣ Misinformation#
Harmful Decision-Making
Public Misinformation & Social Impact
6️⃣ Toxicity#
Hate Speech
Offensive Language
7️⃣ Violence & Self-Harm#
Physical Harm
Psychological Harm
Self-Harm Encouragement
⚡ Why This Matters#
By offering structured adversarial prompt generation across security and safety vulnerabilities, Prompt Attack enables researchers, Red Teamers, and security professionals to:
Explore & document LLM weaknesses in a controlled manner.
Test AI model defenses against manipulative and coercive prompts.
Improve AI safety & security by tracking which prompts are blocked vs. which are exploited.
💡 Stay ahead of adversarial prompting threats with Prompt Attack’s expansive vulnerability coverage.