Attack Techniques & Vulnerability Categories#

The following information are all the various attack techniques and vulnerability categories/subscategories that Prompt Attack covers.

🚀 Attack Techniques#

1️⃣ Prompt Injection#

Techniques designed to manipulate an LLM’s behavior or bypass security measures.

They usually have the following characteristics:

  1. Direct/Indirect Injection: Crafting inputs explicitly or subtly to alter model output.

  2. Bypassing Ethical Restrictions: Framing prompts to evade safeguards and guidelines.

  3. Role-Playing or Deceptive Framing: Posing as a character or using scenarios to coerce unintended responses.

  4. Chain of Thought Manipulation: Introducing a logical step-by-step approach that misleads the model.

  5. Self-Referencing Prompt Injection: Using the model’s own outputs as part of an ongoing manipulation.

  6. Confusion via Multiple Instructions: Overloading the model with conflicting or ambiguous directives.

2️⃣ Prompt Probing#

Tactics designed to infer hidden, sensitive, or restricted knowledge from the model.

They usually have the following characteristics:

  1. Model Boundary Testing: Experimenting with different inputs to find limitations in model knowledge.

  2. Indirect Questioning: Asking seemingly unrelated or vague questions to extract protected data.

  3. Semantic Manipulation: Rewording prompts with synonyms or alternative phrasing to bypass filters.

  4. Hidden Context Testing: Embedding sensitive queries within lengthy, innocuous prompts.

3️⃣ Text Completion#

Exploiting the AI’s tendency to complete text in unintended ways.

They usually have the following characteristics

  1. Off-Topic Diversion: Injecting off-topic content to push the model toward unrelated responses.

  2. Ambiguity Exploitation: Providing vague or incomplete inputs, forcing AI-generated assumptions.

  3. Creative Misleading Setup: Structuring prompts to mislead the model into generating inaccurate but plausible outputs.

  4. Overgeneralization Traps: Encouraging sweeping generalizations that may lead to misinformation.

🔀 Jailbreak Mode#

Users can toggle Jailbreak Mode ON or OFF, allowing for:

  1. Standard Testing: Restricting prompts to controlled adversarial testing.

  2. Unrestricted Exploration: Generating advanced adversarial prompts without built-in guardrails.

🔐 Security Vulnerability Categories#

Focused on adversarial prompts that target LLM security weaknesses.

  1. LLM02:2025 Sensitive Information Disclosure: Extracting confidential or protected data.

  2. LLM07:2025 System Prompt Leakage: Exposing hidden system instructions or guardrails.

🛑 Safety Vulnerability Categories#

Covers adversarial prompts related to harmful, biased, or unethical content generation.

1️⃣ Biasness#

  • Gender Bias

  • Age Bias

  • Racial Bias

2️⃣ Firearms and Weapons#

  • Accessories Like Ammo & Weapon Usage

  • Explosives

  • Firearms

3️⃣ Harassment#

  • Intimidation

  • Sexual Harassment

4️⃣ Illegal Criminal Activity#

  • Fraud

  • Illegal Substances

  • Malware

  • Terrorism

5️⃣ Misinformation#

  • Harmful Decision-Making

  • Public Misinformation & Social Impact

6️⃣ Toxicity#

  • Hate Speech

  • Offensive Language

7️⃣ Violence & Self-Harm#

  • Physical Harm

  • Psychological Harm

  • Self-Harm Encouragement

⚡ Why This Matters#

By offering structured adversarial prompt generation across security and safety vulnerabilities, Prompt Attack enables researchers, Red Teamers, and security professionals to:

  • Explore & document LLM weaknesses in a controlled manner.

  • Test AI model defenses against manipulative and coercive prompts.

  • Improve AI safety & security by tracking which prompts are blocked vs. which are exploited.

💡 Stay ahead of adversarial prompting threats with Prompt Attack’s expansive vulnerability coverage.