Attack Techniques & Vulnerability Categories#

The following information are all the various attack techniques and vulnerability categories/subscategories that Prompt Attack covers.

🚀 Attack Techniques#

1️⃣ Prompt Injection#

Techniques designed to manipulate an LLM’s behavior or bypass security measures.

They usually have the following characteristics:

Direct/Indirect Injection: Crafting inputs explicitly or subtly to alter model output.
Bypassing Ethical Restrictions: Framing prompts to evade safeguards and guidelines.
Role-Playing or Deceptive Framing: Posing as a character or using scenarios to coerce unintended responses.
Chain of Thought Manipulation: Introducing a logical step-by-step approach that misleads the model.
Self-Referencing Prompt Injection: Using the model’s own outputs as part of an ongoing manipulation.
Confusion via Multiple Instructions: Overloading the model with conflicting or ambiguous directives.

2️⃣ Prompt Probing#

Tactics designed to infer hidden, sensitive, or restricted knowledge from the model.

They usually have the following characteristics:

Model Boundary Testing: Experimenting with different inputs to find limitations in model knowledge.
Indirect Questioning: Asking seemingly unrelated or vague questions to extract protected data.
Semantic Manipulation: Rewording prompts with synonyms or alternative phrasing to bypass filters.
Hidden Context Testing: Embedding sensitive queries within lengthy, innocuous prompts.

3️⃣ Text Completion#

Exploiting the AI’s tendency to complete text in unintended ways.

They usually have the following characteristics

Off-Topic Diversion: Injecting off-topic content to push the model toward unrelated responses.
Ambiguity Exploitation: Providing vague or incomplete inputs, forcing AI-generated assumptions.
Creative Misleading Setup: Structuring prompts to mislead the model into generating inaccurate but plausible outputs.
Overgeneralization Traps: Encouraging sweeping generalizations that may lead to misinformation.

🔀 Jailbreak Mode#

Users can toggle Jailbreak Mode ON or OFF, allowing for:

Standard Testing: Restricting prompts to controlled adversarial testing.
Unrestricted Exploration: Generating advanced adversarial prompts without built-in guardrails.

🔐 Security Vulnerability Categories#

Focused on adversarial prompts that target LLM security weaknesses.

LLM02:2025 Sensitive Information Disclosure: Extracting confidential or protected data.
LLM07:2025 System Prompt Leakage: Exposing hidden system instructions or guardrails.

🛑 Safety Vulnerability Categories#

Covers adversarial prompts related to harmful, biased, or unethical content generation.

1️⃣ Biasness#

Gender Bias
Age Bias
Racial Bias

2️⃣ Firearms and Weapons#

Accessories Like Ammo & Weapon Usage
Explosives
Firearms

3️⃣ Harassment#

Intimidation
Sexual Harassment

4️⃣ Illegal Criminal Activity#

Fraud
Illegal Substances
Malware
Terrorism

5️⃣ Misinformation#

Harmful Decision-Making
Public Misinformation & Social Impact

6️⃣ Toxicity#

Hate Speech
Offensive Language

7️⃣ Violence & Self-Harm#

Physical Harm
Psychological Harm
Self-Harm Encouragement

⚡ Why This Matters#

By offering structured adversarial prompt generation across security and safety vulnerabilities, Prompt Attack enables researchers, Red Teamers, and security professionals to:

Explore & document LLM weaknesses in a controlled manner.
Test AI model defenses against manipulative and coercive prompts.
Improve AI safety & security by tracking which prompts are blocked vs. which are exploited.

💡 Stay ahead of adversarial prompting threats with Prompt Attack’s expansive vulnerability coverage.

Attack Techniques & Vulnerability Categories

Contents

Attack Techniques & Vulnerability Categories#