Techniques and Categories

The following information covers all the various attack techniques and vulnerability categories/subcategories that Prompt Attack covers.

Attack Techniques

1️⃣ Prompt Injection

Techniques designed to manipulate an LLM's behavior or bypass security measures.

They usually have the following characteristics:

  • Direct/Indirect Injection: Crafting inputs explicitly or subtly to alter model output.
  • Bypassing Ethical Restrictions: Framing prompts to evade safeguards and guidelines.
  • Role-Playing or Deceptive Framing: Posing as a character or using scenarios to coerce unintended responses.
  • Chain of Thought Manipulation: Introducing a logical step-by-step approach that misleads the model.
  • Self-Referencing Prompt Injection: Using the model's own outputs as part of an ongoing manipulation.
  • Confusion via Multiple Instructions: Overloading the model with conflicting or ambiguous directives.

2️⃣ Prompt Probing

Tactics designed to infer hidden, sensitive, or restricted knowledge from the model.

They usually have the following characteristics:

  • Model Boundary Testing: Experimenting with different inputs to find limitations in model knowledge.
  • Indirect Questioning: Asking seemingly unrelated or vague questions to extract protected data.
  • Semantic Manipulation: Rewording prompts with synonyms or alternative phrasing to bypass filters.
  • Hidden Context Testing: Embedding sensitive queries within lengthy, innocuous prompts.

3️⃣ Text Completion

Exploiting the AI's tendency to complete text in unintended ways.

They usually have the following characteristics:

  • Off-Topic Diversion: Injecting off-topic content to push the model toward unrelated responses.
  • Ambiguity Exploitation: Providing vague or incomplete inputs, forcing AI-generated assumptions.
  • Creative Misleading Setup: Structuring prompts to mislead the model into generating inaccurate but plausible outputs.
  • Overgeneralization Traps: Encouraging sweeping generalizations that may lead to misinformation.

🔓 Jailbreak Mode

Users can toggle Jailbreak Mode ON or OFF, allowing for:

  • Standard Testing: Restricting prompts to controlled adversarial testing.
  • Unrestricted Exploration: Generating advanced adversarial prompts without built-in guardrails.

Security Vulnerability Categories

Focused on adversarial prompts that target LLM security weaknesses.

  • LLM02:2025 Sensitive Information Disclosure
  • LLM07:2025 System Prompt Leakage

Safety Vulnerability Categories

Covers adversarial prompts related to harmful, biased, or unethical content generation.

1️⃣ Biasness

  • Gender Bias
  • Age Bias
  • Racial Bias

2️⃣ Firearms and Weapons

  • Accessories Like Ammo & Weapon Usage
  • Explosives
  • Firearms

3️⃣ Harassment

  • Intimidation
  • Sexual Harassment

4️⃣ Illegal Criminal Activity

  • Fraud
  • Illegal Substances
  • Malware
  • Terrorism

5️⃣ Misinformation

  • Harmful Decision-Making
  • Public Misinformation & Social Impact

6️⃣ Toxicity

  • Hate Speech
  • Offensive Language

7️⃣ Violence & Self-Harm

  • Physical Harm
  • Psychological Harm
  • Self-Harm Encouragement

Try It Out with Prompt Attack

By offering structured adversarial prompt generation across security and safety vulnerabilities, Prompt Attack enables researchers, Red Teamers, and security professionals to:

  • Explore & document LLM weaknesses in a controlled manner.
  • Test AI model defenses against manipulative and coercive prompts.
  • Improve AI safety & security by tracking which prompts are blocked vs. which are exploited.

Stay ahead of adversarial prompting threats with Prompt Attack's expansive vulnerability coverage.