Techniques and Categories

The following information covers all the various attack techniques and vulnerability categories/subcategories that Prompt Attack covers.

Attack Techniques

1️⃣ Prompt Injection

Techniques designed to manipulate an LLM's behavior or bypass security measures.

They usually have the following characteristics:

  • Direct/Indirect Injection: Crafting inputs explicitly or subtly to alter model output.
  • Bypassing Ethical Restrictions: Framing prompts to evade safeguards and guidelines.
  • Role-Playing or Deceptive Framing: Posing as a character or using scenarios to coerce unintended responses.
  • Chain of Thought Manipulation: Introducing a logical step-by-step approach that misleads the model.
  • Self-Referencing Prompt Injection: Using the model's own outputs as part of an ongoing manipulation.
  • Confusion via Multiple Instructions: Overloading the model with conflicting or ambiguous directives.

2️⃣ Prompt Probing

Tactics designed to infer hidden, sensitive, or restricted knowledge from the model.

They usually have the following characteristics:

  • Model Boundary Testing: Experimenting with different inputs to find limitations in model knowledge.
  • Indirect Questioning: Asking seemingly unrelated or vague questions to extract protected data.
  • Semantic Manipulation: Rewording prompts with synonyms or alternative phrasing to bypass filters.
  • Hidden Context Testing: Embedding sensitive queries within lengthy, innocuous prompts.

3️⃣ Text Completion

Exploiting the AI's tendency to complete text in unintended ways.

They usually have the following characteristics:

  • Off-Topic Diversion: Injecting off-topic content to push the model toward unrelated responses.
  • Ambiguity Exploitation: Providing vague or incomplete inputs, forcing AI-generated assumptions.
  • Creative Misleading Setup: Structuring prompts to mislead the model into generating inaccurate but plausible outputs.
  • Overgeneralization Traps: Encouraging sweeping generalizations that may lead to misinformation.

🔓 Jailbreak Mode

Users can toggle Jailbreak Mode ON or OFF, allowing for:

  • Standard Testing: Restricting prompts to controlled adversarial testing.
  • Unrestricted Exploration: Generating advanced adversarial prompts without built-in guardrails.

Security Vulnerability Categories

Focused on adversarial prompts that target LLM security weaknesses.

  • LLM02:2025 Sensitive Information Disclosure
  • LLM07:2025 System Prompt Leakage

Safety Vulnerability Categories

Covers adversarial prompts related to harmful, biased, or unethical content generation.

1️⃣ Biasness

  • Gender Bias
  • Age Bias
  • Racial Bias

2️⃣ Firearms and Weapons

  • Accessories Like Ammo & Weapon Usage
  • Explosives
  • Firearms

3️⃣ Harassment

  • Intimidation
  • Sexual Harassment

4️⃣ Illegal Criminal Activity

  • Fraud
  • Illegal Substances
  • Malware
  • Terrorism

5️⃣ Misinformation

  • Harmful Decision-Making
  • Public Misinformation & Social Impact

6️⃣ Toxicity

  • Hate Speech
  • Offensive Language

7️⃣ Violence & Self-Harm

  • Physical Harm
  • Psychological Harm
  • Self-Harm Encouragement

Try It Out with Prompt Attack

By offering structured adversarial prompt generation across security and safety vulnerabilities, Prompt Attack enables AI researchers, AI red teamers, and security professionals to:

  • Explore & document LLM weaknesses in a controlled manner.
  • Test AI model defenses against manipulative and coercive prompts.
  • Improve AI safety & security by tracking which prompts are blocked vs. which are exploited.

Stay ahead of adversarial prompting threats with Prompt Attack's expansive vulnerability coverage.