Filtering Adversarial Prompts#

Prompt Attack features an advanced filtering system designed to help users efficiently manage and analyze extensive collections of adversarial prompts. This functionality is crucial for enhancing usability, improving workflow efficiency, and enabling targeted testing of specific vulnerabilities or attack scenarios.

The filtering options allow users to dynamically refine their prompt lists based on multiple attributes, ensuring that testing efforts remain organized and aligned with security objectives. With the ability to pinpoint prompts that match specific criteria, teams can streamline their workflows and maximize the impact of their assessments.

Accessing the Filtering Options#

The filtering options in Prompt Attack are designed for intuitive access and seamless integration into the testing workflow. To begin filtering:

  1. Navigate to the Adversarial Prompt Generator Module: Log into the platform and open the Adversarial Prompt Generator module. This module provides access to the comprehensive prompt library and associated testing tools.

  2. Locate the Filtering Controls: The filtering controls are prominently displayed at the top of the prompt list interface, ensuring they are readily accessible and that users can apply filters quickly without disrupting their workflow.

Filter Criteria#

The filtering system supports a wide range of criteria, enabling users to narrow down their prompt list to match specific needs or scenarios. Filters can be applied individually or in combination for more refined results. Available filter attributes include:

Attack Type#

Select the overarching category of the adversarial prompts:

  • Security: Focus on prompts designed to expose vulnerabilities related to unauthorized access, data manipulation, or exploitation.

  • Safety: Concentrate on prompts that test the model’s ability to reject or handle harmful, unethical, or inappropriate requests.

This filter is particularly useful for separating prompts based on high-level testing objectives.

Attack Technique#

Focus on prompts that utilize specific adversarial techniques, such as:

  • Jailbreak: Designed to bypass ethical or safety restrictions imposed on the LLM.

  • Prompt Injection: Crafted to manipulate the model’s behavior by injecting malicious inputs.

  • Prompt Probing: Targeted at extracting sensitive or system-level information from the model.

Filtering by attack technique allows users to evaluate the model’s resistance to specific types of adversarial inputs.

Vulnerability Category#

Filter prompts based on the latest OWASP Top 10 for LLM Applications (2025). These categories align with industry standards and highlight key areas of risk. Examples include:

  • LLM01:2025 Prompt Injection

    Manipulation of input prompts to alter the LLM’s intended behavior.

  • LLM02:2025 Sensitive Information Disclosure

    Risks associated with leaking confidential or proprietary data.

  • LLM07:2025 System Prompt Leakage

    Exposure of system-level prompts or configurations that can compromise security.

Filtering by vulnerability category helps users focus on prompts that align with organizational priorities and compliance requirements.

Result#

Narrow down prompts based on their testing outcomes:

  • Blocked: Prompts that were successfully resisted by the model, indicating robust defenses.

  • Exploited: Prompts that bypassed safeguards, exposing vulnerabilities that require further investigation.

Using the result filter is essential for performance assessments and prioritizing remediation efforts.

Note

Only ASCII characters are supported when filtering.
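
Taken together, the four attributes above describe how each prompt entry is characterized for filtering. The minimal sketch below models one such entry as a plain Python dictionary; the keys and example values are illustrative assumptions and do not reflect Prompt Attack’s internal schema or export format.

```python
# Hypothetical representation of a single adversarial prompt entry.
# The keys mirror the filter attributes described in this section;
# they are illustrative only, not Prompt Attack's actual schema.
prompt_entry = {
    "text": "<adversarial prompt text>",                      # the prompt itself
    "attack_type": "Security",                                # "Security" or "Safety"
    "attack_technique": "Jailbreak",                          # e.g. "Jailbreak", "Prompt Injection", "Prompt Probing"
    "vulnerability_category": "LLM01:2025 Prompt Injection",  # OWASP Top 10 for LLM Applications (2025)
    "result": "Exploited",                                    # "Blocked" or "Exploited"
}
```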

Using Dynamic Filters#

Dynamic filtering in Prompt Attack allows users to combine multiple criteria, enabling more granular and focused searches. By layering filters, users can pinpoint specific subsets of prompts that match complex conditions. For example:

  • Search for Security prompts that utilize the Jailbreak technique and are marked as Exploited. This combination helps identify high-risk scenarios that require immediate attention.

The dynamic nature of these filters empowers users to adapt their analysis to evolving security goals, making the filtering system an indispensable tool for managing extensive prompt libraries.
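
As a minimal sketch of the combined filter in the example above (Security, Jailbreak, Exploited), the snippet below layers the three criteria over a small list of hypothetical prompt entries. In the product itself, the same combination is applied through the filtering controls rather than code.

```python
# Hypothetical prompt entries; keys mirror the filter attributes above.
prompts = [
    {"attack_type": "Security", "attack_technique": "Jailbreak",
     "vulnerability_category": "LLM01:2025 Prompt Injection", "result": "Exploited"},
    {"attack_type": "Safety", "attack_technique": "Prompt Probing",
     "vulnerability_category": "LLM07:2025 System Prompt Leakage", "result": "Blocked"},
    {"attack_type": "Security", "attack_technique": "Prompt Injection",
     "vulnerability_category": "LLM01:2025 Prompt Injection", "result": "Blocked"},
]

# Layer three criteria: Security prompts that use the Jailbreak
# technique and were marked as Exploited.
high_risk = [
    p for p in prompts
    if p["attack_type"] == "Security"
    and p["attack_technique"] == "Jailbreak"
    and p["result"] == "Exploited"
]

print(f"{len(high_risk)} high-risk prompt(s) require immediate attention")
```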

Conclusion#

Filtering adversarial prompts is a vital feature of Prompt Attack that optimizes the testing process, enabling users to navigate and evaluate their security assessments effectively. By reducing the time spent manually searching for relevant entries and supporting focused analysis, the filtering system enhances productivity and supports Blue Teaming efforts in maintaining robust security postures. Combined with dynamic filtering options and practical applications, this functionality ensures that security teams can efficiently manage extensive prompt libraries and prioritize their efforts where they matter most.

🚀 Create your very own customised adversarial prompt now