List of supported evasion attacks

For the implementation, see detector_benchmark/generation/attack_loader.py and detector_benchmark/conf/generation.

Supported evasion attacks

  • ā€œno_attackā€: Normal generation.

  • ā€œprompt_attackā€: Evasion attack where the attacker provide a specific prompt used to fool the detector.

  • ā€œgen_params_attackā€: Evasion attack where the attacker uses specific generation parameters (temperature for example) to fool the detector.

  • ā€œprompt_paraphrasing_attackā€: Evasion attack where the output of the LLM is paraphrased with another LLM using a paraphrasing prompt (same LLM used for generation only for now).