List of supported evasion attacks
See detector_benchmark/generation/attack_loader.py and detector_benchmark/conf/generation.
Supported evasion attacks
"no_attack": Normal generation, with no evasion attack applied.
"prompt_attack": Evasion attack where the attacker provides a specific prompt designed to fool the detector.
"gen_params_attack": Evasion attack where the attacker uses specific generation parameters (temperature, for example) to fool the detector.
"prompt_paraphrasing_attack": Evasion attack where the output of the LLM is paraphrased by an LLM given a paraphrasing prompt (for now, only the same LLM used for generation is supported as the paraphraser).
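
To make the difference between the four attacks concrete, here is a minimal sketch of how an attack name could select a generation strategy. It is not the code in attack_loader.py: `AttackConfig`, `generate_with_attack`, `echo_model`, and all field names are hypothetical and chosen only for illustration, and the "LLM" is a toy function that reports the prompt and temperature it received.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical stand-in for an LLM call: (prompt, generation params) -> text.
Generator = Callable[[str, Dict[str, float]], str]

# Attack names as listed in this document.
SUPPORTED_ATTACKS = (
    "no_attack",
    "prompt_attack",
    "gen_params_attack",
    "prompt_paraphrasing_attack",
)


@dataclass
class AttackConfig:
    """Illustrative config; the field names are assumptions, not the project's."""
    name: str = "no_attack"
    attack_prompt: str = ""  # used by prompt_attack
    gen_params: Dict[str, float] = field(default_factory=dict)  # gen_params_attack
    paraphrase_prompt: str = "Paraphrase the following text:"  # paraphrasing attack


def generate_with_attack(generate: Generator, user_prompt: str, cfg: AttackConfig) -> str:
    """Produce text for user_prompt, applying the evasion attack named in cfg."""
    if cfg.name not in SUPPORTED_ATTACKS:
        raise ValueError(f"Unsupported attack: {cfg.name!r}")
    if cfg.name == "no_attack":
        return generate(user_prompt, {})
    if cfg.name == "prompt_attack":
        # The attacker's prompt is placed before the task prompt.
        return generate(f"{cfg.attack_prompt}\n{user_prompt}", {})
    if cfg.name == "gen_params_attack":
        # Same prompt, but attacker-chosen sampling parameters.
        return generate(user_prompt, cfg.gen_params)
    # prompt_paraphrasing_attack: generate normally, then paraphrase the
    # draft with the same generator.
    draft = generate(user_prompt, {})
    return generate(f"{cfg.paraphrase_prompt}\n{draft}", {})


def echo_model(prompt: str, params: Dict[str, float]) -> str:
    """Toy generator that reports the temperature and prompt it was given."""
    return f"[T={params.get('temperature', 1.0)}] {prompt}"


if __name__ == "__main__":
    cfg = AttackConfig(name="gen_params_attack", gen_params={"temperature": 1.5})
    print(generate_with_attack(echo_model, "Write a news article.", cfg))
```

Passing the generator in as a callable keeps the sketch independent of any particular model library; in a real setup the same dispatch would wrap whatever generation call the benchmark uses.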