How to add a watermarking scheme to the benchmark
Credits to https://github.com/THU-BPM/MarkLLM for most of the watermarking code structure and classes.
To add a watermarking scheme, 4 files need to be added or modified:
- add: a {watermarking_scheme}.py file inside its own detector_benchmark/watermark/{watermarking_scheme} folder.
- add: a corresponding __init__.py inside the same folder.
- add: a configuration file under detector_benchmark/conf/watermark to configure the watermarking scheme.
- modify: the WATERMARK_MAPPING_NAMES dictionary variable inside detector_benchmark/watermark/auto_watermark.py.
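As a rough illustration of the last step, a name-to-class registry of this kind is often a dictionary mapping a scheme name to the dotted path of its class, resolved with importlib. The entry format and the helper load_watermark_class below are assumptions, not taken from auto_watermark.py; check the existing entries there for the actual format.

```python
import importlib

# Hypothetical registry: scheme name -> dotted path of the implementing class.
# "collections.OrderedDict" is only a stand-in so that this snippet runs on its own;
# a real entry would point at the newly added class inside the watermark package.
WATERMARK_MAPPING_NAMES = {
    "my_scheme": "collections.OrderedDict",
}


def load_watermark_class(name: str):
    """Resolve a registered scheme name to its class (hypothetical helper)."""
    dotted_path = WATERMARK_MAPPING_NAMES[name]
    # Split "package.module.ClassName" into module path and class name
    module_name, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

With a registry like this, adding a scheme amounts to adding one key/value pair; no other lookup code needs to change.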
See the watermarking schemes already in the repository for examples of what {watermarking_scheme}.py should implement. The core of a watermarking scheme is a class {watermarking_scheme} that inherits from LogitsProcessor and defines at least an __init__ constructor and a __call__ method with the following signature:
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
This method takes as input a context (input_ids) and the logits (scores) returned by the LLM. The watermarking scheme modifies the logits and returns the new logits.
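A minimal sketch of such a class is shown below. It assumes LogitsProcessor is the one from the transformers library, and it uses a simplified green-list bias in the style of KGW purely for illustration. The class name MyWatermark and the parameters vocab_size, gamma, delta and hash_key are hypothetical, not names required by the benchmark.

```python
import torch
from transformers import LogitsProcessor


class MyWatermark(LogitsProcessor):
    """Hypothetical scheme: adds a bias to a pseudo-random 'green' subset of the vocabulary."""

    def __init__(self, vocab_size: int, gamma: float = 0.5,
                 delta: float = 2.0, hash_key: int = 15485863):
        self.vocab_size = vocab_size
        self.gamma = gamma        # fraction of the vocabulary marked green
        self.delta = delta        # bias added to the logits of green tokens
        self.hash_key = hash_key  # secret key mixed into the seed

    def _green_ids(self, prev_token: int) -> torch.LongTensor:
        # Seed a generator from the previous token so that the same
        # green list can be recomputed later from the text alone.
        gen = torch.Generator()
        gen.manual_seed(self.hash_key * prev_token)
        perm = torch.randperm(self.vocab_size, generator=gen)
        return perm[: int(self.gamma * self.vocab_size)]

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        # Work on a copy so the logits passed in by the caller are left untouched
        scores = scores.clone()
        for b in range(input_ids.shape[0]):
            green = self._green_ids(int(input_ids[b, -1])).to(scores.device)
            scores[b, green] += self.delta
        return scores
```

An instance of such a class can typically be wrapped in a transformers LogitsProcessorList and passed to generate through its logits_processor argument; how the benchmark itself wires the processor in is best checked against the existing schemes.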