detector_benchmark.dataset_loader.dataset_loader_utils

Functions

create_train_from_dataset(→ datasets.DatasetDict)

Create a train split from a dataset, converting a Dataset into a DatasetDict.

filter_duplicates(→ datasets.Dataset)

Filter out duplicate rows from the dataset based on the values in text_field.

create_splits(→ datasets.DatasetDict)

Create train, eval and test splits from a dataset.

Module Contents

detector_benchmark.dataset_loader.dataset_loader_utils.create_train_from_dataset(dataset: datasets.Dataset) → datasets.DatasetDict

Create a train split from a dataset, converting a Dataset into a DatasetDict.

Parameters:

dataset: Dataset

The dataset to create the train split from

Returns:

DatasetDict

The dataset with the train split

detector_benchmark.dataset_loader.dataset_loader_utils.filter_duplicates(dataset: datasets.Dataset, text_field: str) → datasets.Dataset

Filter out duplicate rows from the dataset based on the values in text_field.

Parameters:

dataset: Dataset

The dataset to filter duplicates from

text_field: str

The name of the column whose values are compared to detect duplicates

Returns:

Dataset

The dataset without duplicates

detector_benchmark.dataset_loader.dataset_loader_utils.create_splits(dataset: datasets.Dataset, train_size: float, eval_size: float, test_size: float) → datasets.DatasetDict

Create train, eval and test splits from a dataset.

Parameters:

dataset: Dataset

The dataset to create the splits from

train_size: float

The size of the train split, as a fraction of the dataset

eval_size: float

The size of the eval split, as a fraction of the dataset

test_size: float

The size of the test split, as a fraction of the dataset

Returns:

DatasetDict

The dataset with the train, eval and test splits