detector_benchmark.dataset_loader.dataset_loader_utils
Functions
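| create_train_from_dataset(dataset) | Create a train split from a dataset, converting the Dataset into a DatasetDict. |
| filter_duplicates(dataset, text_field) | Filter duplicates in the dataset based on the text_field. |
| create_splits(dataset, train_size, eval_size, test_size) | Create train, eval and test splits from a dataset. |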
Module Contents
- detector_benchmark.dataset_loader.dataset_loader_utils.create_train_from_dataset(dataset: datasets.Dataset) -> datasets.DatasetDict
Create a train split from a dataset, converting the Dataset into a DatasetDict.
Parameters:
- dataset: Dataset
The dataset to create the train split from
Returns:
- DatasetDict
The dataset with the train split
- detector_benchmark.dataset_loader.dataset_loader_utils.filter_duplicates(dataset: datasets.Dataset, text_field: str) -> datasets.Dataset
Filter duplicates in the dataset based on the text_field.
Parameters:
- dataset: Dataset
The dataset to filter duplicates from
- text_field: str
The field to use for filtering duplicates
Returns:
- Dataset
The dataset without duplicates
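The deduplication described for filter_duplicates can be sketched as follows. This is a minimal illustration, not the module's implementation: it works on a plain list of row dicts so it runs without the `datasets` package, and it assumes that duplicates are rows whose `text_field` values match exactly and that the first occurrence is the one kept. The name `filter_duplicates_sketch` is hypothetical.

```python
from typing import Any


def filter_duplicates_sketch(
    rows: list[dict[str, Any]], text_field: str
) -> list[dict[str, Any]]:
    """Keep the first row seen for each distinct value of text_field.

    Hypothetical stand-in for filter_duplicates; the real function
    takes and returns a datasets.Dataset.
    """
    seen: set[Any] = set()
    unique_rows: list[dict[str, Any]] = []
    for row in rows:
        key = row[text_field]
        if key in seen:
            continue  # a row with this exact text was already kept
        seen.add(key)
        unique_rows.append(row)
    return unique_rows


rows = [
    {"text": "hello", "label": 0},
    {"text": "world", "label": 1},
    {"text": "hello", "label": 1},  # same text as the first row
]
deduped = filter_duplicates_sketch(rows, "text")
```

Only `text_field` is compared here, so two rows with the same text but different labels count as duplicates under this assumption.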
- detector_benchmark.dataset_loader.dataset_loader_utils.create_splits(dataset: datasets.Dataset, train_size: float, eval_size: float, test_size: float) -> datasets.DatasetDict
Create train, eval and test splits from a dataset.
Parameters:
- dataset: Dataset
The dataset to create the splits from
- train_size: float
The size of the train split
- eval_size: float
The size of the eval split
- test_size: float
The size of the test split
Returns:
- DatasetDict
The dataset with the train, eval and test splits
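A three-way split of this kind might look like the sketch below. It rests on several assumptions that the documentation above does not confirm: that the three sizes are fractions summing to 1, that rows are shuffled before splitting, and that the resulting splits are keyed `train`, `eval` and `test`. The `seed` parameter and the name `create_splits_sketch` are hypothetical, and plain lists of row dicts stand in for `datasets.Dataset` so the example runs without the `datasets` package.

```python
import math
import random
from typing import Any


def create_splits_sketch(
    rows: list[dict[str, Any]],
    train_size: float,
    eval_size: float,
    test_size: float,
    seed: int = 0,  # hypothetical parameter, added for reproducibility
) -> dict[str, list[dict[str, Any]]]:
    """Partition rows into train/eval/test by fraction (assumed semantics)."""
    if not math.isclose(train_size + eval_size + test_size, 1.0):
        raise ValueError("train_size, eval_size and test_size must sum to 1")

    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * train_size)
    n_eval = round(len(shuffled) * eval_size)
    return {
        "train": shuffled[:n_train],
        "eval": shuffled[n_train:n_train + n_eval],
        # the test split takes the remainder, so no row is dropped
        "test": shuffled[n_train + n_eval:],
    }


rows = [{"text": f"sample {i}"} for i in range(10)]
splits = create_splits_sketch(rows, train_size=0.8, eval_size=0.1, test_size=0.1)
```

Assigning the remainder to the test split means rounding never loses rows, at the cost of the test split absorbing any rounding error.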