sparseml.pytorch.datasets package

Submodules

sparseml.pytorch.datasets.generic module

class sparseml.pytorch.datasets.generic.CacheableDataset(original: torch.utils.data.dataset.Dataset)

Bases: torch.utils.data.dataset.Dataset

Generates a cacheable dataset, i.e., one that stores its data in a cache in CPU memory so that items do not have to be loaded from disk every time they are accessed.

Note: this can only be used with a data loader that has num_workers=0.

Parameters

original – the original dataset to cache
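
As a rough illustration of the caching behaviour described above, the following dependency-free sketch wraps any object that supports indexing and len(). It is not the sparseml implementation: CachedView and CountingSource are hypothetical names, and a real version would subclass torch.utils.data.Dataset. The num_workers=0 restriction is presumably there because each DataLoader worker process receives its own copy of the dataset, so a cache filled inside a worker is not shared with the main process.

```python
from typing import Any, Dict


class CachedView:
    """Hypothetical map-style wrapper that keeps loaded items in CPU memory."""

    def __init__(self, original: Any):
        self._original = original
        self._cache: Dict[int, Any] = {}

    def __len__(self) -> int:
        return len(self._original)

    def __getitem__(self, index: int) -> Any:
        # Hit the wrapped dataset only the first time an index is requested.
        if index not in self._cache:
            self._cache[index] = self._original[index]
        return self._cache[index]


class CountingSource:
    """Hypothetical stand-in for a disk-backed dataset; counts real loads."""

    def __init__(self, items):
        self._items = list(items)
        self.loads = 0

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> Any:
        self.loads += 1
        return self._items[index]


source = CountingSource(["a", "b", "c"])
cached = CachedView(source)

# Two full passes over the data; only the first pass touches the source.
first_pass = [cached[i] for i in range(len(cached))]
second_pass = [cached[i] for i in range(len(cached))]
```

After both passes, source.loads is still 3, because the second pass is served entirely from the in-memory cache.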

class sparseml.pytorch.datasets.generic.EarlyStopDataset(original: torch.utils.data.dataset.Dataset, early_stop: int)

Bases: torch.utils.data.dataset.Dataset

Dataset that applies an early stop when iterating through the wrapped dataset, i.e., it only allows indexing in the range [0, early_stop).

Parameters
  • original – the original dataset to apply an early stop to

  • early_stop – the total number of data items to run through; if -1, the whole dataset is used
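
The truncation described above can be sketched without any dependencies as follows. EarlyStopView is a hypothetical name, not the sparseml class, and clamping early_stop to the length of the wrapped dataset is an assumption of this sketch rather than documented behaviour.

```python
from typing import Any


class EarlyStopView:
    """Hypothetical wrapper exposing only indices in [0, early_stop)."""

    def __init__(self, original: Any, early_stop: int):
        self._original = original
        self._early_stop = early_stop

    def __len__(self) -> int:
        full_length = len(self._original)
        if self._early_stop < 0:
            # A negative value (the documented -1) means "use everything".
            return full_length
        # Assumption: never report more items than the wrapped dataset has.
        return min(self._early_stop, full_length)

    def __getitem__(self, index: int) -> Any:
        # Raising IndexError past the cut-off also ends plain for-loops.
        if not 0 <= index < len(self):
            raise IndexError(f"index {index} is outside [0, {len(self)})")
        return self._original[index]


data = list(range(10))
truncated = EarlyStopView(data, early_stop=4)
everything = EarlyStopView(data, early_stop=-1)

truncated_items = list(truncated)
```

Here truncated yields only the first four items, while everything behaves like the original ten-item dataset.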

class sparseml.pytorch.datasets.generic.NoisyDataset(original: torch.utils.data.dataset.Dataset, intensity: float)

Bases: torch.utils.data.dataset.Dataset

Adds random noise, drawn from a normal distribution with mean 0 and standard deviation equal to intensity, on top of a dataset.

Parameters
  • original – the dataset to add noise on top of

  • intensity – the level of noise to add (creates the noise with this standard deviation)
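
A minimal sketch of the idea, using only the standard library, is shown below. NoisyView is a hypothetical name; the sketch assumes each item is a (features, label) pair whose features are a flat list of floats and that only the features receive noise. The seed argument is added purely to make the example reproducible and is not part of the documented signature.

```python
import random
from typing import Any, List, Optional, Tuple


class NoisyView:
    """Hypothetical wrapper adding N(0, intensity) noise to each feature."""

    def __init__(self, original: Any, intensity: float, seed: Optional[int] = None):
        self._original = original
        self._intensity = intensity
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._original)

    def __getitem__(self, index: int) -> Tuple[List[float], Any]:
        features, label = self._original[index]
        # Fresh noise is drawn on every access; the label is left untouched.
        noisy = [x + self._rng.gauss(0.0, self._intensity) for x in features]
        return noisy, label


samples = [([0.0, 1.0, 2.0], 0), ([3.0, 4.0, 5.0], 1)]

noisy = NoisyView(samples, intensity=0.5, seed=0)
noisy_features, noisy_label = noisy[1]

# With an intensity of zero the wrapper returns the data unchanged.
clean = NoisyView(samples, intensity=0.0)
clean_features, clean_label = clean[0]
```

Because the noise is sampled inside __getitem__, reading the same index twice generally returns two different noisy versions of the same item.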

class sparseml.pytorch.datasets.generic.RandNDataset(length: int, shape: Union[int, Tuple[int, ...]], normalize: bool)

Bases: torch.utils.data.dataset.Dataset

Generates a random dataset

Parameters
  • length – the number of random items to create in the dataset

  • shape – the shape of the data to create

  • normalize – whether to normalize the data according to the ImageNet distribution (shape must match 3,x,x)
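
The parameters above can be illustrated with the following standard-library sketch. RandomItems and _randn are hypothetical names, and several details are assumptions rather than documented behaviour: items are nested lists of standard-normal samples, an int shape is treated as a one-dimensional length, and "normalize" is taken to mean the usual per-channel (x - mean) / std transform with the commonly used ImageNet RGB statistics.

```python
import random
from typing import Any, List, Optional, Tuple, Union

# Commonly used ImageNet per-channel RGB statistics.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def _randn(shape: Tuple[int, ...], rng: random.Random) -> List[Any]:
    """Build nested lists of standard-normal samples with the given shape."""
    if len(shape) == 1:
        return [rng.gauss(0.0, 1.0) for _ in range(shape[0])]
    return [_randn(shape[1:], rng) for _ in range(shape[0])]


class RandomItems:
    """Hypothetical dataset of pre-generated random items."""

    def __init__(
        self,
        length: int,
        shape: Union[int, Tuple[int, ...]],
        normalize: bool,
        seed: Optional[int] = None,
    ):
        if isinstance(shape, int):
            shape = (shape,)  # assumption: an int means a 1-D item
        if normalize and (len(shape) != 3 or shape[0] != 3):
            raise ValueError("normalize requires a shape of the form (3, x, x)")

        rng = random.Random(seed)
        self._items = []
        for _ in range(length):
            item = _randn(tuple(shape), rng)
            if normalize:
                # Per-channel (x - mean) / std over the three RGB channels.
                item = [
                    [[(v - mean) / std for v in row] for row in channel]
                    for channel, mean, std in zip(item, IMAGENET_MEAN, IMAGENET_STD)
                ]
            self._items.append(item)

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> Any:
        return self._items[index]


images = RandomItems(length=4, shape=(3, 2, 2), normalize=True, seed=0)
vectors = RandomItems(length=2, shape=5, normalize=False, seed=0)
```

Generating every item up front keeps the dataset deterministic across epochs, at the cost of holding all items in memory.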

sparseml.pytorch.datasets.registry module

Code related to the PyTorch dataset registry for easily creating datasets.

class sparseml.pytorch.datasets.registry.DatasetRegistry

Bases: object

Registry class for creating datasets

static attributes(key: str) → Dict[str, Any]

Parameters

key – the dataset key (name) to look up the attributes for

Returns

the specified attributes for the dataset

static create(key: str, *args, **kwargs) → torch.utils.data.dataset.Dataset

Create a new dataset for the given key

Parameters

key – the dataset key (name) to create

Returns

the instantiated dataset

static register(key: Union[str, List[str]], attributes: Dict[str, Any])

Register a dataset with the registry. Should be used as a decorator.

Parameters
  • key – the dataset key (name), or list of keys, to register the dataset under

  • attributes – the specified attributes for the dataset

Returns

the decorator
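
The register/create/attributes pattern described above can be sketched in plain Python as follows. This is not the sparseml source: SimpleRegistry, ToyDataset, the key names, and the num_classes attribute are all hypothetical, and raising ValueError for duplicate or unknown keys is an assumption of the sketch.

```python
from typing import Any, Callable, Dict, List, Union


class SimpleRegistry:
    """Hypothetical registry mapping string keys to dataset constructors."""

    _constructors: Dict[str, Callable[..., Any]] = {}
    _attributes: Dict[str, Dict[str, Any]] = {}

    @staticmethod
    def register(key: Union[str, List[str]], attributes: Dict[str, Any]):
        keys = [key] if isinstance(key, str) else list(key)

        def decorator(cls):
            for name in keys:
                if name in SimpleRegistry._constructors:
                    raise ValueError(f"key {name!r} is already registered")
                SimpleRegistry._constructors[name] = cls
                SimpleRegistry._attributes[name] = attributes
            # Return the class unchanged so it can still be used directly.
            return cls

        return decorator

    @staticmethod
    def create(key: str, *args, **kwargs) -> Any:
        if key not in SimpleRegistry._constructors:
            raise ValueError(f"no dataset registered under {key!r}")
        return SimpleRegistry._constructors[key](*args, **kwargs)

    @staticmethod
    def attributes(key: str) -> Dict[str, Any]:
        if key not in SimpleRegistry._attributes:
            raise ValueError(f"no dataset registered under {key!r}")
        return SimpleRegistry._attributes[key]


@SimpleRegistry.register(key=["toy", "toy_dataset"], attributes={"num_classes": 2})
class ToyDataset:
    """Hypothetical dataset whose items are (value, label) pairs."""

    def __init__(self, length: int = 4):
        self._length = length

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int):
        if not 0 <= index < self._length:
            raise IndexError(index)
        return float(index), index % 2


# Extra positional and keyword arguments are forwarded to the constructor.
dataset = SimpleRegistry.create("toy_dataset", length=6)
toy_attributes = SimpleRegistry.attributes("toy")
```

Registering through a decorator keeps the key and attributes next to the class definition, so callers can build a dataset from a string key without importing the class itself.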

Module contents

Code for creating and loading datasets in PyTorch