sparsezoo.utils package

Submodules

sparsezoo.utils.data module

Utilities for loading data into numpy arrays for use in ONNX-supported systems

class sparsezoo.utils.data.DataLoader(*datasets: sparsezoo.utils.data.Dataset, batch_size: int, iter_steps: int = 0, batch_as_list: bool = False)[source]

Bases: Iterable

Data loader that supports loading numpy arrays from file or memory and creating an iterator over batches of that data. Each iteration yields a batch built from all of the loaded datasets (a usage example follows the parameter list).

Parameters
  • datasets – any number of datasets to load for the dataloader

  • batch_size – the size of batches to create for the iterator

  • iter_steps – the number of steps (batches) to create. Set to -1 for infinite, 0 for running through the loaded data once, or a positive integer for the desired number of steps

  • batch_as_list – True to create the items from each dataset as a list, False for an OrderedDict
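
Example usage (a minimal sketch using in-memory numpy data; the exact nesting of each batch depends on the number of datasets and the batch_as_list setting):

  import numpy as np
  from sparsezoo.utils.data import DataLoader, Dataset

  # four in-memory samples, each a dict of named arrays
  samples = [{"inp": np.random.rand(3, 224, 224).astype(np.float32)} for _ in range(4)]
  dataset = Dataset("images", samples)

  # iter_steps=0 runs through the loaded data exactly once
  loader = DataLoader(dataset, batch_size=2, iter_steps=0, batch_as_list=False)
  for batch in loader:
      # with batch_as_list=False each batch is an ordered dict of stacked arrays
      print(type(batch))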

property batch_as_list

True to create the items from each dataset as a list, False for an OrderedDict

property batch_size

the size of batches to create for the iterator

property datasets

any number of datasets to load for the dataloader

get_batch(bath_index: int) → Union[Dict[str, Union[List[numpy.ndarray], Dict[str, numpy.ndarray]]], List[numpy.ndarray], Dict[str, numpy.ndarray]][source]

Get a batch from the data at the given index

Parameters

bath_index – the index of the batch to get

Returns

the created batch

property infinite

True if the loader instance is setup to continually create batches, False otherwise

property iter_steps

the number of steps (batches) to create. Set to -1 for infinite, 0 for running through the loaded data once, or a positive integer for the desired number of steps

property num_items

the number of items in each dataset

class sparsezoo.utils.data.Dataset(name: str, data: Union[str, Iterable[Union[str, numpy.ndarray, Dict[str, numpy.ndarray]]]])[source]

Bases: Iterable

A numpy dataset implementation

Parameters
  • name – The name for the dataset

  • data – The data for the dataset. Can be one of [str - path to a folder containing numpy files, Iterable[str] - list of paths to numpy files, Iterable[ndarray], Iterable[Dict[str, ndarray]] ]
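
Example (a minimal sketch; "sample_data/" is a hypothetical folder of numpy files):

  import numpy as np
  from sparsezoo.utils.data import Dataset

  # from in-memory arrays
  in_memory = Dataset("inputs", [np.zeros((3, 224, 224), dtype=np.float32) for _ in range(2)])

  # or from numpy files on disk
  on_disk = Dataset("inputs", "sample_data/")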

property data

The list of data items for the dataset.

property name

The name for the dataset

class sparsezoo.utils.data.RandomDataset(name: str, typed_shapes: Dict[str, Tuple[Iterable[int], Optional[numpy.dtype]]], num_samples: int = 20)[source]

Bases: sparsezoo.utils.data.Dataset

A numpy dataset created from random data

Parameters
  • name – The name for the dataset

  • typed_shapes – A dictionary describing the random data to create: each item name maps to a tuple of (shape, numpy dtype). If the dtype is None, it defaults to float32. Ex: {"inp": ([3, 224, 224], None)}

  • num_samples – The number of random samples to create
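
Example (a minimal sketch creating random ImageNet-sized inputs):

  from sparsezoo.utils.data import RandomDataset

  random_data = RandomDataset(
      "random_inputs",
      typed_shapes={"inp": ([3, 224, 224], None)},  # dtype None defaults to float32
      num_samples=8,
  )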

sparsezoo.utils.downloader module

Code related to efficiently downloading multiple files with parallel workers

class sparsezoo.utils.downloader.DownloadProgress(chunk_size, downloaded, content_length, path)

Bases: tuple

property chunk_size

Alias for field number 0

property content_length

Alias for field number 2

property downloaded

Alias for field number 1

property path

Alias for field number 3

exception sparsezoo.utils.downloader.PreviouslyDownloadedError(*args: object)[source]

Bases: Exception

Error raised when a file has already been downloaded and overwrite is False

sparsezoo.utils.downloader.download_file(url_path: str, dest_path: str, overwrite: bool, num_retries: int = 3, show_progress: bool = True, progress_title: Optional[str] = None)[source]

Download a file from the given url to the desired local path

Parameters
  • url_path – the source url to download the file from

  • dest_path – the local file path to save the downloaded file to

  • overwrite – True to overwrite any previous files if they exist, False to not overwrite and raise an error if a file exists

  • num_retries – number of times to retry the download if it fails

  • show_progress – True to show a progress bar for the download, False otherwise

  • progress_title – The title to show with the progress bar

Raises

PreviouslyDownloadedError – raised if a file already exists at dest_path and overwrite is False
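
Example (a minimal sketch; the URL and destination path are hypothetical):

  from sparsezoo.utils.downloader import PreviouslyDownloadedError, download_file

  try:
      download_file(
          url_path="https://example.com/model.onnx",  # hypothetical URL
          dest_path="/tmp/model.onnx",
          overwrite=False,
      )
  except PreviouslyDownloadedError:
      # a file already exists at dest_path and overwrite is False
      pass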

sparsezoo.utils.downloader.download_file_iter(url_path: str, dest_path: str, overwrite: bool, num_retries: int = 3) → Iterator[sparsezoo.utils.downloader.DownloadProgress][source]

Download a file from the given url to the desired local path

Parameters
  • url_path – the source url to download the file from

  • dest_path – the local file path to save the downloaded file to

  • overwrite – True to overwrite any previous files if they exist, False to not overwrite and raise an error if a file exists

  • num_retries – number of times to retry the download if it fails

Returns

an iterator representing the progress for the file download

Raises

PreviouslyDownloadedError – raised if a file already exists at dest_path and overwrite is False
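
Example (a minimal sketch reporting progress; the URL and destination path are hypothetical):

  from sparsezoo.utils.downloader import download_file_iter

  for progress in download_file_iter(
      "https://example.com/model.onnx", "/tmp/model.onnx", overwrite=True
  ):
      if progress.content_length:
          percent = 100.0 * progress.downloaded / progress.content_length
          print(f"{percent:.1f}% downloaded to {progress.path}")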

sparsezoo.utils.helpers module

Code related to helper functions for model zoo

sparsezoo.utils.helpers.clean_path(path: str) → str[source]
Parameters

path – the directory or file path to clean

Returns

a cleaned version that expands the user path and creates an absolute path
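
Example (a minimal sketch; the path is hypothetical):

  from sparsezoo.utils.helpers import clean_path

  # expands "~" and returns an absolute path
  print(clean_path("~/models/resnet50.onnx"))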

sparsezoo.utils.helpers.convert_to_bool(val: Any)[source]
Parameters

val – a value

Returns

False if the value is falsy (e.g. 0, "f", "false", None), True otherwise
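
Example (a minimal sketch of the documented behavior):

  from sparsezoo.utils.helpers import convert_to_bool

  assert not convert_to_bool("false")
  assert not convert_to_bool(0)
  assert convert_to_bool("true")
  assert convert_to_bool(1)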

sparsezoo.utils.helpers.create_dirs(path: str)[source]
Parameters

path – the directory path to try and create

sparsezoo.utils.helpers.create_parent_dirs(path: str)[source]
Parameters

path – the file path to try to create the parent directories for

sparsezoo.utils.helpers.create_tqdm_auto_constructor() → Union[tqdm.std.tqdm, tqdm.tqdm_notebook][source]
Returns

the tqdm class to use for progress. If ipywidgets is installed, auto.tqdm is returned; otherwise the standard tqdm is returned so that notebooks do not break

sparsezoo.utils.helpers.tqdm_auto

alias of tqdm.std.tqdm

sparsezoo.utils.numpy module

Code related to numpy array files

class sparsezoo.utils.numpy.NumpyArrayBatcher[source]

Bases: object

Batcher that takes in dictionaries of numpy arrays, appends multiple items to them to increase their batch size, and then stacks them into a single batched numpy array for each key in the dicts.

append(item: Union[numpy.ndarray, Dict[str, numpy.ndarray]])[source]

Append a new item into the current batch. All keys and shapes must match the current state.

Parameters

item – the item to add for batching

stack(as_list: bool = False) → Union[List[numpy.ndarray], Dict[str, numpy.ndarray]][source]

Stack the current items into a batch along a new leading (zeroth) dimension

Parameters

as_list – True to return the items as a list, False to return the items in a named OrderedDict

Returns

the stacked items
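
Example (a minimal sketch; every appended item must share the same keys and shapes):

  import numpy as np
  from sparsezoo.utils.numpy import NumpyArrayBatcher

  batcher = NumpyArrayBatcher()
  for _ in range(4):
      batcher.append({"inp": np.random.rand(3, 224, 224).astype(np.float32)})

  batched = batcher.stack()  # ordered dict of arrays with a new leading batch axis
  print(batched["inp"].shape)  # expected: (4, 3, 224, 224)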

sparsezoo.utils.numpy.load_numpy(file_path: str) → Union[numpy.ndarray, Dict[str, numpy.ndarray]][source]

Load a numpy file into either an ndarray or an OrderedDict representing what was in the npz file

Parameters

file_path – the file path to load

Returns

the loaded values from the file
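
Example (a minimal sketch; "sample.npz" is a hypothetical path):

  from sparsezoo.utils.numpy import load_numpy

  # a .npy file yields a single ndarray; a .npz file yields an OrderedDict of arrays
  data = load_numpy("sample.npz")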

sparsezoo.utils.numpy.load_numpy_from_tar(path: str) → List[Union[numpy.ndarray, Dict[str, numpy.ndarray]]][source]

Load numpy data into a list from a tar file. All files contained in the tar are expected to be numpy files.

Parameters

path – path to the tarfile to load the numpy data from

Returns

the list of loaded numpy data, either arrays or OrderedDicts of arrays

sparsezoo.utils.numpy.load_numpy_list(data: Union[str, Iterable[Union[str, numpy.ndarray, Dict[str, numpy.ndarray]]]]) → List[Union[numpy.ndarray, Dict[str, numpy.ndarray]]][source]

Load numpy data into a list

Parameters

data – the data to load, one of: [folder path, iterable of file paths, iterable of numpy arrays]

Returns

the list of loaded data items
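
Example (a minimal sketch; "sample_data/" is a hypothetical folder of numpy files):

  import numpy as np
  from sparsezoo.utils.numpy import load_numpy_list

  # in-memory arrays pass through as a list of items
  items = load_numpy_list([np.ones((3, 224, 224), dtype=np.float32)])

  # a folder path loads every numpy file it contains
  # items = load_numpy_list("sample_data/")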

sparsezoo.utils.numpy.save_numpy(array: Union[numpy.ndarray, Dict[str, numpy.ndarray], Iterable[numpy.ndarray]], export_dir: str, name: str, npz: bool = True)[source]

Save a numpy array or collection of numpy arrays to disk

Parameters
  • array – the array or collection of arrays to save

  • export_dir – the directory to export the numpy file into

  • name – the name of the file to export to (without extension)

  • npz – True to save as a compressed npz file, False for a standard npy file. Note: npy can only be used for a single numpy array

Returns

the saved path
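
Example (a minimal sketch; the export directory is hypothetical):

  import numpy as np
  from sparsezoo.utils.numpy import save_numpy

  arrays = {"inp": np.zeros((3, 224, 224), dtype=np.float32)}
  # with npz=True the named arrays are saved together in a compressed .npz file
  path = save_numpy(arrays, export_dir="/tmp/exports", name="sample", npz=True)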

sparsezoo.utils.numpy.tensor_export(tensor: Union[numpy.ndarray, Dict[str, numpy.ndarray], Iterable[numpy.ndarray]], export_dir: str, name: str, npz: bool = True) → str[source]
Parameters
  • tensor – tensor to export to a saved numpy array file

  • export_dir – the directory to export the file in

  • name – the name of the file, .npy will be appended to it

  • npz – True to export as an npz file, False otherwise

Returns

the path of the numpy file the tensor was exported to

sparsezoo.utils.numpy.tensors_export(tensors: Union[numpy.ndarray, Dict[str, numpy.ndarray], Iterable[numpy.ndarray]], export_dir: str, name_prefix: str, counter: int = 0, break_batch: bool = False) → List[str][source]
Parameters
  • tensors – the tensors to export to a saved numpy array file

  • export_dir – the directory to export the files in

  • name_prefix – the prefix name to save the tensors under; info about the position of each tensor in a list or dict is appended, along with the numpy file extension

  • counter – the current counter to save the tensor at

  • break_batch – treat the tensor as a batch and break apart into multiple tensors

Returns

the exported paths
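
Example (a minimal sketch; break_batch=True is expected to write one file per sample, under a hypothetical export directory):

  import numpy as np
  from sparsezoo.utils.numpy import tensors_export

  batch = np.random.rand(4, 3, 224, 224).astype(np.float32)
  paths = tensors_export(
      batch, export_dir="/tmp/exports", name_prefix="inp", counter=0, break_batch=True
  )
  print(len(paths))  # expected: 4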

Module contents

Utils for working with the sparsezoo