sparseml.utils package


sparseml.utils.frameworks module

ML framework tokens

sparseml.utils.helpers module

General utility helper functions. Common functions for interfacing with python primitives and directories/files.

class sparseml.utils.helpers.NumpyArrayBatcher[source]

Bases: object

Batcher instance that takes in dictionaries of numpy arrays, appends multiple items to them to increase their batch size, and then stacks them into a single batched numpy array for each key in the dicts.

append(item: Union[numpy.ndarray, Dict[str, numpy.ndarray]])[source]

Append a new item into the current batch. All keys and shapes must match the current state.


item – the item to add for batching

stack() → Dict[str, numpy.ndarray][source]

Stack the current items into a batch along a new, zeroed dimension


the stacked items
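
The append/stack semantics can be sketched with plain numpy (this does not call sparseml itself): single-item dicts with matching keys and shapes are stacked along a new leading batch dimension.

```python
import numpy as np

# Two single-item dicts with matching keys and shapes, as append() expects.
items = [
    {"input": np.zeros((3, 4)), "mask": np.ones((5,))},
    {"input": np.ones((3, 4)), "mask": np.zeros((5,))},
]

# Equivalent of calling NumpyArrayBatcher.append(item) per item, then stack():
batched = {key: np.stack([item[key] for item in items]) for key in items[0]}

print(batched["input"].shape)  # -> (2, 3, 4)
```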

sparseml.utils.helpers.bucket_iterable(val: Iterable[Any], num_buckets: int = 3, edge_percent: float = 0.05, sort_highest: bool = True, sort_key: Optional[Callable[[Any], Any]] = None) → List[Tuple[int, Any]][source]

Bucket an iterable into subarrays: the first bucket holds the top edge percentage, and the rest of the iterable is sliced into equally sized groups.

  • val – The iterable to bucket

  • num_buckets – The number of buckets to group the iterable into, does not include the top bucket

  • edge_percent – Group the first percent into its own bucket. If sort_highest, then this is the top percent, else bottom percent. If <= 0, then will not create an edge bucket

  • sort_highest – True to sort such that the highest percent is first and will create buckets in descending order. False to sort so lowest is first and create buckets in ascending order.

  • sort_key – The sort_key, if any, to use for sorting the iterable after converting it to a list


a list of each value mapped to the bucket it was sorted into

sparseml.utils.helpers.clean_path(path: str) → str[source]

path – the directory or file path to clean


a cleaned version that expands the user path and creates an absolute path
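
A hypothetical equivalent of this cleaning, built from the standard library (an assumption about the implementation, not sparseml's actual code):

```python
import os

def clean_path_sketch(path: str) -> str:
    # Expand a leading "~" to the user's home directory, then resolve the
    # result to an absolute path.
    return os.path.abspath(os.path.expanduser(path))

print(clean_path_sketch("~/models/../checkpoints"))
```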

sparseml.utils.helpers.convert_to_bool(val: Any)[source]

val – the value to be converted to a bool; supports logical values as strings, e.g. True, t, false, 0


the boolean representation of the value, if it can’t be determined, falls back on returning True
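
One plausible reimplementation of the documented behavior (a sketch under assumptions, not sparseml's exact code):

```python
def convert_to_bool_sketch(val) -> bool:
    # Hypothetical sketch: logical string values map to their boolean
    # meaning; anything indeterminate falls back on returning True,
    # as the docstring above describes.
    if isinstance(val, bool):
        return val
    if isinstance(val, str):
        if val.strip().lower() in ("false", "f", "no", "n", "0", ""):
            return False
        return True  # "true", "t", "1", or any other string
    if isinstance(val, (int, float)):
        return bool(val)
    return True  # documented fallback when the value cannot be determined
```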

sparseml.utils.helpers.create_dirs(path: str)[source]

path – the directory path to try and create

sparseml.utils.helpers.create_parent_dirs(path: str)[source]

path – the file path to try to create the parent directories for

sparseml.utils.helpers.create_unique_dir(path: str, check_number: int = 0) → str[source]

  • path – the file path to create a unique version of (append numbers until one doesn’t exist)

  • check_number – the number to begin checking for unique versions at


the unique directory path

sparseml.utils.helpers.flatten_iterable(li: Iterable)[source]

li – a possibly nested iterable of items to be flattened


a flattened version of the list where all elements are in a single list flattened in a depth first pattern
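
A depth-first flattening like the one described can be sketched recursively (strings are treated as atomic leaves here, an assumption about the intended behavior):

```python
def flatten_sketch(li):
    # Recursively flatten nested lists/tuples in a depth-first pattern.
    flat = []
    for item in li:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten_sketch(item))
        else:
            flat.append(item)
    return flat

print(flatten_sketch([1, [2, [3, 4]], 5]))  # -> [1, 2, 3, 4, 5]
```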

sparseml.utils.helpers.interpolate(x_cur: float, x0: float, x1: float, y0: Any, y1: Any, inter_func: str = 'linear') → Any[source]

Note: values are clamped to the range [x0, x1]; the function is not designed to work outside of that range for implementation reasons.

  • x_cur – the current value for x, should be between x0 and x1

  • x0 – the minimum for x to interpolate between

  • x1 – the maximum for x to interpolate between

  • y0 – the minimum for y to interpolate between

  • y1 – the maximum for y to interpolate between

  • inter_func – the type of function to interpolate with: linear, cubic, inverse_cubic


the interpolated value projecting x into y for the given interpolation function
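
For the linear case, the projection with clamping can be sketched as follows (the cubic variants are omitted; this is an illustration of the documented semantics, not sparseml's code):

```python
def interpolate_linear_sketch(x_cur, x0, x1, y0, y1):
    # Clamp x_cur into [x0, x1], then project linearly from x onto y.
    x = min(max(x_cur, x0), x1)
    percent = (x - x0) / (x1 - x0)
    return y0 + percent * (y1 - y0)

print(interpolate_linear_sketch(5.0, 0.0, 10.0, 0.0, 1.0))  # -> 0.5
```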

sparseml.utils.helpers.interpolate_list_linear(measurements: List[Tuple[float, float]], x_val: Union[float, List[float]]) → List[Tuple[float, float]][source]

Linearly interpolate the output values for the given input values within a list of measurements.

  • measurements – the measurements to interpolate the output value between

  • x_val – the target values to interpolate to the second dimension


a list of tuples containing the target values, interpolated values

sparseml.utils.helpers.interpolated_integral(measurements: List[Tuple[float, float]])[source]

Calculate the interpolated integral for a group of measurements of the form [(x0, y0), (x1, y1), …]


measurements – the measurements to calculate the integral for


the integral or area under the curve for the measurements given
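
The trapezoidal rule is one standard way to compute the area under a piecewise-linear interpolation of such measurement pairs (a sketch of the idea, not necessarily sparseml's exact implementation):

```python
def interpolated_integral_sketch(measurements):
    # Sum trapezoid areas between consecutive (x, y) pairs.
    total = 0.0
    for (x0, y0), (x1, y1) in zip(measurements, measurements[1:]):
        total += (x1 - x0) * (y0 + y1) / 2.0
    return total

print(interpolated_integral_sketch([(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]))  # -> 1.0
```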

sparseml.utils.helpers.is_url(val: str)[source]

val – value to check if it is a url or not


True if value is a URL, False otherwise
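
One common heuristic for such a check (an assumption about the implementation, not necessarily sparseml's): a URL must parse with both a scheme and a network location.

```python
from urllib.parse import urlparse

def is_url_sketch(val: str) -> bool:
    # A string counts as a URL only if it has a scheme (e.g. "https")
    # and a network location (e.g. "example.com").
    try:
        parsed = urlparse(val)
        return bool(parsed.scheme) and bool(parsed.netloc)
    except (ValueError, AttributeError):
        return False
```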

sparseml.utils.helpers.json_to_jsonl(json_file_path: str, overwrite: bool = True)[source]

Converts a json list file to the jsonl file format (used for sharding efficiency)

[{"a": 1}, {"a": 1}]

would convert to:

{"a": 1}
{"a": 1}

  • json_file_path – file path to a json file path containing a json list of objects

  • overwrite – If True, the existing json file will be overwritten, if False, the file will have the same name but with a .jsonl extension
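
The documented conversion can be sketched with the standard library (the function name and return value here are assumptions for illustration, not sparseml's code):

```python
import json
import os
import tempfile

def json_to_jsonl_sketch(json_file_path, overwrite=True):
    # Read a JSON list file and write one JSON object per line.
    with open(json_file_path) as fp:
        records = json.load(fp)
    out_path = (
        json_file_path
        if overwrite
        else os.path.splitext(json_file_path)[0] + ".jsonl"
    )
    with open(out_path, "w") as fp:
        for record in records:
            fp.write(json.dumps(record) + "\n")
    return out_path

# Round-trip demo in a temporary directory:
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "data.json")
with open(src, "w") as fp:
    json.dump([{"a": 1}, {"a": 2}], fp)
out_path = json_to_jsonl_sketch(src, overwrite=False)
lines = open(out_path).read().splitlines()
```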

sparseml.utils.helpers.load_labeled_data(data: Union[str, Iterable[Union[str, numpy.ndarray, Dict[str, numpy.ndarray]]]], labels: Union[None, str, Iterable[Union[str, numpy.ndarray, Dict[str, numpy.ndarray]]]], raise_on_error: bool = True) → List[Tuple[Union[numpy.ndarray, Dict[str, numpy.ndarray]], Union[None, numpy.ndarray, Dict[str, numpy.ndarray]]]][source]

Load labels and data from disk or from memory and group them together. Assumes sorted ordering for data on disk. Will match data to labels when a file glob is passed for either or both.

  • data – the file glob, file path to numpy data tar ball, or list of arrays to use for data

  • labels – the file glob, file path to numpy data tar ball, or list of arrays to use for labels, if any

  • raise_on_error – True to raise on any error that occurs; False to log a warning, ignore, and continue


a list containing tuples of the data, labels. If labels was passed in as None, will now contain a None for the second index in each tuple

sparseml.utils.helpers.load_numpy(file_path: str) → Union[numpy.ndarray, Dict[str, numpy.ndarray]][source]

Load a numpy file into either an ndarray or an OrderedDict representing what was in the npz file


file_path – the file_path to load


the loaded values from the file

sparseml.utils.helpers.parse_optimization_str(optim_full_name: str) → Tuple[str, str, Any][source]

optim_full_name – A name of a pretrained model optimization. i.e. ‘pruned-moderate-deepsparse’, ‘pruned-aggressive’, ‘base’


A tuple representing the corresponding SparseZoo model sparse_name, sparse_category, and sparse_target values with appropriate defaults when not present.

sparseml.utils.helpers.path_file_count(path: str, pattern: str = '*') → int[source]

Return the number of files that match the given pattern under the given path

  • path – the path to the directory to look for files under

  • pattern – the pattern the files must match to be counted


the number of files matching the pattern under the directory

sparseml.utils.helpers.path_file_size(path: str) → int[source]

Return the total size, in bytes, for a path on the file system


path – the path (directory or file) to get the size for


the size of the path, in bytes, as stored on disk

sparseml.utils.helpers.save_numpy(array: Union[numpy.ndarray, Dict[str, numpy.ndarray], Iterable[numpy.ndarray]], export_dir: str, name: str, npz: bool = True)[source]

Save a numpy array or collection of numpy arrays to disk

  • array – the array or collection of arrays to save

  • export_dir – the directory to export the numpy file into

  • name – the name of the file to export to (without extension)

  • npz – True to save as an npz compressed file, False for standard npy. Note, npy can only be used for single numpy arrays


the saved path
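
For the npz=True case, the save/load round trip amounts to numpy's own `savez`/`load` (the file and key names here are illustrative, not sparseml's):

```python
import os
import tempfile
import numpy as np

export_dir = tempfile.mkdtemp()
arrays = {"inp": np.arange(6).reshape(2, 3), "out": np.ones(4)}

# Save a dict of arrays as a single compressed-style .npz archive.
path = os.path.join(export_dir, "sample.npz")
np.savez(path, **arrays)

# Loading an .npz yields a mapping from key to array.
loaded = np.load(path)
```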

sparseml.utils.helpers.tensor_export(tensor: Union[numpy.ndarray, Dict[str, numpy.ndarray], Iterable[numpy.ndarray]], export_dir: str, name: str, npz: bool = True) → str[source]

  • tensor – tensor to export to a saved numpy array file

  • export_dir – the directory to export the file in

  • name – the name of the file, .npy will be appended to it

  • npz – True to export as an npz file, False otherwise


the path of the numpy file the tensor was exported to

sparseml.utils.helpers.tensors_export(tensors: Union[numpy.ndarray, Dict[str, numpy.ndarray], Iterable[numpy.ndarray]], export_dir: str, name_prefix: str, counter: int = 0, break_batch: bool = False) → List[str][source]

  • tensors – the tensors to export to a saved numpy array file

  • export_dir – the directory to export the files in

  • name_prefix – the prefix name for the tensors to save as, will append info about the position of the tensor in a list or dict in addition to the .npy file format

  • counter – the current counter to save the tensor at

  • break_batch – treat the tensor as a batch and break apart into multiple tensors


the exported paths

sparseml.utils.helpers.validate_str_iterable(val: Union[str, Iterable[str]], error_desc: str = '') → Union[str, Iterable[str]][source]

  • val – the value to validate, check that it is a list (and flattens it), otherwise checks that it’s an __ALL__ or __ALL_PRUNABLE__ string, otherwise raises a ValueError

  • error_desc – the description to raise an error with in the event that the val wasn’t valid


the validated version of the param

sparseml.utils.singleton module

Code related to the Singleton design pattern

class sparseml.utils.singleton.Singleton[source]

Bases: type

A singleton class implementation meant to be added to others as a metaclass.

Ex: class Logger(metaclass=Singleton)
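
The classic metaclass-based singleton the example suggests can be sketched as follows (a generic implementation of the pattern, not necessarily sparseml's exact code):

```python
class SingletonSketch(type):
    # Metaclass that caches one instance per class; repeated construction
    # returns the cached instance instead of building a new one.
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Logger(metaclass=SingletonSketch):
    def __init__(self):
        self.messages = []

first = Logger()
second = Logger()
print(first is second)  # -> True
```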

sparseml.utils.worker module

General code for parallelizing the workers

class sparseml.utils.worker.ParallelWorker(worker_func: Callable, num_workers: int, indefinite: bool, max_source_size: int = -1)[source]

Bases: object

Multi threading worker to parallelize tasks

  • worker_func – the function to parallelize across multiple tasks

  • num_workers – number of workers to use

  • indefinite – True to keep the thread pooling running so that more tasks can be added, False to stop after no more tasks are added

  • max_source_size – the maximum size for the source queue

add(vals: List[Any])[source]

vals – the values to add for processing work

add_async(vals: List[Any])[source]

vals – the values to add for async workers

add_async_generator(gen: Iterator[Any])[source]

gen – add an async generator to pull values from for processing

add_item(val: Any)[source]

val – add a single item for processing

property indefinite

True to keep the thread pooling running so that more tasks can be added, False to stop after no more tasks are added




shutdown()[source]

Stop the workers

start()[source]

Start the workers

sparseml.utils.wrapper module

Code for properly merging function attributes for decorated / wrapped functions. Merges docs, annotations, dicts, etc.

sparseml.utils.wrapper.wrapper_decorator(wrapped: Callable)[source]

A wrapper decorator to be applied as a decorator to a function. Merges the decorated function properties with wrapped.


wrapped – the wrapped function to merge decorations with


the decorator to apply to the function
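
Comparable merging is available in the standard library via functools.wraps, which copies __name__, __doc__, __dict__, and annotations from the wrapped function onto the wrapper; this sketch mirrors the decorator shape described above under that assumption:

```python
import functools

def wrapper_decorator_sketch(wrapped):
    # Hypothetical analogue: returns a decorator whose result carries the
    # merged attributes of `wrapped`.
    def decorator(func):
        @functools.wraps(wrapped)
        def inner(*args, **kwargs):
            return func(*args, **kwargs)
        return inner
    return decorator

def original(x: int) -> int:
    """Double the input."""
    return 2 * x

@wrapper_decorator_sketch(original)
def wrapped_version(x):
    return original(x)

print(wrapped_version.__name__)  # -> original
```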

Module contents

General utility functions used throughout sparseml