sparseml.utils package
Subpackages
Submodules
sparseml.utils.frameworks module
ML framework tokens
sparseml.utils.helpers module
General utility helper functions. Common functions for interfacing with python primitives and directories/files.
-
class
sparseml.utils.helpers.
NumpyArrayBatcher
[source] Bases:
object
Batcher instance to handle taking in dictionaries of numpy arrays, appending multiple items to them to increase their batch size, and then stacking them into a single batched numpy array for all keys in the dicts.
-
sparseml.utils.helpers.
bucket_iterable
(val: Iterable[Any], num_buckets: int = 3, edge_percent: float = 0.05, sort_highest: bool = True, sort_key: Optional[Callable[Any, Any]] = None) → List[Tuple[int, Any]][source] Bucket an iterable into sublists consisting of the first top (or bottom) percentage, followed by the rest of the iterable sliced into equally sized groups.
- Parameters
val – The iterable to bucket
num_buckets – The number of buckets to group the iterable into, does not include the top bucket
edge_percent – Group the first percent into its own bucket. If sort_highest, then this is the top percent, else bottom percent. If <= 0, then will not create an edge bucket
sort_highest – True to sort such that the highest percent is first and will create buckets in descending order. False to sort so lowest is first and create buckets in ascending order.
sort_key – The sort_key, if any, to use for sorting the iterable after converting it to a list
- Returns
a list of each value mapped to the bucket it was sorted into
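The bucketing described above can be sketched in plain Python. This is an illustrative re-implementation under assumed semantics (edge bucket index -1, remaining buckets numbered from 0), not the library's code:

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def bucket_iterable_sketch(
    val: Iterable[Any],
    num_buckets: int = 3,
    edge_percent: float = 0.05,
    sort_highest: bool = True,
    sort_key: Optional[Callable[[Any], Any]] = None,
) -> List[Tuple[int, Any]]:
    # sort descending (sort_highest=True) or ascending
    values = sorted(val, key=sort_key, reverse=sort_highest)
    # the first edge_percent of values get their own bucket, index -1
    edge_count = round(edge_percent * len(values)) if edge_percent > 0 else 0
    buckets = [(-1, v) for v in values[:edge_count]]
    # slice the remainder into num_buckets equally sized groups
    rest = values[edge_count:]
    group_size = max(1, len(rest) // num_buckets)
    for idx, v in enumerate(rest):
        buckets.append((min(idx // group_size, num_buckets - 1), v))
    return buckets
```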
-
sparseml.utils.helpers.
clean_path
(path: str) → str[source] - Parameters
path – the directory or file path to clean
- Returns
a cleaned version that expands the user path and creates an absolute path
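A minimal sketch of this behavior using only the standard library (the actual implementation may differ):

```python
import os


def clean_path_sketch(path: str) -> str:
    # expand a leading "~" to the user's home directory,
    # then resolve to an absolute path
    return os.path.abspath(os.path.expanduser(path))
```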
-
sparseml.utils.helpers.
convert_to_bool
(val: Any)[source] - Parameters
val – the value to be converted to a bool; supports logical values as strings, e.g. True, t, false, 0
- Returns
the boolean representation of the value; if it can’t be determined, falls back on returning True
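A sketch of the conversion logic implied by the description above; the exact set of recognized strings is an assumption:

```python
from typing import Any


def convert_to_bool_sketch(val: Any) -> bool:
    # booleans and numbers convert directly; strings compare
    # case-insensitively against common falsy spellings; anything
    # unrecognized falls back to True, as documented above
    if isinstance(val, bool):
        return val
    if isinstance(val, (int, float)):
        return val != 0
    if isinstance(val, str):
        return val.strip().lower() not in ("false", "f", "0", "")
    return True
```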
-
sparseml.utils.helpers.
create_dirs
(path: str)[source] - Parameters
path – the directory path to try and create
-
sparseml.utils.helpers.
create_parent_dirs
(path: str)[source] - Parameters
path – the file path to try to create the parent directories for
-
sparseml.utils.helpers.
create_unique_dir
(path: str, check_number: int = 0) → str[source] - Parameters
path – the file path to create a unique version of (append numbers until one doesn’t exist)
check_number – the number to begin checking for unique versions at
- Returns
the unique directory path
-
sparseml.utils.helpers.
flatten_iterable
(li: Iterable)[source] - Parameters
li – a possibly nested iterable of items to be flattened
- Returns
a flattened version of the list where all elements are in a single list flattened in a depth first pattern
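The depth-first flattening described above can be sketched as a short recursive function (illustrative, not the library's code):

```python
from typing import Any, Iterable, List


def flatten_iterable_sketch(li: Iterable) -> List[Any]:
    # depth-first flatten: recurse into nested lists and tuples,
    # keeping strings and other scalars whole
    flat = []
    for item in li:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten_iterable_sketch(item))
        else:
            flat.append(item)
    return flat
```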
-
sparseml.utils.helpers.
interpolate
(x_cur: float, x0: float, x1: float, y0: Any, y1: Any, inter_func: str = 'linear') → Any[source] Note: values are capped at a minimum of x0 and a maximum of x1; by design, this does not extrapolate outside of that range.
- Parameters
x_cur – the current value for x, should be between x0 and x1
x0 – the minimum for x to interpolate between
x1 – the maximum for x to interpolate between
y0 – the minimum for y to interpolate between
y1 – the maximum for y to interpolate between
inter_func – the type of function to interpolate with: linear, cubic, inverse_cubic
- Returns
the interpolated value projecting x into y for the given interpolation function
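A sketch of the clamp-then-project behavior. The linear case follows directly from the description above; the cubic shaping curves here are illustrative assumptions, not the library's exact formulas:

```python
def interpolate_sketch(x_cur, x0, x1, y0, y1, inter_func="linear"):
    # clamp x_cur into [x0, x1] as described above
    x_cur = min(max(x_cur, x0), x1)
    # normalize x into [0, 1], apply the shaping function,
    # then project into the [y0, y1] range
    x_per = (x_cur - x0) / (x1 - x0)
    if inter_func == "linear":
        shaped = x_per
    elif inter_func == "cubic":
        # assumed easing curve: fast at first, flattening near x1
        shaped = 1 - (1 - x_per) ** 3
    elif inter_func == "inverse_cubic":
        # assumed easing curve: slow at first, steepening near x1
        shaped = x_per ** 3
    else:
        raise ValueError(f"unsupported inter_func {inter_func}")
    return y0 + shaped * (y1 - y0)
```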
-
sparseml.utils.helpers.
interpolate_list_linear
(measurements: List[Tuple[float, float]], x_val: Union[float, List[float]]) → List[Tuple[float, float]][source] Linearly interpolate output values for the given input values within a list of measurements
- Parameters
measurements – the measurements to interpolate the output value between
x_val – the target values to interpolate to the second dimension
- Returns
a list of tuples containing the target values, interpolated values
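A sketch of the linear interpolation over a measurement list; the neighbor-lookup strategy is an assumption for illustration:

```python
from bisect import bisect_left
from typing import List, Tuple, Union


def interpolate_list_linear_sketch(
    measurements: List[Tuple[float, float]],
    x_val: Union[float, List[float]],
) -> List[Tuple[float, float]]:
    # sort by x, then linearly interpolate each target
    # between its two neighboring measurements
    pts = sorted(measurements)
    xs = [p[0] for p in pts]
    targets = x_val if isinstance(x_val, list) else [x_val]
    out = []
    for x in targets:
        i = max(1, min(bisect_left(xs, x), len(pts) - 1))
        (xa, ya), (xb, yb) = pts[i - 1], pts[i]
        frac = (x - xa) / (xb - xa)
        out.append((x, ya + frac * (yb - ya)))
    return out
```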
-
sparseml.utils.helpers.
interpolated_integral
(measurements: List[Tuple[float, float]])[source] Calculate the interpolated integral for a group of measurements of the form [(x0, y0), (x1, y1), …]
- Parameters
measurements – the measurements to calculate the integral for
- Returns
the integral or area under the curve for the measurements given
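The area under a piecewise-linear curve through the measurements is the trapezoidal rule; a self-contained sketch:

```python
from typing import List, Tuple


def interpolated_integral_sketch(measurements: List[Tuple[float, float]]) -> float:
    # trapezoidal rule over the piecewise-linear curve through the points
    pts = sorted(measurements)
    total = 0.0
    for (xa, ya), (xb, yb) in zip(pts, pts[1:]):
        total += (xb - xa) * (ya + yb) / 2.0
    return total
```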
-
sparseml.utils.helpers.
is_url
(val: str)[source] - Parameters
val – value to check if it is a url or not
- Returns
True if value is a URL, False otherwise
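A minimal sketch of a URL check using the standard library (the library's own criteria may differ):

```python
from urllib.parse import urlparse


def is_url_sketch(val: str) -> bool:
    # treat a value as a URL if it parses with both a scheme
    # (e.g. https) and a network location (e.g. example.com)
    try:
        parsed = urlparse(val)
        return bool(parsed.scheme) and bool(parsed.netloc)
    except (ValueError, AttributeError):
        return False
```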
-
sparseml.utils.helpers.
json_to_jsonl
(json_file_path: str, overwrite: bool = True)[source] - Converts a json list file to the jsonl file format (used for sharding efficiency)
- e.g.
[{"a": 1}, {"a": 1}]
- would convert to:
{"a": 1}
{"a": 1}
- Parameters
json_file_path – file path to a json file path containing a json list of objects
overwrite – If True, the existing json file will be overwritten, if False, the file will have the same name but with a .jsonl extension
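The conversion described above can be sketched with the standard library; returning the output path is a convenience of this sketch, not necessarily the library's behavior:

```python
import json
import os


def json_to_jsonl_sketch(json_file_path: str, overwrite: bool = True) -> str:
    # read the JSON list, then write one compact JSON object per line
    with open(json_file_path) as fp:
        items = json.load(fp)
    out_path = (
        json_file_path
        if overwrite
        else os.path.splitext(json_file_path)[0] + ".jsonl"
    )
    with open(out_path, "w") as fp:
        for item in items:
            fp.write(json.dumps(item) + "\n")
    return out_path
```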
-
sparseml.utils.helpers.
load_labeled_data
(data: Union[str, Iterable[Union[str, numpy.ndarray, Dict[str, numpy.ndarray]]]], labels: Union[None, str, Iterable[Union[str, numpy.ndarray, Dict[str, numpy.ndarray]]]], raise_on_error: bool = True) → List[Tuple[Union[numpy.ndarray, Dict[str, numpy.ndarray]], Union[None, numpy.ndarray, Dict[str, numpy.ndarray]]]][source] Load data and labels from disk or from memory and group them together. Assumes sorted ordering for data on disk. When a file glob is passed for data and/or labels, the corresponding files are matched between the two.
- Parameters
data – the file glob, file path to numpy data tar ball, or list of arrays to use for data
labels – the file glob, file path to numpy data tar ball, or list of arrays to use for labels, if any
raise_on_error – True to raise on any error that occurs; False to log a warning, ignore, and continue
- Returns
a list containing tuples of the data, labels. If labels was passed in as None, will now contain a None for the second index in each tuple
-
sparseml.utils.helpers.
load_numpy
(file_path: str) → Union[numpy.ndarray, Dict[str, numpy.ndarray]][source] Load a numpy file into either an ndarray or an OrderedDict representing what was in the npz file
- Parameters
file_path – the file_path to load
- Returns
the loaded values from the file
-
sparseml.utils.helpers.
parse_optimization_str
(optim_full_name: str) → Tuple[str, str, Any][source] - Parameters
optim_full_name – A name of a pretrained model optimization. i.e. ‘pruned-moderate-deepsparse’, ‘pruned-aggressive’, ‘base’
- Returns
A tuple representing the corresponding SparseZoo model sparse_name, sparse_category, and sparse_target values with appropriate defaults when not present.
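A sketch of the name parsing; the split on dashes follows the examples above, but the default values used for missing parts here are illustrative placeholders:

```python
from typing import Any, Tuple


def parse_optimization_str_sketch(optim_full_name: str) -> Tuple[str, str, Any]:
    # split "name-category-target" on dashes; the defaults filled in
    # for missing parts are assumptions for illustration
    parts = optim_full_name.split("-")
    sparse_name = parts[0] if parts[0] else "base"
    sparse_category = parts[1] if len(parts) > 1 else "none"
    sparse_target = "-".join(parts[2:]) if len(parts) > 2 else None
    return sparse_name, sparse_category, sparse_target
```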
-
sparseml.utils.helpers.
path_file_count
(path: str, pattern: str = '*') → int[source] Return the number of files that match the given pattern under the given path
- Parameters
path – the path to the directory to look for files under
pattern – the pattern the files must match to be counted
- Returns
the number of files matching the pattern under the directory
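A non-recursive sketch of counting files by glob pattern (whether the library recurses into subdirectories is not stated above):

```python
import os
from fnmatch import fnmatch


def path_file_count_sketch(path: str, pattern: str = "*") -> int:
    # count files directly under path whose names match the glob pattern
    return sum(
        1
        for name in os.listdir(path)
        if fnmatch(name, pattern) and os.path.isfile(os.path.join(path, name))
    )
```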
-
sparseml.utils.helpers.
path_file_size
(path: str) → int[source] Return the total size, in bytes, for a path on the file system
- Parameters
path – the path (directory or file) to get the size for
- Returns
the size of the path, in bytes, as stored on disk
-
sparseml.utils.helpers.
save_numpy
(array: Union[numpy.ndarray, Dict[str, numpy.ndarray], Iterable[numpy.ndarray]], export_dir: str, name: str, npz: bool = True)[source] Save a numpy array or collection of numpy arrays to disk
- Parameters
array – the array or collection of arrays to save
export_dir – the directory to export the numpy file into
name – the name of the file to export to (without extension)
npz – True to save as an npz compressed file, False for standard npy. Note, npy can only be used for single numpy arrays
- Returns
the saved path
-
sparseml.utils.helpers.
tensor_export
(tensor: Union[numpy.ndarray, Dict[str, numpy.ndarray], Iterable[numpy.ndarray]], export_dir: str, name: str, npz: bool = True) → str[source] - Parameters
tensor – tensor to export to a saved numpy array file
export_dir – the directory to export the file in
name – the name of the file, .npy will be appended to it
npz – True to export as an npz file, False otherwise
- Returns
the path of the numpy file the tensor was exported to
-
sparseml.utils.helpers.
tensors_export
(tensors: Union[numpy.ndarray, Dict[str, numpy.ndarray], Iterable[numpy.ndarray]], export_dir: str, name_prefix: str, counter: int = 0, break_batch: bool = False) → List[str][source] - Parameters
tensors – the tensors to export to a saved numpy array file
export_dir – the directory to export the files in
name_prefix – the prefix name for the tensors to save as, will append info about the position of the tensor in a list or dict in addition to the .npy file format
counter – the current counter to save the tensor at
break_batch – treat the tensor as a batch and break apart into multiple tensors
- Returns
the exported paths
-
sparseml.utils.helpers.
validate_str_iterable
(val: Union[str, Iterable[str]], error_desc: str = '') → Union[str, Iterable[str]][source] - Parameters
val – the value to validate; if it is a list, it is flattened; otherwise it must be an __ALL__ or __ALL_PRUNABLE__ string, else a ValueError is raised
error_desc – the description to raise an error with in the event that the val wasn’t valid
- Returns
the validated version of the param
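A sketch of the validation logic; the token spellings and single-level flattening here are assumptions for illustration:

```python
from typing import Iterable, List, Union

# assumed spellings of the special tokens mentioned above
ALL_TOKEN = "__ALL__"
ALL_PRUNABLE_TOKEN = "__ALL_PRUNABLE__"


def validate_str_iterable_sketch(
    val: Union[str, Iterable[str]], error_desc: str = ""
) -> Union[str, List[str]]:
    # strings must be one of the special tokens; iterables are flattened
    if isinstance(val, str):
        if val.upper() in (ALL_TOKEN, ALL_PRUNABLE_TOKEN):
            return val.upper()
        raise ValueError(f"unsupported string {val!r} given for {error_desc}")
    flat = []
    for item in val:
        if isinstance(item, (list, tuple)):
            flat.extend(item)
        else:
            flat.append(item)
    return flat
```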
sparseml.utils.singleton module
Code related to the Singleton design pattern
sparseml.utils.worker module
General code for parallelizing the workers
-
class
sparseml.utils.worker.
ParallelWorker
(worker_func: Callable, num_workers: int, indefinite: bool, max_source_size: int = -1)[source] Bases:
object
Multi threading worker to parallelize tasks
- Parameters
worker_func – the function to parallelize across multiple tasks
num_workers – number of workers to use
indefinite – True to keep the thread pool running so that more tasks can be added, False to stop after no more tasks are added
max_source_size – the maximum size for the source queue
-
add_async_generator
(gen: Iterator[Any])[source] - Parameters
gen – add an async generator to pull values from for processing
-
property
indefinite
True to keep the thread pool running so that more tasks can be added, False to stop after no more tasks are added
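ParallelWorker's full API is not documented here, but the fan-out pattern it describes can be illustrated with the standard library's thread pool:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


def run_parallel_sketch(
    worker_func: Callable[[Any], Any], items: Iterable[Any], num_workers: int = 4
) -> List[Any]:
    # fan the work items out across num_workers threads and
    # collect the results in input order
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(worker_func, items))
```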
sparseml.utils.wrapper module
Code for properly merging function attributes for decorated / wrapped functions. Merges docs, annotations, dicts, etc.
-
sparseml.utils.wrapper.
wrapper_decorator
(wrapped: Callable)[source] A wrapper decorator to be applied as a decorator to a function. Merges the decorated function properties with wrapped.
- Parameters
wrapped – the wrapped function to merge decorations with
- Returns
the decorator to apply to the function
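The attribute merging this module describes is close to what functools.wraps provides in the standard library; a sketch of the pattern:

```python
import functools
from typing import Callable


def wrapper_decorator_sketch(wrapped: Callable) -> Callable:
    # returns a decorator that copies wrapped's name, docstring,
    # annotations, and __dict__ onto the decorated function
    def decorator(func: Callable) -> Callable:
        @functools.wraps(wrapped)
        def inner(*args, **kwargs):
            return func(*args, **kwargs)

        return inner

    return decorator


def documented(x):
    """Original docstring."""
    return x


@wrapper_decorator_sketch(documented)
def replacement(x):
    return x + 1
```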
Module contents
General utility functions used throughout sparseml