sparseml.pytorch.optim package

Submodules

sparseml.pytorch.optim.analyzer_as module

Code related to analyzing activation sparsity within PyTorch neural networks. More information can be found in the related research paper.

class sparseml.pytorch.optim.analyzer_as.ASResultType(value)[source]

Bases: enum.Enum

Result type to track for activation sparsity.

inputs_sample = 'inputs_sample'
inputs_sparsity = 'inputs_sparsity'
outputs_sample = 'outputs_sample'
outputs_sparsity = 'outputs_sparsity'
class sparseml.pytorch.optim.analyzer_as.ModuleASAnalyzer(module: torch.nn.modules.module.Module, dim: Union[None, int, Tuple[int, ]] = None, track_inputs_sparsity: bool = False, track_outputs_sparsity: bool = False, inputs_sample_size: int = 0, outputs_sample_size: int = 0, enabled: bool = True)[source]

Bases: object

An analyzer implementation used to monitor the activation sparsity within a module. Generally used to monitor an individual layer.

Parameters
  • module – The module to analyze activation sparsity for

  • dim – Any dims within the tensor such as across batch, channel, etc. Ex: 0 for batch, 1 for channel, [0, 1] for batch and channel

  • track_inputs_sparsity – True to track the input sparsity to the module, False otherwise

  • track_outputs_sparsity – True to track the output sparsity to the module, False otherwise

  • inputs_sample_size – The number of samples to grab from the input tensor on each forward pass. If <= 0, then will not sample any values.

  • outputs_sample_size – The number of samples to grab from the output tensor on each forward pass. If <= 0, then will not sample any values.

  • enabled – True to enable the hooks for analyzing and actively track, False to disable and not track

static analyze_layers(module: torch.nn.modules.module.Module, layers: List[str], dim: Union[None, int, Tuple[int, ]] = None, track_inputs_sparsity: bool = False, track_outputs_sparsity: bool = False, inputs_sample_size: int = 0, outputs_sample_size: int = 0, enabled: bool = True)[source]
Parameters
  • module – the module to analyze multiple layers activation sparsity in

  • layers – the names of the layers to analyze (from module.named_modules())

  • dim – Any dims within the tensor such as across batch, channel, etc. Ex: 0 for batch, 1 for channel, [0, 1] for batch and channel

  • track_inputs_sparsity – True to track the input sparsity to the module, False otherwise

  • track_outputs_sparsity – True to track the output sparsity to the module, False otherwise

  • inputs_sample_size – The number of samples to grab from the input tensor on each forward pass. If <= 0, then will not sample any values.

  • outputs_sample_size – The number of samples to grab from the output tensor on each forward pass. If <= 0, then will not sample any values.

  • enabled – True to enable the hooks for analyzing and actively track, False to disable and not track

Returns

a list of the created analyzers, matches the ordering in layers
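
A minimal sketch of how the analyzers might be used; the tiny Sequential model below is only an illustrative stand-in, and the layer names come from its named_modules():

    import torch
    from torch.nn import Sequential, Linear, ReLU
    from sparseml.pytorch.optim.analyzer_as import ModuleASAnalyzer

    # stand-in model; any torch Module works
    model = Sequential(Linear(8, 16), ReLU(), Linear(16, 4), ReLU())

    # "1" and "3" are the ReLU layers of this Sequential (see model.named_modules())
    analyzers = ModuleASAnalyzer.analyze_layers(
        model, layers=["1", "3"], track_outputs_sparsity=True
    )

    # run a few forward passes so the hooks can record output sparsity
    with torch.no_grad():
        for _ in range(4):
            model(torch.randn(1, 8))

    for name, analyzer in zip(["1", "3"], analyzers):
        print(name, analyzer.outputs_sparsity_mean)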

clear(specific_result_type: Union[None, sparseml.pytorch.optim.analyzer_as.ASResultType] = None)[source]
property dim
disable()[source]
enable()[source]
property enabled
property inputs_sample
property inputs_sample_max
property inputs_sample_mean
property inputs_sample_min
property inputs_sample_size
property inputs_sample_std
property inputs_sparsity
property inputs_sparsity_max
property inputs_sparsity_mean
property inputs_sparsity_min
property inputs_sparsity_std
property module
property outputs_sample
property outputs_sample_max
property outputs_sample_mean
property outputs_sample_min
property outputs_sample_size
property outputs_sample_std
property outputs_sparsity
property outputs_sparsity_max
property outputs_sparsity_mean
property outputs_sparsity_min
property outputs_sparsity_std
results(result_type: sparseml.pytorch.optim.analyzer_as.ASResultType)List[torch.Tensor][source]
results_max(result_type: sparseml.pytorch.optim.analyzer_as.ASResultType)torch.Tensor[source]
results_mean(result_type: sparseml.pytorch.optim.analyzer_as.ASResultType)torch.Tensor[source]
results_min(result_type: sparseml.pytorch.optim.analyzer_as.ASResultType)torch.Tensor[source]
results_std(result_type: sparseml.pytorch.optim.analyzer_as.ASResultType)torch.Tensor[source]
property track_inputs_sparsity
property track_outputs_sparsity

sparseml.pytorch.optim.analyzer_module module

Code related to monitoring, analyzing, and reporting info for Modules in PyTorch. Records things like FLOPS, input and output shapes, kernel shapes, etc.

class sparseml.pytorch.optim.analyzer_module.ModuleAnalyzer(module: torch.nn.modules.module.Module, enabled: bool = False)[source]

Bases: object

An analyzer implementation for monitoring the execution profile and graph of a Module in PyTorch.

Parameters
  • module – the module to analyze

  • enabled – True to enable the hooks for analyzing and actively track, False to disable and not track

property enabled

True if enabled and the hooks for analyzing are active, False otherwise

Type

return

ks_layer_descs()List[sparseml.optim.analyzer.AnalyzedLayerDesc][source]

Get the descriptions for all layers in the module that support kernel sparsity (model pruning). Ex: all convolutions and linear layers.

Returns

a list of descriptions for all layers in the module that support ks

layer_desc(name: Optional[str] = None)sparseml.optim.analyzer.AnalyzedLayerDesc[source]

Get a specific layer’s description within the Module. Set to None to get the overall Module’s description.

Parameters

name – name of the layer to get a description for, None for an overall description

Returns

the analyzed layer description for the given name

property module

The module that is being actively analyzed

Type

return
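
A minimal sketch of how the analyzer might be used; the tiny model is an illustrative stand-in, and a forward pass is assumed to be needed so the hooks can populate the descriptions:

    import torch
    from torch.nn import Sequential, Linear, ReLU
    from sparseml.pytorch.optim.analyzer_module import ModuleAnalyzer

    model = Sequential(Linear(8, 16), ReLU(), Linear(16, 4))
    analyzer = ModuleAnalyzer(model, enabled=True)

    # run a forward pass so the hooks can record shapes, FLOPS, etc.
    with torch.no_grad():
        model(torch.randn(1, 8))

    overall_desc = analyzer.layer_desc()        # description of the whole module
    prunable_descs = analyzer.ks_layer_descs()  # descriptions of the Linear layers
    print(overall_desc, len(prunable_descs))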

sparseml.pytorch.optim.analyzer_pruning module

Code related to monitoring, analyzing, and reporting the kernel sparsity (model pruning) for a model’s layers and params. More info on kernel sparsity can be found at https://arxiv.org/abs/1902.09574.

class sparseml.pytorch.optim.analyzer_pruning.ModulePruningAnalyzer(module: torch.nn.modules.module.Module, name: str, param_name: str = 'weight')[source]

Bases: object

An analyzer implementation monitoring the kernel sparsity of a given param in a module.

Parameters
  • module – the module containing the param to analyze the sparsity for

  • name – name of the layer, used for tracking

  • param_name – name of the parameter to analyze the sparsity for, defaults to weight

static analyze_layers(module: torch.nn.modules.module.Module, layers: List[str], param_name: str = 'weight')[source]
Parameters
  • module – the module to create multiple analyzers for

  • layers – the names of the layers to create analyzer for that are in the module

  • param_name – the name of the param to monitor within each layer

Returns

a list of analyzers, one for each layer passed in and in the same order

property module

the module containing the param to analyze the sparsity for

Type

return

property name

name of the layer, used for tracking

Type

return

property param

the parameter that is being monitored for kernel sparsity

Type

return

property param_name

name of the parameter to analyze the sparsity for, defaults to weight

Type

return

property param_sparsity

the sparsity of the contained parameter (how many zeros are in it)

Type

return

param_sparsity_dim(dim: Union[None, int, Tuple[int, ]] = None)torch.Tensor[source]
Parameters

dim – a dimension(s) to calculate the sparsity over, ex over channels

Returns

the sparsity of the contained parameter structured according to the dim passed in

property tag

combines the layer name and param name into a single string separated by a period

Type

return
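
A minimal sketch of monitoring parameter sparsity with these analyzers; the tiny model is an illustrative stand-in and the layer names come from its named_modules():

    from torch.nn import Sequential, Linear, ReLU
    from sparseml.pytorch.optim.analyzer_pruning import ModulePruningAnalyzer

    model = Sequential(Linear(8, 16), ReLU(), Linear(16, 4))

    # "0" and "2" are the Linear layers of this Sequential
    analyzers = ModulePruningAnalyzer.analyze_layers(
        model, layers=["0", "2"], param_name="weight"
    )
    for analyzer in analyzers:
        # tag is "<layer>.weight"; param_sparsity is the fraction of zeros
        print(analyzer.tag, analyzer.param_sparsity)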

sparseml.pytorch.optim.manager module

Contains base code related to modifier managers: modifier managers handle grouping modifiers and running them together. Also handles loading modifiers from yaml files

class sparseml.pytorch.optim.manager.RecipeManagerStepWrapper(wrap: Any, optimizer: torch.optim.optimizer.Optimizer, module: torch.nn.modules.module.Module, manager: Any, epoch: float, steps_per_epoch: int)[source]

Bases: object

A wrapper class to handle wrapping an optimizer or optimizer like object and override the step function. The override calls into the ScheduledModifierManager when appropriate and enabled and then calls step() as usual on the function with the original arguments. All original attributes and methods are forwarded to the wrapped object so this class can be a direct substitute for it.

Parameters
  • wrap – The object to wrap the step function and properties for.

  • optimizer – The optimizer used in the training process.

  • module – The model/module used in the training process.

  • manager – The manager to forward lifecycle calls into such as step.

  • epoch – The epoch to start the modifying process at.

  • steps_per_epoch – The number of optimizer steps (batches) in each epoch.

emulated_step()[source]

Emulated step function to be called in place of step when the number of steps per epoch varies across epochs. The emulated function should be called to keep steps_per_epoch the same. Does not call into the step function for the wrapped object, but does call into the manager to increment the steps.

loss_update(loss: torch.Tensor)torch.Tensor[source]

Optional call to update modifiers based on the calculated loss. Not needed unless one or more of the modifiers is using the loss to make a modification or is modifying the loss itself.

Parameters

loss – the calculated loss after running a forward pass and loss_fn

Returns

the modified loss tensor

step(*args, **kwargs)[source]

Override for the step function. Calls into the base step function with the args and kwargs.

Parameters
  • args – Any args to pass to the wrapped object’s step function.

  • kwargs – Any kwargs to pass to the wrapped object’s step function.

Returns

The return, if any, from the wrapped object’s step function

property wrapped

The object to wrap the step function and properties for.

Type

return

property wrapped_epoch

The current epoch the wrapped object is at.

Type

return

property wrapped_manager

The manager to forward lifecycle calls into such as step.

Type

return

property wrapped_module

The model/module used in the training process.

Type

return

property wrapped_optimizer

The optimizer used in the training process.

Type

return

property wrapped_steps

The current number of steps that have been called for the wrapped object.

Type

return

property wrapped_steps_per_epoch

The number of optimizer steps (batches) in each epoch.

Type

return

class sparseml.pytorch.optim.manager.ScheduledModifierManager(modifiers: List[sparseml.pytorch.sparsification.modifier.ScheduledModifier], metadata: Optional[Dict[str, Any]] = None)[source]

Bases: sparseml.optim.manager.BaseManager, sparseml.pytorch.sparsification.modifier.Modifier

The base modifier manager, handles managing multiple ScheduledModifiers.

Lifecycle:
- initialize
- initialize_loggers
- modify
- finalize
Parameters

modifiers – the modifiers to wrap

apply_structure(module: torch.nn.modules.module.Module, epoch: float = 0.0, loggers: Union[None, sparseml.pytorch.utils.logger.LoggerManager, List[sparseml.pytorch.utils.logger.BaseLogger]] = None, finalize: bool = False, **kwargs)[source]

Initialize/apply the modifier for a given model/module at the given epoch if the modifier affects the structure of the module such as quantization, layer pruning, or filter pruning. Calls into initialize(module, epoch, loggers, **kwargs) if structured.

Parameters
  • module – the PyTorch model/module to modify

  • epoch – the epoch to apply the modifier at, defaults to 0.0 (start)

  • loggers – Optional logger manager to log the modification process to

  • finalize – True to invoke finalize after initialize, False otherwise. Set finalize to True and epoch to math.inf for one shot application.

  • kwargs – Optional kwargs to support specific arguments for individual modifiers (passed to initialize and finalize).

finalize(module: Optional[torch.nn.modules.module.Module] = None, reset_loggers: bool = True, **kwargs)[source]

Handles any finalization of the modifier for the given model/module. Applies any remaining logic and cleans up any hooks or attachments to the model.

Parameters
  • module – The model/module to finalize the modifier for. Marked optional so state can still be cleaned up on delete, but generally should always be passed in.

  • reset_loggers – True to remove any currently attached loggers (default), False to keep the loggers attached.

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

static from_yaml(file_path: Union[str, sparsezoo.objects.recipe.Recipe], add_modifiers: Optional[List[sparseml.pytorch.sparsification.modifier.Modifier]] = None, recipe_variables: Optional[Union[Dict[str, Any], str]] = None, metadata: Optional[Dict[str, Any]] = None)[source]

Convenience function used to create the manager of multiple modifiers from a recipe file.

Parameters
  • file_path – the path to the recipe file to load the modifier from, or a SparseZoo model stub to load a recipe for a model stored in SparseZoo. SparseZoo stubs should be preceded by ‘zoo:’, and can contain an optional ‘?recipe_type=<type>’ parameter. Can also be a SparseZoo Recipe object, e.g. ‘/path/to/local/recipe.yaml’, ‘zoo:model/stub/path’, ‘zoo:model/stub/path?recipe_type=transfer’. Additionally, a raw yaml str is also supported in place of a file path.

  • add_modifiers – additional modifiers that should be added to the returned manager alongside the ones loaded from the recipe file

  • recipe_variables – additional arguments to override any root variables in the recipe with (i.e. num_epochs, init_lr)

  • metadata – additional data (beyond the information provided in the recipe) to be preserved and utilized in the future, for reproducibility and completeness

Returns

ScheduledModifierManager() created from the recipe file
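
A hedged sketch of creating a manager from a recipe; the file path and SparseZoo stub below are placeholders:

    from sparseml.pytorch.optim import ScheduledModifierManager

    # from a local recipe file (placeholder path)
    manager = ScheduledModifierManager.from_yaml("/path/to/local/recipe.yaml")

    # or from a SparseZoo stub (placeholder stub), overriding a root recipe variable
    manager = ScheduledModifierManager.from_yaml(
        "zoo:model/stub/path?recipe_type=transfer",
        recipe_variables={"num_epochs": 10},
    )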

initialize(module: torch.nn.modules.module.Module, epoch: float = 0, loggers: Union[None, sparseml.pytorch.utils.logger.LoggerManager, List[sparseml.pytorch.utils.logger.BaseLogger]] = None, **kwargs)[source]

Handles any initialization of the manager for the given model/module. epoch and steps_per_epoch can optionally be passed in to initialize the manager and module at a specific point in the training process. If loggers is not None, will additionally call initialize_loggers.

Parameters
  • module – the PyTorch model/module to modify

  • epoch – The epoch to initialize the manager and module at. Defaults to 0 (start of the training process)

  • loggers – Optional logger manager to log the modification process to

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

initialize_loggers(loggers: Union[None, sparseml.pytorch.utils.logger.LoggerManager, List[sparseml.pytorch.utils.logger.BaseLogger]])[source]

Handles initializing and setting up the loggers for the contained modifiers.

Parameters

loggers – the logger manager to setup this manager with for logging important info and milestones to

load_state_dict(state_dict: Dict[str, Dict], strict: bool = True)[source]

Loads the given state dict into this manager. All modifiers that match will be loaded. If any are missing or extra and strict=True, a KeyError will be raised

Parameters
  • state_dict – dictionary object as generated by this object’s state_dict function

  • strict – True to raise a KeyError for any missing or extra information in the state dict, False to ignore

Raises

IndexError – If any keys in the state dict do not correspond to a valid index for this manager and strict=True

loss_update(loss: torch.Tensor, module: torch.nn.modules.module.Module, optimizer: torch.optim.optimizer.Optimizer, epoch: float, steps_per_epoch: int, **kwargs)torch.Tensor[source]

Optional call that can be made on the optimizer to update the contained modifiers once loss has been calculated

Parameters
  • loss – The calculated loss tensor

  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

Returns

the modified loss tensor

modify(module: torch.nn.modules.module.Module, optimizer: torch.optim.optimizer.Optimizer, steps_per_epoch: int, wrap_optim: Optional[Any] = None, epoch: Optional[float] = None, allow_parallel_module: bool = True, **kwargs)sparseml.pytorch.optim.manager.RecipeManagerStepWrapper[source]

Modify the given module and optimizer for training aware algorithms such as pruning and quantization. Initialize must be called first. After training is complete, finalize should be called.

Parameters
  • module – The model/module to modify

  • optimizer – The optimizer to modify

  • steps_per_epoch – The number of optimizer steps (batches) in each epoch

  • wrap_optim – Optional object to wrap instead of the optimizer. Useful for cases like amp (fp16 training) where it should be wrapped in place of the original optimizer since it doesn’t always call into the optimizer.step() function.

  • epoch – Optional epoch that can be passed in to start modifying at. Defaults to the epoch that was supplied to the initialize function.

  • allow_parallel_module – if False, a DataParallel or DistributedDataParallel module passed to this function will be unwrapped to its base module during recipe initialization by referencing module.module. This is useful so a recipe may reference the base module parameters instead of the wrapped distributed ones. Set to True to not unwrap the distributed module. Default is True

  • kwargs – Keyword arguments that are passed to the initialize call if initialize has not been called yet

Returns

A wrapped optimizer object. The wrapped object makes all the original properties for the wrapped object available so it can be used without any additional code changes.
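
A hedged sketch of a training loop built around modify(); the tiny model, synthetic data, and recipe path are stand-ins, and the epoch count would normally come from the recipe:

    import torch
    from torch.nn import Linear, MSELoss
    from torch.optim import SGD
    from sparseml.pytorch.optim import ScheduledModifierManager

    model = Linear(8, 1)                        # stand-in model
    optimizer = SGD(model.parameters(), lr=0.1)
    loss_fn = MSELoss()
    steps_per_epoch = 10                        # batches per epoch in this sketch

    manager = ScheduledModifierManager.from_yaml("/path/to/recipe.yaml")  # placeholder
    wrapped_optim = manager.modify(model, optimizer, steps_per_epoch=steps_per_epoch)

    for epoch in range(3):                      # the real count comes from the recipe
        for _ in range(steps_per_epoch):
            inputs, targets = torch.randn(4, 8), torch.randn(4, 1)
            wrapped_optim.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss = wrapped_optim.loss_update(loss)  # optional, see loss_update above
            loss.backward()
            wrapped_optim.step()                # also drives the manager's modifiers

    manager.finalize(model)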

optimizer_post_step(module: torch.nn.modules.module.Module, optimizer: torch.optim.optimizer.Optimizer, epoch: float, steps_per_epoch: int)[source]

Called after the optimizer step happens and weights have updated. Calls into the contained modifiers.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

optimizer_pre_step(module: torch.nn.modules.module.Module, optimizer: torch.optim.optimizer.Optimizer, epoch: float, steps_per_epoch: int)[source]

Called before the optimizer step happens (after backward has been called, before optimizer.step). Calls into the contained modifiers.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

state_dict()Dict[str, Dict][source]
Returns

Dictionary to store any state variables for this manager. Includes all modifiers nested under this manager as sub keys in the dict. Only modifiers that have a non-empty state dict are included.
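
A small sketch of checkpointing the manager alongside the model; the file name is illustrative, and model and manager are assumed to already exist (for example as created in the modify() sketch above):

    import torch

    # save the manager state next to the model weights
    torch.save(
        {"model": model.state_dict(), "manager": manager.state_dict()},
        "checkpoint.pth",
    )

    # later: restore both from the checkpoint
    checkpoint = torch.load("checkpoint.pth")
    model.load_state_dict(checkpoint["model"])
    manager.load_state_dict(checkpoint["manager"])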

update(module: torch.nn.modules.module.Module, optimizer: torch.optim.optimizer.Optimizer, epoch: float, steps_per_epoch: int, log_updates: bool = True)[source]

Handles updating the contained modifiers’ states, module, or optimizer. Only calls scheduled_update on each modifier if modifier.update_ready()

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

  • log_updates – True to log the updates for each modifier to the loggers, False to skip logging

sparseml.pytorch.optim.mask_creator_pruning module

Classes for defining sparsity masks based on model parameters.

NOTE: this file is in the process of being phased out in favor of the sparsification package. Once all references to mask utils in the optim package are migrated, this file will be deleted

class sparseml.pytorch.optim.mask_creator_pruning.BlockPruningMaskCreator(block_shape: List[int], grouping_fn_name: str = 'mean')[source]

Bases: sparseml.pytorch.optim.mask_creator_pruning.GroupedPruningMaskCreator

Structured sparsity mask creator that groups the input tensor into blocks of shape block_shape.

Parameters
  • block_shape – The shape that blocks should take across the in and out channels. Should be a list of exactly two integers that divide the input tensors evenly on the channel dimensions. -1 for a dimension means the block spans that entire dimension

  • grouping_fn_name – The name of the torch grouping function to reduce dimensions by

group_tensor(tensor: torch.Tensor)torch.Tensor[source]
Parameters

tensor – The tensor to transform

Returns

The mean values of the tensor grouped by blocks of shape self._block_shape
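
A small sketch of creating a block mask directly; the tensor shape is illustrative and is chosen so the channel dimensions divide evenly by the block shape:

    import torch
    from sparseml.pytorch.optim.mask_creator_pruning import BlockPruningMaskCreator

    creator = BlockPruningMaskCreator(block_shape=[1, 4])
    weight = torch.randn(64, 64)   # e.g. a Linear weight; 64 divides evenly by 4
    masks = creator.create_sparsity_masks([weight], sparsity=0.75)
    print(masks[0].mean())         # roughly 0.25 of the values remain unmasked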

class sparseml.pytorch.optim.mask_creator_pruning.DimensionSparsityMaskCreator(dim: Union[str, int, List[int]], grouping_fn_name: str = 'l2', tensor_group_idxs: Optional[List[List[int]]] = None)[source]

Bases: sparseml.pytorch.optim.mask_creator_pruning.GroupedPruningMaskCreator

Structured sparsity mask creator that groups sparsity blocks by the given dimension(s)

Parameters
  • dim – The index or list of indices of dimensions to group the mask by or the type of dims to prune ([‘channel’, ‘filter’])

  • grouping_fn_name – The name of the torch grouping function to reduce dimensions by. Default is ‘l2’

  • tensor_group_idxs – list of lists of input tensor idxs whose given dimensions should be scored together. If set, all idxs in the range of provided tensors must be included in exactly one group (tensors in their own group should be a list of length 1). If None, no tensor groups will be used

create_sparsity_masks(tensors: List[torch.Tensor], sparsity: Union[float, List[float]], global_sparsity: bool = False)List[torch.Tensor][source]
Parameters
  • tensors – list of tensors to calculate masks from based on their contained values

  • sparsity – the desired sparsity to reach within the mask (decimal fraction of zeros). Can also be a list where each element is a sparsity for a tensor in the same position in the tensor list. If global sparsity is enabled, all values of the sparsity list must be the same

  • global_sparsity – do not set True, unsupported for DimensionSparsityMaskCreator

Returns

list of masks (0.0 for values that are masked, 1.0 for values that are unmasked) calculated from the tensors such that the desired number of zeros matches the sparsity and all values mapped to the same group have the same value

group_tensor(tensor: torch.Tensor)torch.Tensor[source]
Parameters

tensor – The tensor to transform

Returns

The mean values of the tensor grouped by the dimension(s) in self._dim

set_tensor_group_idxs(tensor_group_idxs: Optional[List[List[int]]])[source]
Parameters

tensor_group_idxs – list of lists of input tensor idxs whose given dimensions should be scored together. If set, all idxs in the range of provided tensors must be included in exactly one group (tensors in their own group should be a list of length 1). If None, no tensor groups will be used

property structure_type

the type of structured pruning masks this mask creator produces; must be either ‘channel’ or ‘filter’

Type

return

class sparseml.pytorch.optim.mask_creator_pruning.FourBlockMaskCreator(grouping_fn_name: str = 'mean')[source]

Bases: sparseml.pytorch.optim.mask_creator_pruning.GroupedPruningMaskCreator

Semi-structured sparsity mask creator that groups sparsity blocks in groups of four along the input-channel dimension (assumed to be dimension 1 in PyTorch)

Equivalent to BlockPruningMaskCreator([1, 4]) without restrictions on the number of dimensions or their divisibility

Parameters

grouping_fn_name – The name of the torch grouping function to reduce dimensions by

group_tensor(tensor: torch.Tensor)torch.Tensor[source]
Parameters

tensor – The tensor to transform

Returns

The mean values of the tensor grouped by blocks of shape self._block_shape

class sparseml.pytorch.optim.mask_creator_pruning.GroupedPruningMaskCreator[source]

Bases: sparseml.pytorch.optim.mask_creator_pruning.UnstructuredPruningMaskCreator

Abstract class for a sparsity mask creator that structures masks according to grouping functions. Subclasses should implement group_tensor and _map_mask_to_tensor

create_sparsity_masks(tensors: List[torch.Tensor], sparsity: Union[float, List[float]], global_sparsity: bool = False)List[torch.Tensor][source]
Parameters
  • tensors – list of tensors to calculate masks from based on their contained values

  • sparsity – the desired sparsity to reach within the mask (decimal fraction of zeros) can also be a list where each element is a sparsity for a tensor in the same position in the tensor list. If global sparsity is enabled, all values of the sparsity list must be the same

  • global_sparsity – if True, sparsity masks will be created such that the average sparsity across all given tensors is the target sparsity with the lowest global values masked. If False, each tensor will be masked to the target sparsity ranking values within each individual tensor. Default is False

Returns

list of masks (0.0 for values that are masked, 1.0 for values that are unmasked) calculated from the tensors such that the desired number of zeros matches the sparsity and all values mapped to the same group have the same value

create_sparsity_masks_from_tensor(tensors: List[torch.Tensor])List[torch.Tensor][source]
Parameters

tensors – list of tensors to calculate masks based on their values

Returns

list of masks derived from the values of the tensors grouped by the group_tensor function.

create_sparsity_masks_from_threshold(tensors: List[torch.Tensor], threshold: Union[float, torch.Tensor])List[torch.Tensor][source]
Parameters
  • tensors – list of tensors to calculate masks from based on their contained values

  • threshold – a threshold of group_tensor values to determine cutoff for sparsification

Returns

list of masks derived from the tensors and the grouped threshold

abstract group_tensor(tensor: torch.Tensor)torch.Tensor[source]
Parameters

tensor – The tensor to reduce in groups

Returns

The grouped tensor

static reduce_tensor(tensor: torch.Tensor, dim: Union[int, List[int]], reduce_fn_name: str, keepdim: bool = True)torch.Tensor[source]
Parameters
  • tensor – the tensor to reduce

  • dim – dimension or list of dimension to reduce along

  • reduce_fn_name – function name to reduce tensor with. valid options are ‘l2’, ‘mean’, ‘max’, ‘min’

  • keepdim – preserves the reduced dimension(s) in returned tensor shape as shape 1. default is True

Returns

Tensor reduced along the given dimension(s)
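
A small illustration of the reduce helper, using an L2 reduction over every dimension except the output channel:

    import torch
    from sparseml.pytorch.optim.mask_creator_pruning import GroupedPruningMaskCreator

    weight = torch.randn(8, 16, 3, 3)   # e.g. a conv weight
    scores = GroupedPruningMaskCreator.reduce_tensor(
        weight, dim=[1, 2, 3], reduce_fn_name="l2"
    )
    print(scores.shape)                 # torch.Size([8, 1, 1, 1]) since keepdim=True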

class sparseml.pytorch.optim.mask_creator_pruning.PruningMaskCreator[source]

Bases: abc.ABC

Base abstract class for a sparsity mask creator. Subclasses should define all methods for creating masks

abstract create_sparsity_masks(tensors: List[torch.Tensor], sparsity: Union[float, List[float]], global_sparsity: bool = False)List[torch.Tensor][source]
Parameters
  • tensors – list of tensors to calculate masks based on their contained values

  • sparsity – the desired sparsity to reach within the mask (decimal fraction of zeros) can also be a list where each element is a sparsity for a tensor in the same position in the tensor list. If global sparsity is enabled, all values of the sparsity list must be the same

  • global_sparsity – if True, sparsity masks will be created such that the average sparsity across all given tensors is the target sparsity with the lowest global values masked. If False, each tensor will be masked to the target sparsity ranking values within each individual tensor. Default is False

Returns

list of masks (0.0 for values that are masked, 1.0 for values that are unmasked) calculated from the tensors such that the desired number of zeros matches the sparsity.

create_sparsity_masks_from_tensor(tensors: List[torch.Tensor])List[torch.Tensor][source]
Parameters

tensors – list of tensors to calculate masks based on their values

Returns

list of masks derived from each of the given tensors

abstract create_sparsity_masks_from_threshold(tensors: List[torch.Tensor], threshold: Union[float, torch.Tensor])List[torch.Tensor][source]
Parameters
  • tensors – list of tensors to calculate masks based on their contained values

  • threshold – a threshold to determine cutoff for sparsification

Returns

list of masks derived from each of the given tensors and the threshold

class sparseml.pytorch.optim.mask_creator_pruning.UnstructuredPruningMaskCreator[source]

Bases: sparseml.pytorch.optim.mask_creator_pruning.PruningMaskCreator

Class for creating unstructured sparsity masks. Masks will be created using unstructured sparsity by pruning weights ranked by their value. Each mask will correspond to the given tensor.

create_sparsity_masks(tensors: List[torch.Tensor], sparsity: Union[float, List[float]], global_sparsity: bool = False)List[torch.Tensor][source]
Parameters
  • tensors – list of tensors to calculate masks from based on their contained values

  • sparsity – the desired sparsity to reach within the mask (decimal fraction of zeros) can also be a list where each element is a sparsity for a tensor in the same position in the tensor list. If global sparsity is enabled, all values of the sparsity list must be the same

  • global_sparsity – if True, sparsity masks will be created such that the average sparsity across all given tensors is the target sparsity with the lowest global values masked. If False, each tensor will be masked to the target sparsity ranking values within each individual tensor. Default is False

Returns

list of masks (0.0 for values that are masked, 1.0 for values that are unmasked) calculated from the tensors such that the desired number of zeros matches the sparsity. If there are more zeros than the desired sparsity, zeros will be randomly chosen to match the target sparsity

create_sparsity_masks_from_threshold(tensors: List[torch.Tensor], threshold: Union[float, torch.Tensor])List[torch.Tensor][source]
Parameters
  • tensors – list of tensors to calculate masks based on their contained values

  • threshold – a threshold; values less than or equal to it are masked

Returns

list of masks (0.0 for values that are masked, 1.0 for values that are unmasked) calculated from the tensors; values <= threshold are masked, all others are unmasked

sparseml.pytorch.optim.mask_creator_pruning.load_mask_creator(obj: Union[str, Iterable[int]])sparseml.pytorch.optim.mask_creator_pruning.PruningMaskCreator[source]
Parameters

obj – Formatted string or block shape iterable specifying SparsityMaskCreator object to return

Returns

SparsityMaskCreator object created from obj
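
A small sketch of the loader; ‘unstructured’ matches the default mask creator shown for ModuleParamPruningMask below, and a block shape iterable is also accepted per the docstring (other accepted strings may vary by version):

    from sparseml.pytorch.optim.mask_creator_pruning import load_mask_creator

    unstructured = load_mask_creator("unstructured")  # string form
    block = load_mask_creator([1, 4])                 # block shape iterable form
    print(type(unstructured).__name__, type(block).__name__)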

sparseml.pytorch.optim.mask_pruning module

Code related to applying a mask onto a parameter to impose kernel sparsity, aka model pruning

NOTE: this file is in the process of being phased out in favor of the sparsification package. Once all references to mask utils in the optim package are migrated, this file will be deleted

class sparseml.pytorch.optim.mask_pruning.ModuleParamPruningMask(layers: List[torch.nn.modules.module.Module], param_names: Union[str, List[str]] = 'weight', store_init: bool = False, store_unmasked: bool = False, track_grad_mom: float = - 1.0, mask_creator: sparseml.pytorch.optim.mask_creator_pruning.PruningMaskCreator = unstructured, layer_names: Optional[List[str]] = None, global_sparsity: bool = False, score_type: str = 'magnitude')[source]

Bases: object

Mask to apply kernel sparsity (model pruning) to a specific parameter in a layer

Parameters
  • layers – the layers containing the parameters to mask

  • param_names – the names of the parameters to mask in each layer. If only one name is given, that name will be applied to all layers that this object masks. Default is weight

  • store_init – store the init weights in a separate variable that can be used and referenced later

  • store_unmasked – store the unmasked weights in a separate variable that can be used and referenced later

  • track_grad_mom – store the gradient updates to the parameter with a momentum variable; must be in the range [0.0, 1.0). If set to 0.0, only the most recent gradient is kept

  • mask_creator – object to define sparsity mask creation; default is an unstructured mask

  • layer_names – the name of the layers the parameters to mask are located in

  • global_sparsity – set True to enable global pruning. If True, sparsity masks will be created such that the average sparsity across all given layers is the target sparsity, with the lowest global values masked. If False, each layer will be masked to the target sparsity by ranking values within each individual tensor. Default is False

  • score_type – the method used to score parameters for masking, i.e. ‘magnitude’, ‘movement’. Default is ‘magnitude’

property allow_reintroduction

True if weight reintroduction is allowed

Type

return

apply(param_idx: Optional[int] = None)[source]

apply the current mask to the params tensor (zero out the desired values)

Parameters

param_idx – index of parameter to apply mask to. if not set, then masks will be applied to all parameters with available masks

disable_reintroduction()[source]

if weight reintroduction is enabled (only during movement pruning), disables further weight reintroduction

property enabled

True if the parameter is currently being masked, False otherwise

Type

return

property global_sparsity

True if global pruning is enabled, False otherwise

Type

return

property layer_names

the names of the layers the parameter to mask is located in

Type

return

property layers

the layers containing the parameters to mask

Type

return

property mask_creator

SparsityMaskCreator object used to generate masks

Type

return

property names

the full names of the sparsity masks in the following format: <LAYER>.<PARAM>.sparsity_mask

Type

return

property param_masks

the current masks applied to each of the parameters

Type

return

property param_names

the names of the parameters to mask in the layers

Type

return

property params_data

the current tensors in each of the parameters

Type

return

property params_grad

the current gradient values for each parameter

Type

return

property params_init

the initial values of the parameters before being masked

Type

return

property params_unmasked

the unmasked values of the parameters (stores the last unmasked value before masking)

Type

return

pre_optim_step_update()[source]

updates scores and buffers that depend on gradients. Should be called before Optimizer.step() to grab the latest gradients

pruning_end(leave_enabled: bool)[source]

Performs any cleanup necessary for this pruning method. Disables weight reintroduction if enabled and applies masks

Parameters

leave_enabled – if False, all pruning hooks will be destroyed. Default is True

reset()[source]

resets the current stored tensors such that they will be on the same device and have the initial data

property score_type

the scoring method used to create masks (i.e. magnitude, movement)

Type

return

set_param_data(value: torch.Tensor, param_idx: int)[source]
Parameters
  • value – the value to set as the current tensor for the parameter, if enabled the mask will be applied

  • param_idx – index of the parameter in this object to set the data of

set_param_masks(masks: List[torch.Tensor])[source]
Parameters

masks – the masks to set and apply as the current param tensors, if enabled mask is applied immediately

set_param_masks_from_abs_threshold(threshold: Union[float, torch.Tensor])List[torch.Tensor][source]

Convenience function to set the parameter masks such that if abs(value) <= threshold, the value is masked to 0

Parameters

threshold – the threshold at which all values will be masked to 0

set_param_masks_from_sparsity(sparsity: Union[float, List[float]])List[torch.Tensor][source]

Convenience function to set the parameter masks such that each mask has a fraction of masked values equal to the given sparsity. Masks the absolute smallest values until the sparsity is reached.

Parameters

sparsity – the decimal sparsity to set the param masks to. Can also be a list where each element is a sparsity for a tensor in the same position in the tensor list. If global sparsity is enabled, all values of the sparsity list must be the same
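
A hedged sketch of masking a single layer’s weight to 90% sparsity; the tiny layer is a stand-in, and the enabled property is assumed to be settable so the masking hooks attach:

    import torch
    from torch.nn import Linear
    from sparseml.pytorch.optim.mask_pruning import ModuleParamPruningMask

    layer = Linear(16, 16)                      # stand-in layer
    mask = ModuleParamPruningMask([layer], param_names="weight")
    mask.enabled = True                         # assumption: attaches the masking hooks

    mask.set_param_masks_from_sparsity(0.9)     # mask the 90% smallest-magnitude weights
    mask.apply()                                # zero out the masked values
    print((layer.weight == 0).float().mean())   # approximately 0.9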

set_param_masks_from_weights()List[torch.Tensor][source]

Convenience function to set the parameter masks such that the mask is 1 if a parameter value is non zero and 0 otherwise, unless otherwise defined by this object’s mask_creator

property store_init

store the init weights in a separate variable that can be used and referenced later

Type

return

property store_unmasked

store the unmasked weights in a separate variable that can be used and referenced later

Type

return

property track_grad_mom

store the gradient updates to the parameter with a momentum variable; must be in the range [0.0, 1.0). If set to 0.0, only the most recent gradient is kept

Type

return

sparseml.pytorch.optim.modifier module

sparseml.pytorch.optim.modifier_as module

sparseml.pytorch.optim.modifier_epoch module

sparseml.pytorch.optim.modifier_lr module

sparseml.pytorch.optim.modifier_params module

sparseml.pytorch.optim.modifier_pruning module

sparseml.pytorch.optim.modifier_quantization module

sparseml.pytorch.optim.modifier_regularizer module

sparseml.pytorch.optim.optimizer module

Optimizer wrapper for enforcing Modifiers on the training process of a Module.

class sparseml.pytorch.optim.optimizer.ScheduledOptimizer(optimizer: torch.optim.optimizer.Optimizer, module: torch.nn.modules.module.Module, manager: sparseml.pytorch.optim.manager.ScheduledModifierManager, steps_per_epoch: int, loggers: Optional[List[sparseml.pytorch.utils.logger.BaseLogger]] = None, initialize_kwargs: Optional[Dict[str, Any]] = None)[source]

Bases: torch.optim.optimizer.Optimizer

An optimizer wrapper to handle applying modifiers according to their schedule to both the passed in optimizer and the module.

Overrides the step() function so that this method can call before and after on the modifiers to apply appropriate modifications to both the optimizer and the module.

The epoch_start and epoch_end are based on how many steps have been taken along with the steps_per_epoch.

Lifecycle:
- training cycle
    - zero_grad
    - loss_update
        - modifiers.loss_update
    - step
        - modifiers.update
        - modifiers.optimizer_pre_step
        - optimizer.step
        - modifiers.optimizer_post_step
Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • manager – the manager or list of managers used to apply modifications

  • steps_per_epoch – the number of steps or batches in each epoch; not strictly required and can be set to -1. Used to calculate decimals within the epoch; when not used, irregularities can result

  • loggers – logger manager to log important info to within the modifiers; ex tensorboard or to the console

  • initialize_kwargs – key word arguments and values to be passed to the recipe manager initialize function

adjust_current_step(epoch: int, step: int)[source]

Adjust the current step for the manager’s schedule to the given epoch and step.

Parameters
  • epoch – the epoch to set the current global step to match

  • step – the step (batch) within the epoch to set the current global step to match

property learning_rate

convenience function to get the first learning rate for any of the param groups in the optimizer

Type

return

load_manager_state_dict(state_dict)[source]
loss_update(loss: torch.Tensor)torch.Tensor[source]

Optional call to update modifiers based on the calculated loss. Not needed unless one or more of the modifiers is using the loss to make a modification or is modifying the loss itself.

Parameters

loss – the calculated loss after running a forward pass and loss_fn

Returns

the modified loss tensor

property manager

The ScheduledModifierManager for this optimizer

Type

return

manager_state_dict()[source]
step(closure=None)[source]

Called to perform a step on the optimizer as normal. Updates the current epoch based on the step count. Calls into modifiers before the step happens. Calls into modifiers after the step happens.

Parameters

closure – optional closure passed into the contained optimizer for the step
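
A hedged sketch of wrapping an optimizer directly with this class; the model, recipe path, and steps_per_epoch are placeholders:

    from torch.nn import Linear
    from torch.optim import SGD
    from sparseml.pytorch.optim import ScheduledModifierManager
    from sparseml.pytorch.optim.optimizer import ScheduledOptimizer

    model = Linear(8, 1)                                # stand-in model
    base_optim = SGD(model.parameters(), lr=0.1)
    manager = ScheduledModifierManager.from_yaml("/path/to/recipe.yaml")  # placeholder

    optimizer = ScheduledOptimizer(
        base_optim, model, manager, steps_per_epoch=100
    )
    # optimizer.step() now also updates the manager's modifiers each batch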

sparseml.pytorch.optim.sensitivity_as module

Sensitivity analysis implementations for increasing activation sparsity by using FATReLU

class sparseml.pytorch.optim.sensitivity_as.ASLayerTracker(layer: torch.nn.modules.module.Module, track_input: bool = False, track_output: bool = False, input_func: Union[None, Callable] = None, output_func: Union[None, Callable] = None)[source]

Bases: object

An implementation for tracking activation sparsity properties for a module.

Parameters
  • layer – the module to track activation sparsity for

  • track_input – track the input sparsity for the module

  • track_output – track the output sparsity for the module

  • input_func – the function to call on the layer’s input; receives the input tensor

  • output_func – the function to call on the layer’s output; receives the output tensor

clear()[source]

Clear out current results for the model

disable()[source]

Disable the forward hooks for the layer

enable()[source]

Enable the forward hooks to the layer

property tracked_input

the current tracked input results

Type

return

property tracked_output

the current tracked output results

Type

return

class sparseml.pytorch.optim.sensitivity_as.LayerBoostResults(name: str, threshold: float, boosted_as: torch.Tensor, boosted_loss: sparseml.pytorch.utils.module.ModuleRunResults, baseline_as: torch.Tensor, baseline_loss: sparseml.pytorch.utils.module.ModuleRunResults)[source]

Bases: object

Results for a specific threshold set in a FATReLU layer.

Parameters
  • name – the name of the layer the results are for

  • threshold – the threshold used in the FATReLU layer

  • boosted_as – the measured activation sparsity after threshold is applied

  • boosted_loss – the measured loss after threshold is applied

  • baseline_as – the measured activation sparsity before threshold is applied

  • baseline_loss – the measured loss before threshold is applied

property baseline_as

the measured activation sparsity before threshold is applied

Type

return

property baseline_loss

the measured loss before threshold is applied

Type

return

property boosted_as

the measured activation sparsity after threshold is applied

Type

return

property boosted_loss

the measured loss after threshold is applied

Type

return

property name

the name of the layer the results are for

Type

return

property threshold

the threshold used in the FATReLU layer

Type

return

class sparseml.pytorch.optim.sensitivity_as.ModuleASOneShootBooster(module: torch.nn.modules.module.Module, device: str, dataset: torch.utils.data.dataset.Dataset, batch_size: int, loss: sparseml.pytorch.utils.loss.LossWrapper, data_loader_kwargs: Dict)[source]

Bases: object

Implementation class for boosting the activation sparsity in a given module using FATReLUs. Programmatically goes through and figures out the best thresholds to limit loss based on provided parameters.

Parameters
  • module – the module to boost

  • device – the device to run the analysis on; ex [cpu, cuda, cuda:1]

  • dataset – the dataset used to evaluate the boosting on

  • batch_size – the batch size to run through the module in test mode

  • loss – the loss function to use for calculations

  • data_loader_kwargs – any keyword arguments to supply to the DataLoader constructor

run_layers(layers: List[str], max_target_metric_loss: float, metric_key: str, metric_increases: bool, precision: float = 0.001)Dict[str, sparseml.pytorch.optim.sensitivity_as.LayerBoostResults][source]

Run the booster for the specified layers.

Parameters
  • layers – names of the layers to run boosting on

  • max_target_metric_loss – the max loss in the target metric that can happen while boosting

  • metric_key – the name of the metric to evaluate while boosting; ex: [__loss__, top1acc, top5acc]. Must exist in the LossWrapper

  • metric_increases – True if the metric increases for worse loss such as in a CrossEntropyLoss, False if the metric decreases for worse such as in accuracy

  • precision – the precision to check the results to. Larger values here will give less precise results but won’t take as long

Returns

The results for the boosting

sparseml.pytorch.optim.sensitivity_lr module

Sensitivity analysis implementations for learning rate on Modules against loss funcs.

sparseml.pytorch.optim.sensitivity_lr.default_exponential_check_lrs(init_lr: float = 1e-06, final_lr: float = 0.5, lr_mult: float = 1.1)Tuple[float, ][source]

Get the default learning rates to check between init_lr and final_lr.

Parameters
  • init_lr – the initial learning rate in the returned list

  • final_lr – the final learning rate in the returned list

  • lr_mult – the multiplier increase for each step between init_lr and final_lr

Returns

the list of created lrs that increase exponentially between init_lr and final_lr according to lr_mult
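
The returned schedule is a simple geometric progression; below is a rough, stand-alone equivalent of the documented defaults, written out as a sketch rather than the library’s implementation:

    def exponential_check_lrs(init_lr=1e-6, final_lr=0.5, lr_mult=1.1):
        # rough equivalent: multiply by lr_mult until final_lr is passed,
        # then end the list with final_lr itself
        lrs = [init_lr]
        while lrs[-1] * lr_mult < final_lr:
            lrs.append(lrs[-1] * lr_mult)
        return tuple(lrs) + (final_lr,)

    print(exponential_check_lrs()[:3])  # (1e-06, 1.1e-06, 1.21e-06) up to float error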

sparseml.pytorch.optim.sensitivity_lr.lr_loss_sensitivity(module: torch.nn.modules.module.Module, data: torch.utils.data.dataloader.DataLoader, loss: Union[sparseml.pytorch.utils.loss.LossWrapper, Callable[[Any, Any], torch.Tensor]], optim: torch.optim.optimizer.Optimizer, device: str, steps_per_measurement: int, check_lrs: Union[List[float], Tuple[float, ]] = default_exponential_check_lrs(), loss_key: str = '__loss__', trainer_run_funcs: Optional[sparseml.pytorch.utils.module.ModuleRunFuncs] = None, trainer_loggers: Optional[List[sparseml.pytorch.utils.logger.BaseLogger]] = None, show_progress: bool = True)sparseml.optim.sensitivity.LRLossSensitivityAnalysis[source]

Implementation for handling running sensitivity analysis for learning rates on modules.

Parameters
  • module – the module to run the learning rate sensitivity analysis over, it is expected to already be on the correct device

  • data – the data to run through the module for calculating the sensitivity analysis

  • loss – the loss function to use for the sensitivity analysis

  • optim – the optimizer to run the sensitivity analysis with

  • device – the device to run the analysis on; ex: cpu, cuda. The module must already be on that device; this is used to place the data on that same device.

  • steps_per_measurement – the number of batches to run through for the analysis at each LR

  • check_lrs – the learning rates to check for analysis (will sort them small to large before running)

  • loss_key – the key for the loss function to track in the returned dict

  • trainer_run_funcs – override functions for ModuleTrainer class

  • trainer_loggers – loggers to log data to while running the analysis

  • show_progress – track progress of the runs if True

Returns

a list of tuples containing the analyzed learning rate at 0 and the ModuleRunResults in 1, ModuleRunResults being a collection of all the batch results run through the module at that LR

sparseml.pytorch.optim.sensitivity_pruning module

Sensitivity analysis implementations for kernel sparsity on Modules against loss funcs.

sparseml.pytorch.optim.sensitivity_pruning.model_prunability_magnitude(module: torch.nn.modules.module.Module)[source]

Calculate the approximate sensitivity for an overall model. The range of the values is not scaled to anything, so it must be taken in context with other known models.

Parameters

module – the model to calculate the sensitivity for

Returns

the approximated sensitivity

sparseml.pytorch.optim.sensitivity_pruning.pruning_loss_sens_magnitude(module: torch.nn.modules.module.Module, sparsity_levels: Union[List[float], Tuple[float, ]] = (0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99))sparseml.optim.sensitivity.PruningLossSensitivityAnalysis[source]

Approximated kernel sparsity (pruning) loss analysis for a given model. Returns the results for each prunable param (conv, linear) in the model.

Parameters
  • module – the model to calculate the sparse sensitivity analysis for

  • sparsity_levels – the sparsity levels to calculate the loss for each param

Returns

the analysis results for the model
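
A minimal sketch of the weight-magnitude analyses, which need only a module and no data; the tiny model is an illustrative stand-in:

    from torch.nn import Sequential, Linear, ReLU
    from sparseml.pytorch.optim.sensitivity_pruning import (
        model_prunability_magnitude,
        pruning_loss_sens_magnitude,
    )

    model = Sequential(Linear(8, 16), ReLU(), Linear(16, 4))

    print(model_prunability_magnitude(model))       # single approximate score
    analysis = pruning_loss_sens_magnitude(model)   # per-layer sensitivity results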

sparseml.pytorch.optim.sensitivity_pruning.pruning_loss_sens_one_shot(module: torch.nn.modules.module.Module, data: torch.utils.data.dataloader.DataLoader, loss: Union[sparseml.pytorch.utils.loss.LossWrapper, Callable[[Any, Any], torch.Tensor]], device: str, steps_per_measurement: int, sparsity_levels: List[int] = (0.0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.99), loss_key: str = '__loss__', tester_run_funcs: Optional[sparseml.pytorch.utils.module.ModuleRunFuncs] = None, tester_loggers: Optional[List[sparseml.pytorch.utils.logger.BaseLogger]] = None, show_progress: bool = True)sparseml.optim.sensitivity.PruningLossSensitivityAnalysis[source]

Run a one shot sensitivity analysis for kernel sparsity. It does not retrain, and instead puts the model to eval mode. Moves layer by layer to calculate the sensitivity analysis for each and resets the previously run layers. Note, by default it caches the data. This means it is not parallel for data loading and the first run can take longer. Subsequent sparsity checks for layers and levels will be much faster.

Parameters
  • module – the module to run the kernel sparsity sensitivity analysis over; all prunable layers will be extracted from it

  • data – the data to run through the module for calculating the sensitivity analysis

  • loss – the loss function to use for the sensitivity analysis

  • device – the device to run the analysis on; ex: cpu, cuda

  • steps_per_measurement – the number of samples or items to take for each measurement at each sparsity level

  • sparsity_levels – the sparsity levels to check for each layer to calculate sensitivity

  • loss_key – the key for the loss function to track in the returned dict

  • tester_run_funcs – override functions to use in the ModuleTester that runs

  • tester_loggers – loggers to log data to while running the analysis

  • show_progress – track progress of the runs if True

Returns

the sensitivity results for every layer that is prunable

Module contents

Recalibration code for the PyTorch framework. Handles things like model pruning and increasing activation sparsity.