sparseml.pytorch.optim package

Submodules

sparseml.pytorch.optim.analyzer_as module

Code related to analyzing activation sparsity within PyTorch neural networks. More information can be found in the paper here.

class sparseml.pytorch.optim.analyzer_as. ASResultType ( value ) [source]

Bases: enum.Enum

Result type to track for activation sparsity.

inputs_sample = 'inputs_sample'
inputs_sparsity = 'inputs_sparsity'
outputs_sample = 'outputs_sample'
outputs_sparsity = 'outputs_sparsity'
class sparseml.pytorch.optim.analyzer_as. ModuleASAnalyzer ( module : torch.nn.modules.module.Module , dim : Union [ None , int , Tuple [ int , ] ] = None , track_inputs_sparsity : bool = False , track_outputs_sparsity : bool = False , inputs_sample_size : int = 0 , outputs_sample_size : int = 0 , enabled : bool = True ) [source]

Bases: object

An analyzer implementation used to monitor the activation sparsity within a module. Generally used to monitor an individual layer.

Parameters
  • module – The module to analyze activation sparsity for

  • dim – Any dims within the tensor such as across batch, channel, etc. Ex: 0 for batch, 1 for channel, [0, 1] for batch and channel

  • track_inputs_sparsity – True to track the input sparsity to the module, False otherwise

  • track_outputs_sparsity – True to track the output sparsity to the module, False otherwise

  • inputs_sample_size – The number of samples to grab from the input tensor on each forward pass. If <= 0, then will not sample any values.

  • outputs_sample_size – The number of samples to grab from the output tensor on each forward pass. If <= 0, then will not sample any values.

  • enabled – True to enable the hooks for analyzing and actively track, False to disable and not track

static analyze_layers ( module : torch.nn.modules.module.Module , layers : List [ str ] , dim : Union [ None , int , Tuple [ int , ] ] = None , track_inputs_sparsity : bool = False , track_outputs_sparsity : bool = False , inputs_sample_size : int = 0 , outputs_sample_size : int = 0 , enabled : bool = True ) [source]
Parameters
  • module – the module to analyze multiple layers’ activation sparsity in

  • layers – the names of the layers to analyze (from module.named_modules())

  • dim – Any dims within the tensor such as across batch, channel, etc. Ex: 0 for batch, 1 for channel, [0, 1] for batch and channel

  • track_inputs_sparsity – True to track the input sparsity to the module, False otherwise

  • track_outputs_sparsity – True to track the output sparsity to the module, False otherwise

  • inputs_sample_size – The number of samples to grab from the input tensor on each forward pass. If <= 0, then will not sample any values.

  • outputs_sample_size – The number of samples to grab from the output tensor on each forward pass. If <= 0, then will not sample any values.

  • enabled – True to enable the hooks for analyzing and actively track, False to disable and not track

Returns

a list of the created analyzers, matches the ordering in layers
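
As a reference, here is a minimal usage sketch of ModuleASAnalyzer.analyze_layers; the model, layer names, and input data are placeholder assumptions:

import torch
from torch.nn import Linear, ReLU, Sequential

from sparseml.pytorch.optim.analyzer_as import ModuleASAnalyzer

# placeholder model; named_modules() exposes the children as "0", "1", "2"
model = Sequential(Linear(128, 64), ReLU(), Linear(64, 10))

analyzers = ModuleASAnalyzer.analyze_layers(
    model, layers=["1", "2"], track_inputs_sparsity=True, track_outputs_sparsity=True
)

# run data through the model while the analyzer hooks are enabled
with torch.no_grad():
    model(torch.randn(32, 128))

for analyzer in analyzers:
    print(analyzer.inputs_sparsity_mean, analyzer.outputs_sparsity_mean)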

clear ( specific_result_type : Union [ None , sparseml.pytorch.optim.analyzer_as.ASResultType ] = None ) [source]
property dim
disable ( ) [source]
enable ( ) [source]
property enabled
property inputs_sample
property inputs_sample_max
property inputs_sample_mean
property inputs_sample_min
property inputs_sample_size
property inputs_sample_std
property inputs_sparsity
property inputs_sparsity_max
property inputs_sparsity_mean
property inputs_sparsity_min
property inputs_sparsity_std
property module
property outputs_sample
property outputs_sample_max
property outputs_sample_mean
property outputs_sample_min
property outputs_sample_size
property outputs_sample_std
property outputs_sparsity
property outputs_sparsity_max
property outputs_sparsity_mean
property outputs_sparsity_min
property outputs_sparsity_std
results ( result_type : sparseml.pytorch.optim.analyzer_as.ASResultType ) List [ torch.Tensor ] [source]
results_max ( result_type : sparseml.pytorch.optim.analyzer_as.ASResultType ) torch.Tensor [source]
results_mean ( result_type : sparseml.pytorch.optim.analyzer_as.ASResultType ) torch.Tensor [source]
results_min ( result_type : sparseml.pytorch.optim.analyzer_as.ASResultType ) torch.Tensor [source]
results_std ( result_type : sparseml.pytorch.optim.analyzer_as.ASResultType ) torch.Tensor [source]
property track_inputs_sparsity
property track_outputs_sparsity

sparseml.pytorch.optim.analyzer_module module

Code related to monitoring, analyzing, and reporting info for Modules in PyTorch. Records things like FLOPS, input and output shapes, kernel shapes, etc.

class sparseml.pytorch.optim.analyzer_module. ModuleAnalyzer ( module : torch.nn.modules.module.Module , enabled : bool = False ) [source]

Bases: object

An analyzer implementation for monitoring the execution profile and graph of a Module in PyTorch.

Parameters
  • module – the module to analyze

  • enabled – True to enable the hooks for analyzing and actively track, False to disable and not track
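
A rough usage sketch (the model and input shape are placeholder assumptions); the analyzer records its descriptions during a forward pass while enabled:

import torch
from torch.nn import Conv2d, Flatten, Linear, ReLU, Sequential

from sparseml.pytorch.optim.analyzer_module import ModuleAnalyzer

# placeholder model and input
model = Sequential(Conv2d(3, 16, 3), ReLU(), Flatten(), Linear(16 * 30 * 30, 10))
analyzer = ModuleAnalyzer(model, enabled=True)

# a forward pass populates the analyzed descriptions
with torch.no_grad():
    model(torch.randn(1, 3, 32, 32))

print(analyzer.layer_desc())           # overall description for the full module
print(len(analyzer.ks_layer_descs()))  # number of layers that support kernel sparsity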

property enabled

True if enabled and the hooks for analyzing are active, False otherwise

Type

return

ks_layer_descs ( ) List [ sparseml.optim.analyzer.AnalyzedLayerDesc ] [source]

Get the descriptions for all layers in the module that support kernel sparsity (model pruning). Ex: all convolutions and linear layers.

Returns

a list of descriptions for all layers in the module that support ks

layer_desc ( name : Optional [ str ] = None ) sparseml.optim.analyzer.AnalyzedLayerDesc [source]

Get a specific layer’s description within the Module. Set to None to get the overall Module’s description.

Parameters

name – name of the layer to get a description for, None for an overall description

Returns

the analyzed layer description for the given name

property module

The module that is being actively analyzed

Type

return

sparseml.pytorch.optim.analyzer_pruning module

Code related to monitoring, analyzing, and reporting the kernel sparsity (model pruning) for a model’s layers and params. More info on kernel sparsity can be found here: https://arxiv.org/abs/1902.09574

class sparseml.pytorch.optim.analyzer_pruning. ModulePruningAnalyzer ( module : torch.nn.modules.module.Module , name : str , param_name : str = 'weight' ) [source]

Bases: object

An analyzer implementation monitoring the kernel sparsity of a given param in a module.

Parameters
  • module – the module containing the param to analyze the sparsity for

  • name – name of the layer, used for tracking

  • param_name – name of the parameter to analyze the sparsity for, defaults to weight

static analyze_layers ( module : torch.nn.modules.module.Module , layers : List [ str ] , param_name : str = 'weight' ) [source]
Parameters
  • module – the module to create multiple analyzers for

  • layers – the names of the layers to create analyzer for that are in the module

  • param_name – the name of the param to monitor within each layer

Returns

a list of analyzers, one for each layer passed in and in the same order
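
For example, a short sketch of checking the weight sparsity of a couple of layers (the model and layer names are placeholders):

from torch.nn import Linear, ReLU, Sequential

from sparseml.pytorch.optim.analyzer_pruning import ModulePruningAnalyzer

# placeholder model; the Linear children are named "0" and "2"
model = Sequential(Linear(128, 64), ReLU(), Linear(64, 10))
analyzers = ModulePruningAnalyzer.analyze_layers(model, layers=["0", "2"], param_name="weight")

for analyzer in analyzers:
    # tag combines the layer and param names, param_sparsity is the fraction of zeros
    print(analyzer.tag, float(analyzer.param_sparsity))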

property module

the module containing the param to analyze the sparsity for

Type

return

property name

name of the layer, used for tracking

Type

return

property param

the parameter that is being monitored for kernel sparsity

Type

return

property param_name

name of the parameter to analyze the sparsity for, defaults to weight

Type

return

property param_sparsity

the sparsity of the contained parameter (how many zeros are in it)

Type

return

param_sparsity_dim ( dim : Union [ None , int , Tuple [ int , ] ] = None ) torch.Tensor [source]
Parameters

dim – a dimension(s) to calculate the sparsity over, ex over channels

Returns

the sparsity of the contained parameter structured according to the dim passed in

property tag

combines the layer name and param name into a single string separated by a period

Type

return

sparseml.pytorch.optim.manager module

Contains base code related to modifier managers: modifier managers handle grouping modifiers and running them together. Also handles loading modifiers from yaml files

class sparseml.pytorch.optim.manager. RecipeManagerStepWrapper ( wrap : Any , optimizer : torch.optim.optimizer.Optimizer , module : torch.nn.modules.module.Module , manager : Any , epoch : float , steps_per_epoch : int ) [source]

Bases: object

A wrapper class to handle wrapping an optimizer or optimizer-like object and overriding the step function. The override calls into the ScheduledModifierManager when appropriate and enabled, and then calls step() on the wrapped object as usual with the original arguments. All original attributes and methods are forwarded to the wrapped object so this class can be a direct substitute for it.

Parameters
  • wrap – The object to wrap the step function and properties for.

  • optimizer – The optimizer used in the training process.

  • module – The model/module used in the training process.

  • manager – The manager to forward lifecycle calls into such as step.

  • epoch – The epoch to start the modifying process at.

  • steps_per_epoch – The number of optimizer steps (batches) in each epoch.

emulated_step ( ) [source]

Emulated step function to be called in place of step when the number of steps per epoch varies across epochs. The emulated function should be called to keep steps_per_epoch the same. Does not call into the step function of the wrapped object, but does call into the manager to increment the steps.

loss_update ( loss : torch.Tensor ) torch.Tensor [source]

Optional call to update modifiers based on the calculated loss. Not needed unless one or more of the modifiers uses the loss to make a modification or modifies the loss itself.

Parameters

loss – the calculated loss after running a forward pass and loss_fn

Returns

the modified loss tensor

step ( * args , ** kwargs ) [source]

Override for the step function. Calls into the base step function with the args and kwargs.

Parameters
  • args – Any args to pass to the wrapped object’s step function.

  • kwargs – Any kwargs to pass to the wrapped object’s step function.

Returns

The return, if any, from the wrapped object’s step function

property wrapped

The object to wrap the step function and properties for.

Type

return

property wrapped_epoch

The current epoch the wrapped object is at.

Type

return

property wrapped_manager

The manager to forward lifecycle calls into such as step.

Type

return

property wrapped_module

The model/module used in the training process.

Type

return

property wrapped_optimizer

The optimizer used in the training process.

Type

return

property wrapped_steps

The current number of steps that have been called for the wrapped object.

Type

return

property wrapped_steps_per_epoch

The number of optimizer steps (batches) in each epoch.

Type

return

class sparseml.pytorch.optim.manager. ScheduledModifierManager ( modifiers : List [ sparseml.pytorch.optim.modifier.ScheduledModifier ] ) [source]

Bases: sparseml.optim.manager.BaseManager , sparseml.pytorch.optim.modifier.Modifier

The base modifier manager, handles managing multiple ScheduledModifiers.

Lifecycle:
- initialize
- initialize_loggers
- modify
- finalize
Parameters

modifiers – the modifiers to wrap

finalize ( module : Optional [ torch.nn.modules.module.Module ] = None , reset_loggers : bool = True , ** kwargs ) [source]

Handles any finalization of the modifier for the given model/module. Applies any remaining logic and cleans up any hooks or attachments to the model.

Parameters
  • module – The model/module to finalize the modifier for. Marked optional so state can still be cleaned up on delete, but generally should always be passed in.

  • reset_loggers – True to remove any currently attached loggers (default), False to keep the loggers attached.

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

static from_yaml ( file_path : Union [ str , sparsezoo.objects.recipe.Recipe ] , add_modifiers : Optional [ List [ sparseml.pytorch.optim.modifier.Modifier ] ] = None , ** recipe_variables ) [source]

Convenience function used to create the manager of multiple modifiers from a recipe file.

Parameters
  • file_path – the path to the recipe file to load the modifier from, or a SparseZoo model stub to load a recipe for a model stored in SparseZoo. SparseZoo stubs should be preceded by ‘zoo:’, and can contain an optional ‘?recipe_type=<type>’ parameter. Can also be a SparseZoo Recipe object. i.e. ‘/path/to/local/recipe.yaml’, ‘zoo:model/stub/path’, ‘zoo:model/stub/path?recipe_type=transfer’

  • add_modifiers – additional modifiers that should be added to the returned manager alongside the ones loaded from the recipe file

  • recipe_variables – additional variable values to override the recipe with (i.e. num_epochs, init_lr)

Returns

ScheduledModifierManager() created from the recipe file
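
A brief sketch of the documented loading forms; the local path and zoo stub below are the illustrative strings from above, and num_epochs is only an example recipe variable:

from sparseml.pytorch.optim.manager import ScheduledModifierManager

# from a local recipe file
manager = ScheduledModifierManager.from_yaml("/path/to/local/recipe.yaml")

# from a SparseZoo stub, overriding a recipe variable defined in that recipe
manager = ScheduledModifierManager.from_yaml(
    "zoo:model/stub/path?recipe_type=transfer", num_epochs=20
)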

initialize ( module : torch.nn.modules.module.Module , epoch : float = 0 , loggers : Optional [ List [ sparseml.pytorch.utils.logger.BaseLogger ] ] = None , ** kwargs ) [source]

Handles any initialization of the manager for the given model/module. epoch and steps_per_epoch can optionally be passed in to initialize the manager and module at a specific point in the training process. If loggers is not None, will additionally call initialize_loggers.

Parameters
  • module – the PyTorch model/module to modify

  • epoch – The epoch to initialize the manager and module at. Defaults to 0 (start of the training process)

  • loggers – Optional list of loggers to log the modification process to

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

initialize_loggers ( loggers : Union [ None , List [ sparseml.pytorch.utils.logger.BaseLogger ] ] ) [source]

Handles initializing and setting up the loggers for the contained modifiers.

Parameters

loggers – the loggers to setup this manager with for logging important info and milestones to

load_state_dict ( state_dict : Dict [ str , Dict ] , strict : bool = True ) [source]

Loads the given state dict into this manager. All modifiers that match will be loaded. If any are missing or extra and strict=True, then a KeyError will be raised

Parameters
  • state_dict – dictionary object as generated by this object’s state_dict function

  • strict – True to raise a KeyError for any missing or extra information in the state dict, False to ignore

Raises

IndexError – If any keys in the state dict do not correspond to a valid index for this manager and strict=True

loss_update ( loss : torch.Tensor , module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int , ** kwargs ) torch.Tensor [source]

Optional call that can be made on the optimizer to update the contained modifiers once loss has been calculated

Parameters
  • loss – The calculated loss tensor

  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

Returns

the modified loss tensor

modify ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , steps_per_epoch : int , wrap_optim : Optional [ Any ] = None , epoch : Optional [ float ] = None , allow_parallel_module : bool = True ) sparseml.pytorch.optim.manager.RecipeManagerStepWrapper [source]

Modify the given module and optimizer for training aware algorithms such as pruning and quantization. Initialize must be called first. After training is complete, finalize should be called.

Parameters
  • module – The model/module to modify

  • optimizer – The optimizer to modify

  • steps_per_epoch – The number of optimizer steps (batches) in each epoch

  • wrap_optim – Optional object to wrap instead of the optimizer. Useful for cases like amp (fp16 training) where it should be wrapped in place of the original optimizer since it doesn’t always call into the optimizer.step() function.

  • epoch – Optional epoch that can be passed in to start modifying at. Defaults to the epoch that was supplied to the initialize function.

  • allow_parallel_module – if False, a DataParallel or DistributedDataParallel module passed to this function will be unwrapped to its base module during recipe initialization by referencing module.module. This is useful so a recipe may reference the base module parameters instead of the wrapped distributed ones. Set to True to not unwrap the distributed module. Default is True

Returns

A wrapped optimizer object. The wrapped object makes all the original properties for the wrapped object available so it can be used without any additional code changes.
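
Putting the documented lifecycle together, a minimal training-loop sketch; the model, optimizer, data, loss function, and recipe path are placeholder assumptions, and exact integration details may vary between SparseML versions:

import torch
from torch.nn import CrossEntropyLoss, Linear, ReLU, Sequential
from torch.optim import SGD

from sparseml.pytorch.optim.manager import ScheduledModifierManager

# placeholder model, optimizer, and data
model = Sequential(Linear(128, 64), ReLU(), Linear(64, 10))
optimizer = SGD(model.parameters(), lr=0.1)
train_loader = [(torch.randn(32, 128), torch.randint(0, 10, (32,))) for _ in range(10)]
loss_fn = CrossEntropyLoss()
num_epochs = 10

manager = ScheduledModifierManager.from_yaml("/path/to/local/recipe.yaml")  # placeholder path
manager.initialize(model, epoch=0.0)
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))

for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss = optimizer.loss_update(loss)  # optional, only needed if a modifier uses the loss
        loss.backward()
        optimizer.step()  # the wrapped step calls into the manager around the real optimizer step

manager.finalize(model)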

optimizer_post_step ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Called after the optimizer step happens and weights have updated. Calls into the contained modifiers

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

optimizer_pre_step ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Called before the optimizer step happens (after backward has been called, before optimizer.step). Calls into the contained modifiers

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

state_dict ( ) Dict [ str , Dict ] [source]
Returns

Dictionary to store any state variables for this manager. Includes all modifiers nested under this manager as sub keys in the dict. Only modifiers that have a non-empty state dict are included.
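
For instance, a small checkpoint sketch pairing the manager state with the model state (the surrounding model, manager, and file path are placeholders):

import torch

# save the model weights together with the manager's modifier state
torch.save(
    {"model": model.state_dict(), "manager": manager.state_dict()},
    "checkpoint.pth",
)

# restore both when resuming
checkpoint = torch.load("checkpoint.pth")
model.load_state_dict(checkpoint["model"])
manager.load_state_dict(checkpoint["manager"], strict=True)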

update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int , log_updates : bool = True ) [source]

Handles updating the contained modifiers’ states, module, or optimizer. Only calls scheduled_update on each modifier if modifier.update_ready() returns True

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

  • log_updates – True to log the updates for each modifier to the loggers, False to skip logging

sparseml.pytorch.optim.mask_creator_pruning module

Classes for defining sparsity masks based on model parameters.

class sparseml.pytorch.optim.mask_creator_pruning. BlockPruningMaskCreator ( block_shape : List [ int ] , grouping_fn_name : str = 'mean' ) [source]

Bases: sparseml.pytorch.optim.mask_creator_pruning.GroupedPruningMaskCreator

Structured sparsity mask creator that groups the input tensor into blocks of shape block_shape.

Parameters
  • block_shape – The shape the in and out channels should take in blocks. Should be a list of exactly two integers that divide the input tensors evenly on the channel dimensions. -1 for a dimension blocks across that entire dimension

  • grouping_fn_name – The name of the torch grouping function to reduce dimensions by

group_tensor ( tensor : torch.Tensor ) torch.Tensor [source]
Parameters

tensor – The tensor to transform

Returns

The mean values of the tensor grouped by blocks of shape self._block_shape

class sparseml.pytorch.optim.mask_creator_pruning. DimensionSparsityMaskCreator ( dim : Union [ str , int , List [ int ] ] , grouping_fn_name : str = 'mean' ) [source]

Bases: sparseml.pytorch.optim.mask_creator_pruning.GroupedPruningMaskCreator

Structured sparsity mask creator that groups sparsity blocks by the given dimension(s)

Parameters
  • dim – The index or list of indices of dimensions to group the mask by or the type of dims to prune ([‘channel’, ‘filter’])

  • grouping_fn_name – The name of the torch grouping function to reduce dimensions by

create_sparsity_masks ( tensors : List [ torch.Tensor ] , sparsity : Union [ float , List [ float ] ] , global_sparsity : bool = False ) List [ torch.Tensor ] [source]
Parameters
  • tensors – list of tensors to calculate masks from based on their contained values

  • sparsity – the desired sparsity to reach within the mask (decimal fraction of zeros) can also be a list where each element is a sparsity for a tensor in the same position in the tensor list. If global sparsity is enabled, all values of the sparsity list must be the same

  • global_sparsity – do not set to True; unsupported for DimensionSparsityMaskCreator

Returns

list of masks (0.0 for values that are masked, 1.0 for values that are unmasked) calculated from the tensors such that the desired number of zeros matches the sparsity and all values mapped to the same group have the same value

group_tensor ( tensor : torch.Tensor ) torch.Tensor [source]
Parameters

tensor – The tensor to transform

Returns

The mean values of the tensor grouped by the dimension(s) in self._dim

class sparseml.pytorch.optim.mask_creator_pruning. GroupedPruningMaskCreator [source]

Bases: sparseml.pytorch.optim.mask_creator_pruning.UnstructuredPruningMaskCreator

Abstract class for a sparsity mask creator that structures masks according to grouping functions. Subclasses should implement group_tensor and _map_mask_to_tensor

create_sparsity_masks ( tensors : List [ torch.Tensor ] , sparsity : Union [ float , List [ float ] ] , global_sparsity : bool = False ) List [ torch.Tensor ] [source]
Parameters
  • tensors – list of tensors to calculate masks from based on their contained values

  • sparsity – the desired sparsity to reach within the mask (decimal fraction of zeros) can also be a list where each element is a sparsity for a tensor in the same position in the tensor list. If global sparsity is enabled, all values of the sparsity list must be the same

  • global_sparsity – if True, sparsity masks will be created such that the average sparsity across all given tensors is the target sparsity with the lowest global values masked. If False, each tensor will be masked to the target sparsity ranking values within each individual tensor. Default is False

Returns

list of masks (0.0 for values that are masked, 1.0 for values that are unmasked) calculated from the tensors such that the desired number of zeros matches the sparsity and all values mapped to the same group have the same value

create_sparsity_masks_from_tensor ( tensors : List [ torch.Tensor ] ) List [ torch.Tensor ] [source]
Parameters

tensors – list of tensors to calculate masks based on their values

Returns

list of masks derived from the values of the tensors grouped by the group_tensor function.

create_sparsity_masks_from_threshold ( tensors : List [ torch.Tensor ] , threshold : Union [ float , torch.Tensor ] ) List [ torch.Tensor ] [source]
Parameters
  • tensors – list of tensors to calculate masks from based on their contained values

  • threshold – a threshold of group_tensor values to determine cutoff for sparsification

Returns

list of masks derived from the tensors and the grouped threshold

abstract group_tensor ( tensor : torch.Tensor ) torch.Tensor [source]
Parameters

tensor – The tensor to reduce in groups

Returns

The grouped tensor

static reduce_tensor ( tensor : torch.Tensor , dim : Union [ int , List [ int ] ] , reduce_fn_name : str , keepdim : bool = True ) torch.Tensor [source]
Parameters
  • tensor – the tensor to reduce

  • dim – dimension or list of dimension to reduce along

  • reduce_fn_name – function name to reduce tensor with. valid options are ‘mean’, ‘max’, ‘min’

  • keepdim – preserves the reduced dimension(s) in returned tensor shape as shape 1. default is True

Returns

Tensor reduced along the given dimension(s)

class sparseml.pytorch.optim.mask_creator_pruning. PruningMaskCreator [source]

Bases: abc.ABC

Base abstract class for a sparsity mask creator. Subclasses should define all methods for creating masks

abstract create_sparsity_masks ( tensors : List [ torch.Tensor ] , sparsity : Union [ float , List [ float ] ] , global_sparsity : bool = False ) List [ torch.Tensor ] [source]
Parameters
  • tensors – list of tensors to calculate masks based on their contained values

  • sparsity – the desired sparsity to reach within the mask (decimal fraction of zeros) can also be a list where each element is a sparsity for a tensor in the same position in the tensor list. If global sparsity is enabled, all values of the sparsity list must be the same

  • global_sparsity – if True, sparsity masks will be created such that the average sparsity across all given tensors is the target sparsity with the lowest global values masked. If False, each tensor will be masked to the target sparsity ranking values within each individual tensor. Default is False

Returns

list of masks (0.0 for values that are masked, 1.0 for values that are unmasked) calculated from the tensors such that the desired number of zeros matches the sparsity.

create_sparsity_masks_from_tensor ( tensors : List [ torch.Tensor ] ) List [ torch.Tensor ] [source]
Parameters

tensors – list of tensors to calculate masks based on their values

Returns

list of masks derived from each of the given tensors

abstract create_sparsity_masks_from_threshold ( tensors : List [ torch.Tensor ] , threshold : Union [ float , torch.Tensor ] ) List [ torch.Tensor ] [source]
Parameters
  • tensors – list of tensors to calculate masks based on their contained values

  • threshold – a threshold to determine cutoff for sparsification

Returns

list of masks derived from each of the given tensors and the threshold

class sparseml.pytorch.optim.mask_creator_pruning. UnstructuredPruningMaskCreator [source]

Bases: sparseml.pytorch.optim.mask_creator_pruning.PruningMaskCreator

Class for creating unstructured sparsity masks. Masks will be created using unstructured sparsity by pruning weights ranked by their value. Each mask will correspond to the given tensor.

create_sparsity_masks ( tensors : List [ torch.Tensor ] , sparsity : Union [ float , List [ float ] ] , global_sparsity : bool = False ) List [ torch.Tensor ] [source]
Parameters
  • tensors – list of tensors to calculate masks based on their contained values

  • sparsity – the desired sparsity to reach within the mask (decimal fraction of zeros) can also be a list where each element is a sparsity for a tensor in the same position in the tensor list. If global sparsity is enabled, all values of the sparsity list must be the same

  • global_sparsity – if True, sparsity masks will be created such that the average sparsity across all given tensors is the target sparsity with the lowest global values masked. If False, each tensor will be masked to the target sparsity ranking values within each individual tensor. Default is False

Returns

list of masks (0.0 for values that are masked, 1.0 for values that are unmasked) calculated from the tensors such that the desired number of zeros matches the sparsity. If there are more zeros than the desired sparsity, zeros will be randomly chosen to match the target sparsity
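
A short sketch of per-tensor versus global mask creation; the tensor shapes and target sparsity are arbitrary:

import torch

from sparseml.pytorch.optim.mask_creator_pruning import UnstructuredPruningMaskCreator

creator = UnstructuredPruningMaskCreator()
weights = [torch.randn(64, 128), torch.randn(256, 128)]

# per-tensor masks: each tensor is masked to 80% zeros independently
masks = creator.create_sparsity_masks(weights, sparsity=0.8)

# global masks: the average sparsity across both tensors is 80%,
# with the lowest values masked across all tensors together
global_masks = creator.create_sparsity_masks(weights, sparsity=0.8, global_sparsity=True)

for mask in masks:
    print(1.0 - mask.mean().item())  # approximately 0.8 zeros in each mask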

create_sparsity_masks_from_threshold ( tensors : List [ torch.Tensor ] , threshold : Union [ float , torch.Tensor ] ) List [ torch.Tensor ] [source]
Parameters
  • tensors – list of tensors to calculate masks based on their contained values

  • threshold – a threshold at which to mask values if they are less than or equal to it

Returns

list of masks (0.0 for values that are masked, 1.0 for values that are unmasked) calculated from the tensors; values <= threshold are masked, all others are unmasked

sparseml.pytorch.optim.mask_creator_pruning. load_mask_creator ( obj : Union [ str , Iterable [ int ] ] ) sparseml.pytorch.optim.mask_creator_pruning.PruningMaskCreator [source]
Parameters

obj – Formatted string or block shape iterable specifying SparsityMaskCreator object to return

Returns

SparsityMaskCreator object created from obj
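
A brief sketch of the two input forms; the exact accepted strings are an assumption based on the creators documented above:

from sparseml.pytorch.optim.mask_creator_pruning import load_mask_creator

unstructured = load_mask_creator("unstructured")  # assumed string form -> UnstructuredPruningMaskCreator
block = load_mask_creator([1, 4])                 # block shape iterable -> BlockPruningMaskCreator([1, 4])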

sparseml.pytorch.optim.mask_pruning module

Code related to applying a mask onto a parameter to impose kernel sparsity, aka model pruning

class sparseml.pytorch.optim.mask_pruning. ModuleParamPruningMask ( layers : List [ torch.nn.modules.module.Module ] , param_names : Union [ str , List [ str ] ] = 'weight' , store_init : bool = False , store_unmasked : bool = False , track_grad_mom : float = - 1.0 , mask_creator : sparseml.pytorch.optim.mask_creator_pruning.PruningMaskCreator = unstructured , layer_names : Optional [ List [ str ] ] = None , global_sparsity : bool = False , score_type : Union [ str , sparseml.pytorch.utils.mfac_helpers.MFACOptions ] = 'magnitude' ) [source]

Bases: object

Mask to apply kernel sparsity (model pruning) to a specific parameter in a layer

Parameters
  • layers – the layers containing the parameters to mask

  • param_names – the names of the parameters to mask in each layer. If only one name is given, that name will be applied to all layers that this object masks. Default is weight

  • store_init – store the init weights in a separate variable that can be used and referenced later

  • store_unmasked – store the unmasked weights in a separate variable that can be used and referenced later

  • track_grad_mom – store the gradient updates to the parameter with a momentum variable; must be in the range [0.0, 1.0), if set to 0.0 then will only keep the most recent

  • mask_creator – object to define sparsity mask creation, default is unstructured mask

  • layer_names – the name of the layers the parameters to mask are located in

  • global_sparsity – set True to enable global pruning. If True, when creating sparsity masks for a target sparsity, masks will be created such that the average sparsity across all given layers is the target sparsity with the lowest global values masked. If False, each layer will be masked to the target sparsity ranking values within each individual tensor. Default is False

  • score_type – the method used to score parameters for masking, i.e. ‘magnitude’, ‘movement’. Can also be an MFACOptions object for M-FAC pruning. Default is ‘magnitude’
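
A hedged sketch of masking the weights of two layers to 90% sparsity; the layers are placeholders, and the enabled property is assumed to be settable so that the masking hooks stay active:

import torch
from torch.nn import Linear

from sparseml.pytorch.optim.mask_pruning import ModuleParamPruningMask

layers = [Linear(128, 256), Linear(256, 10)]  # placeholder layers
mask = ModuleParamPruningMask(layers, param_names="weight")
mask.enabled = True  # assumption: the setter enables the masking hooks

mask.set_param_masks_from_sparsity(0.9)
mask.apply()  # zero out the masked values in the parameter tensors

for data in mask.params_data:
    print((data == 0.0).float().mean().item())  # roughly 0.9 for each layer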

property allow_reintroduction

True if weight reintroduction is allowed

Type

return

apply ( param_idx : Optional [ int ] = None ) [source]

apply the current mask to the params tensor (zero out the desired values)

Parameters

param_idx – index of parameter to apply mask to. if not set, then masks will be applied to all parameters with available masks

disable_reintroduction ( ) [source]

if weight reintroduction is enabled (only during movement pruning), disables further weight reintroduction

property enabled

True if the parameter is currently being masked, False otherwise

Type

return

property global_sparsity

True if global pruning is enabled, False otherwise

Type

return

property layer_names

the names of the layers the parameter to mask is located in

Type

return

property layers

the layers containing the parameters to mask

Type

return

property mask_creator

SparsityMaskCreator object used to generate masks

Type

return

property names

the full names of the sparsity masks in the following format: <LAYER>.<PARAM>.sparsity_mask

Type

return

property param_masks

the current masks applied to each of the parameters

Type

return

property param_names

the names of the parameters to mask in the layers

Type

return

property params_data

the current tensors in each of the parameters

Type

return

property params_grad

the current gradient values for each parameter

Type

return

property params_init

the initial values of the parameters before being masked

Type

return

property params_unmasked

the unmasked values of the parameters (stores the last unmasked value before masking)

Type

return

pre_optim_step_update ( ) [source]

updates scores and buffers that depend on gradients. Should be called before Optimizer.step() to grab the latest gradients

pruning_end ( leave_enabled : bool ) [source]

Performs any cleanup necessary for this pruning method. Disables weight reintroduction if enabled and applies masks

Parameters

leave_enabled – if False, all pruning hooks will be destroyed. Default is True

reset ( ) [source]

resets the current stored tensors such that they will be on the same device and have the initial data

property score_type

the scoring method used to create masks (i.e. magnitude, movement)

Type

return

set_param_data ( value : torch.Tensor , param_idx : int ) [source]
Parameters
  • value – the value to set as the current tensor for the parameter, if enabled the mask will be applied

  • param_idx – index of the parameter in this object to set the data of

set_param_masks ( masks : List [ torch.Tensor ] ) [source]
Parameters

masks – the masks to set and apply as the current param tensors, if enabled mask is applied immediately

set_param_masks_from_abs_threshold ( threshold : Union [ float , torch.Tensor ] ) List [ torch.Tensor ] [source]

Convenience function to set the parameter masks such that if abs(value) <= threshold, the value is masked to 0

Parameters

threshold – the threshold at which all values will be masked to 0

set_param_masks_from_sparsity ( sparsity : Union [ float , List [ float ] ] ) List [ torch.Tensor ] [source]

Convenience function to set the parameter masks such that each mask has an amount of masked values whose percentage equals the given sparsity amount. Masks the absolute smallest values up until sparsity is reached.

Parameters

sparsity – the decimal sparsity to set the param mask to; can also be a list where each element is a sparsity for a tensor in the same position in the tensor list. If global sparsity is enabled, all values of the sparsity list must be the same

set_param_masks_from_weights ( ) List [ torch.Tensor ] [source]

Convenience function to set the parameter masks such that the mask is 1 if a parameter value is non zero and 0 otherwise, unless otherwise defined by this object’s mask_creator.

property store_init

store the init weights in a separate variable that can be used and referenced later

Type

return

property store_unmasked

store the unmasked weights in a separate variable that can be used and referenced later

Type

return

property track_grad_mom

store the gradient updates to the parameter with a momentum variable; must be in the range [0.0, 1.0), if set to 0.0 then will only keep the most recent

Type

return

sparseml.pytorch.optim.modifier module

Contains base code related to modifiers: objects that modify some aspect of the training process for a model. For example, learning rate schedules or kernel sparsity (weight pruning) are implemented as modifiers.

class sparseml.pytorch.optim.modifier. Modifier ( log_types : Optional [ Union [ str , List [ str ] ] ] = None , ** kwargs ) [source]

Bases: sparseml.optim.modifier.BaseModifier

The base pytorch modifier implementation, all modifiers must inherit from this class. It defines common things needed for the lifecycle and implementation of a modifier.

Lifecycle:
- initialize
- initialize_loggers

training loop:
- update
- log_update
- loss_update
- optimizer_pre_step
- optimizer_post_step

- finalize
Parameters
  • log_types – The loggers that can be used by the modifier instance

  • kwargs – standard key word args, used to support multi inheritance

apply ( module : torch.nn.modules.module.Module , epoch : float = inf , loggers : Optional [ List [ sparseml.pytorch.utils.logger.BaseLogger ] ] = None , finalize : bool = True , ** kwargs ) [source]

Apply the modifier for a given model/module (one shot application). Calls into initialize(module, epoch, loggers, ** kwargs) and then finalize(module, ** kwargs) immediately after if finalize=True.

Parameters
  • module – the PyTorch model/module to modify

  • epoch – the epoch to apply the modifier at, defaults to math.inf (end)

  • loggers – Optional list of loggers to log the modification process to

  • finalize – True to invoke finalize after initialize, False otherwise. If training after one shot, set finalize=False to keep modifiers applied.

  • kwargs – Optional kwargs to support specific arguments for individual modifiers (passed to initialize and finalize).
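
Because ScheduledModifierManager (documented above) is also a Modifier, apply is commonly used for one-shot recipe application; a short sketch with placeholder recipe path and model:

from sparseml.pytorch.optim.manager import ScheduledModifierManager

# model is a placeholder torch.nn.Module built elsewhere
manager = ScheduledModifierManager.from_yaml("/path/to/local/recipe.yaml")
manager.apply(model)  # initialize then finalize immediately; use finalize=False if training afterward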

finalize ( module : Optional [ torch.nn.modules.module.Module ] = None , reset_loggers : bool = True , ** kwargs ) [source]

Handles any finalization of the modifier for the given model/module. Applies any remaining logic and cleans up any hooks or attachments to the model.

Parameters
  • module – The model/module to finalize the modifier for. Marked optional so state can still be cleaned up on delete, but generally should always be passed in.

  • reset_loggers – True to remove any currently attached loggers (default), False to keep the loggers attached.

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

initialize ( module : torch.nn.modules.module.Module , epoch : float = 0 , loggers : Optional [ List [ sparseml.pytorch.utils.logger.BaseLogger ] ] = None , ** kwargs ) [source]

Handles any initialization of the modifier for the given model/module. epoch and steps_per_epoch can optionally be passed in to initialize the modifier and module at a specific point in the training process. If loggers is not None, will additionally call initialize_loggers.

Parameters
  • module – the PyTorch model/module to modify

  • epoch – The epoch to initialize the modifier and module at. Defaults to 0 (start of the training process)

  • loggers – Optional list of loggers to log the modification process to

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

initialize_loggers ( loggers : Union [ None , List [ sparseml.pytorch.utils.logger.BaseLogger ] ] ) [source]

Handles initializing and setting up the loggers for the modifier.

Parameters

loggers – the loggers to setup this modifier with for logging important info and milestones to

static load_list ( yaml_str : str ) [source]
Parameters

yaml_str – a string representation of the yaml syntax to load modifiers from

Returns

the loaded modifiers list

static load_obj ( yaml_str : str ) [source]
Parameters

yaml_str – a string representation of the yaml syntax to load a modifier from

Returns

the loaded modifier object
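
A small sketch of loading modifiers from an inline yaml string; the syntax mirrors the sample yaml blocks later in this section and is assumed here to be accepted as a bare list:

from sparseml.pytorch.optim.modifier import Modifier

recipe_yaml = """
- !EpochRangeModifier
    start_epoch: 0.0
    end_epoch: 90.0
"""

modifiers = Modifier.load_list(recipe_yaml)
print([type(mod).__name__ for mod in modifiers])  # expected: ['EpochRangeModifier']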

load_state_dict ( state_dict : Dict [ str , Dict ] , strict : bool = True ) [source]

Loads the given state dict into this modifier

Parameters
  • state_dict – dictionary object as generated by this object’s state_dict function

  • strict – True to raise a KeyError for any missing or extra information in the state dict, False to ignore

Raises

IndexError – If any keys in the state dict do not correspond to a valid index for this manager and strict=True

log_update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Handles logging updates for the modifier for better tracking and visualization. Should be overridden for logging.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

loggers
loggers_initialized
loss_update ( loss : torch.Tensor , module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int , ** kwargs ) [source]

Optional call that can be made on the optimizer to update the modifiers once the loss has been calculated. Called independent of if the modifier is currently active or not.

Parameters
  • loss – The calculated loss tensor

  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

Returns

the modified loss tensor

optimizer_post_step ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Called after the optimizer step happens and weights have updated. Called independent of if the modifier is currently active or not.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

optimizer_pre_step ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Called before the optimizer step happens (after backward has been called, before optimizer.step). Called independent of if the modifier is currently active or not.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

state_dict ( ) Dict [ str , Dict ] [source]
Returns

PyTorch state dictionary to store any variables from this modifier

update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Handles updating the modifier’s state, module, or optimizer. Called when update_ready() returns True.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

class sparseml.pytorch.optim.modifier. ModifierProp ( serializable : bool = True , restrict_initialized : bool = True , restrict_enabled : bool = False , restrict_extras : Optional [ List [ str ] ] = None , no_serialize_val : Optional [ Any ] = None , func_get : Optional [ Callable ] = None , func_set : Optional [ Callable ] = None , doc : Optional [ Callable ] = None ) [source]

Bases: sparseml.optim.modifier.BaseProp

Property used to decorate a modifier. Use for creating getters and setters in a modifier. Handles making sure props cannot be changed after a certain point; ex after initialized. Also, marks the properties so they can be easily collected and serialized later.

Parameters
  • serializable – True if the property should be serialized (ex in yaml), False otherwise. Default True

  • restrict_initialized – True to keep the property from being set after initialized, False otherwise. Default True

  • restrict_enabled – True to keep the property from being set after enabled, False otherwise. Default False

  • restrict_extras – extra attributes to check, if any are truthy then keep from being set. Default None

  • no_serialize_val – If prop is equal to this value, will not serialize the prop

  • func_get – The function getter

  • func_set – The function setter

  • doc – The docs function

getter ( func_get : Callable ) sparseml.optim.modifier.BaseProp [source]

Create a ModifierProp based off the current instance with the getter function

Parameters

func_get – the getter function

Returns

the recreated instance with the new getter function

property no_serialize_val

a value that if the prop is equal to, will not serialize the prop

Type

return

property restrictions

The attributes to check for restricting when the attribute can be set

Type

return

property serializable

True if the property should be serialized (ex in yaml), False otherwise

Type

return

setter ( func_set : Callable ) sparseml.optim.modifier.BaseProp [source]

Create a ModifierProp based off the current instance with the setter function

Parameters

func_set – the setter function

Returns

the recreated instance with the new setter function

class sparseml.pytorch.optim.modifier. PyTorchModifierYAML [source]

Bases: sparseml.optim.modifier.ModifierYAML

A decorator to handle making a pytorch modifier class YAML ready. I.e., it can be loaded in through the yaml plugin easily.

class sparseml.pytorch.optim.modifier. ScheduledModifier ( log_types : Optional [ Union [ str , List [ str ] ] ] = None , start_epoch : float = - 1.0 , min_start : float = - 1.0 , end_epoch : float = - 1.0 , min_end : float = - 1.0 , end_comparator : Optional [ int ] = 0 , ** kwargs ) [source]

Bases: sparseml.pytorch.optim.modifier.Modifier , sparseml.optim.modifier.BaseScheduled

The base scheduled modifier implementation, all scheduled modifiers must inherit from this class. The difference for this and a Modifier is that these have start and end epochs. It defines common things needed for the lifecycle and implementation of a scheduled modifier.

Lifecycle:
- initialize
- initialize_loggers

training loop:
- update_ready
- scheduled_update
- update
- scheduled_log_update
- log_update
- loss_update
- optimizer_pre_step
- optimizer_post_step
Parameters
  • log_types – The loggers that can be used by the modifier instance

  • start_epoch – The epoch to start the modifier at

  • end_epoch – The epoch to end the modifier at

  • min_start – The minimum acceptable value for start_epoch, default -1

  • min_end – The minimum acceptable value for end_epoch, default -1

  • end_comparator – integer value representing how the end_epoch should be compared to start_epoch. if == None, then end_epoch can only be set to what its initial value was. if == -1, then end_epoch can be less than, equal, or greater than start_epoch. if == 0, then end_epoch can be equal to or greater than start_epoch. if == 1, then end_epoch can only be greater than start_epoch.

  • kwargs – standard key word args, used to support multi inheritance

end_pending ( epoch : float , steps_per_epoch : int ) bool [source]

Base implementation compares the current epoch with the end epoch and checks that the modifier has been started.

Parameters
  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

Returns

True if the modifier is ready to stop modifying, false otherwise

ended
log_update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Handles logging updates for the modifier for better tracking and visualization. Should be overridden for logging but not called directly, use scheduled_log_update instead.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

scheduled_log_update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Handles checking if a log update should happen, i.e., is the modifier currently in the range of its start and end epochs. No restrictions are placed on it by update_ready in the event that the modifier should log constantly or outside of an update being ready. General use case is checking if logs should happen by comparing cached values with updated values.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

scheduled_update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Called by the system and calls into the update() method. Tracks state and should not be overridden!

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

start_pending ( epoch : float , steps_per_epoch : int ) bool [source]

Base implementation compares current epoch with the start epoch.

Parameters
  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

Returns

True if the modifier is ready to begin modifying, false otherwise

started
update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Handles updating the modifier’s state, module, or optimizer. Called when update_ready() returns True.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

update_ready ( epoch : float , steps_per_epoch : int ) bool [source]

Base implementation checks if start_pending() or end_pending().

Parameters
  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

Returns

True if the modifier is pending an update and update() should be called

class sparseml.pytorch.optim.modifier. ScheduledUpdateModifier ( log_types : Optional [ Union [ str , List [ str ] ] ] = None , start_epoch : float = - 1.0 , min_start : float = - 1.0 , end_epoch : float = - 1.0 , min_end : float = - 1.0 , end_comparator : Optional [ int ] = 0 , update_frequency : float = - 1.0 , min_frequency : float = - 1.0 , ** kwargs ) [source]

Bases: sparseml.pytorch.optim.modifier.ScheduledModifier , sparseml.optim.modifier.BaseUpdate

The base scheduled update modifier implementation, all scheduled update modifiers must inherit from this class. The difference for this and a ScheduledModifier is that these have a certain interval that they update within the start and end ranges. It defines common things needed for the lifecycle and implementation of a scheduled update modifier.

Lifecycle:
- initialize
- initialize_loggers

training loop:
- update_ready
- scheduled_update
- update
- loss_update
- optimizer_pre_step
- optimizer_post_step
Parameters
  • log_types – The loggers that can be used by the modifier instance

  • start_epoch – The epoch to start the modifier at

  • end_epoch – The epoch to end the modifier at

  • min_start – The minimum acceptable value for start_epoch, default -1

  • min_end – The minimum acceptable value for end_epoch, default -1

  • end_comparator – integer value representing how the end_epoch should be compared to start_epoch. if == None, then end_epoch can only be set to what its initial value was. if == -1, then end_epoch can be less than, equal, or greater than start_epoch. if == 0, then end_epoch can be equal to or greater than start_epoch. if == 1, then end_epoch can only be greater than start_epoch.

  • min_frequency – The minimum acceptable value for update_frequency, default -1

  • kwargs – standard key word args, used to support multi inheritance

update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Handles updating the modifier’s state, module, or optimizer. Called when update_ready() returns True.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

update_ready ( epoch : float , steps_per_epoch : int ) bool [source]

Calls base implementation to check if start_pending() or end_pending(). Additionally checks if an update is ready based on the frequency and the current epoch vs the last epoch updated.

Parameters
  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

Returns

True if the modifier is pending an update and update() should be called

sparseml.pytorch.optim.modifier_as module

Modifiers for increasing / enforcing activation sparsity on models while training.

class sparseml.pytorch.optim.modifier_as. ASRegModifier ( layers : Union [ str , List [ str ] ] , alpha : Union [ float , List [ float ] ] , layer_normalized : bool = False , reg_func : str = 'l1' , reg_tens : str = 'inp' , start_epoch : float = - 1.0 , end_epoch : float = - 1.0 ) [source]

Bases: sparseml.pytorch.optim.modifier.ScheduledModifier

Add a regularizer over the inputs or outputs to given layers (activation regularization). This promotes larger activation sparsity values.

Sample yaml:
!ASRegModifier
start_epoch: 0.0
end_epoch: 10.0
layers:
- layer1
- layer2
alpha: 0.00001
layer_normalized: True
reg_func: l1
reg_tens: inp
Parameters
  • layers – str or list of str for the layers to apply the AS modifier to; can also use the token __ALL__ to specify all layers

  • alpha – the weight to use for the regularization, ie cost = loss + alpha * reg

  • layer_normalized – True to normalize the values by 1 / L where L is the number of layers

  • reg_func – the regularization function to apply to the activations, one of: l1, l2, relu, hs

  • reg_tens – the regularization tensor to apply a function to, one of: inp, out

  • start_epoch – The epoch to start the modifier at

  • end_epoch – The epoch to end the modifier at

alpha

the weight to use for the regularization, ie cost = loss + alpha * reg

Type

return

finalize ( module : Optional [ torch.nn.modules.module.Module ] = None , reset_loggers : bool = True , ** kwargs ) [source]

Clean up any state for tracking activation sparsity

Parameters
  • module – The model/module to finalize the modifier for. Marked optional so state can still be cleaned up on delete, but generally should always be passed in.

  • reset_loggers – True to remove any currently attached loggers (default), False to keep the loggers attached.

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

initialize ( module : torch.nn.modules.module.Module , epoch : float = 0 , loggers : Optional [ List [ sparseml.pytorch.utils.logger.BaseLogger ] ] = None , ** kwargs ) [source]

Grabs the layers to control the activation sparsity for

Parameters
  • module – the PyTorch model/module to modify

  • epoch – The epoch to initialize the modifier and module at. Defaults to 0 (start of the training process)

  • loggers – Optional list of loggers to log the modification process to

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

layer_normalized

True to normalize the values by 1 / L where L is the number of layers

Type

return

layers

str or list of str for the layers to apply the AS modifier to; can also use the token __ALL__ to specify all layers

Type

return

loss_update ( loss : torch.Tensor , module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) torch.Tensor [source]

Modify the loss to include the norms for the outputs of the layers being modified.

Parameters
  • loss – The calculated loss tensor

  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

Returns

the modified loss tensor

optimizer_post_step ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

be sure to clear out the values after the update step has been taken

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

reg_func

the regularization function to apply to the activations, one of: l1, l2, relu, hs

Type

return

reg_tens

the regularization tensor to apply a function to, one of: inp, out

Type

return

update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Update the loss tracking for each layer that is being modified on start and stop

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

validate ( ) [source]

Validate the values of the params for the current instance are valid

sparseml.pytorch.optim.modifier_epoch module

Modifiers related to controlling the training epochs while training a model

class sparseml.pytorch.optim.modifier_epoch. EpochRangeModifier ( start_epoch : float , end_epoch : float ) [source]

Bases: sparseml.sparsification.modifier_epoch.EpochRangeModifier , sparseml.pytorch.optim.modifier.ScheduledModifier

Simple modifier to set the range of epochs for running in a scheduled optimizer (ie to set min and max epochs within a range without hacking other modifiers).

Note, that if other modifiers exceed the range of this one for min or max epochs, this modifier will not have an effect.

Sample yaml:
!EpochRangeModifier:
start_epoch: 0
end_epoch: 90
Parameters
  • start_epoch – The epoch to start the modifier at

  • end_epoch – The epoch to end the modifier at

sparseml.pytorch.optim.modifier_lr module

Modifiers for changing the learning rate while training according to certain update formulas or patterns.

class sparseml.pytorch.optim.modifier_lr. LearningRateFunctionModifier ( lr_func : str , init_lr : float , final_lr : float , start_epoch : float , end_epoch : float , param_groups : Optional [ List [ int ] ] = None , update_frequency : float = - 1.0 , log_types : Union [ str , List [ str ] ] = '__ALL__' ) [source]

Bases: sparseml.pytorch.optim.modifier.ScheduledUpdateModifier

Modifier to set the learning rate based on supported math functions scaling between an init_lr and a final_lr. Any time an update point is reached, the LR is updated for the parameters groups in the optimizer. Specific parameter groups can be targeted for the optimizer as well.
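For intuition, a hedged sketch of how such a scaling between init_lr and final_lr over [start_epoch, end_epoch] can be computed (illustrative only, not the library's exact update math):

import math

def interpolated_lr(lr_func, init_lr, final_lr, start_epoch, end_epoch, epoch):
    # clamp progress through the modifier's active range to [0, 1]
    t = min(max((epoch - start_epoch) / (end_epoch - start_epoch), 0.0), 1.0)
    if lr_func == "linear":
        return init_lr + (final_lr - init_lr) * t
    if lr_func == "cosine":
        # cosine ramp: starts at init_lr (t=0) and ends at final_lr (t=1)
        return final_lr + (init_lr - final_lr) * (1 + math.cos(math.pi * t)) / 2
    raise ValueError(f"unsupported lr_func: {lr_func}")

print(interpolated_lr("linear", 0.1, 0.001, 0.0, 10.0, 5.0))  # 0.0505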

Sample yaml:
!LearningRateFunctionModifier
start_epoch: 0.0
end_epoch: 10.0
lr_func: linear
init_lr: 0.1
final_lr: 0.001
Parameters
  • lr_func – The name of the lr function to use: [linear, cosine]

  • init_lr – The initial learning rate to use once this modifier starts

  • final_lr – The final learning rate to scale to by the end of the modifier (end_epoch)

  • start_epoch – The epoch to start the modifier at (set to -1.0 so it starts immediately)

  • end_epoch – The epoch to end the modifier at, (set to -1.0 so it doesn’t end)

  • update_frequency – unused and should not be set

  • log_types – The loggers to allow the learning rate to be logged to, default is __ALL__

  • constant_logging – True to constantly log on every step, False to only log on an LR change and min once per epoch, default False

  • param_groups – The param group indices to set the lr for within the optimizer; if not set, the lr will be set for all param groups

final_lr

The final learning rate to scale to by the end of the modifier (end_epoch)

Type

return

init_lr

The initial learning rate to use once this modifier starts

Type

return

log_update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Check whether to log an update for the learning rate of the modifier. Checks for a change in the LR or epoch before logging

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

lr_func

The name of the lr function to use: [linear, cosine]

Type

return

param_groups

The param group indices to set the lr for within the optimizer, if not set will set the lr for all param groups

Type

return

update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Updates the LR based on the given epoch for the optimizer

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

validate ( ) [source]

Validate the values of the params for the current instance are valid

class sparseml.pytorch.optim.modifier_lr. LearningRateModifier ( lr_class : str , lr_kwargs : Dict , init_lr : float , start_epoch : float , end_epoch : float = - 1.0 , update_frequency : float = - 1.0 , log_types : Union [ str , List [ str ] ] = '__ALL__' , constant_logging : bool = False ) [source]

Bases: sparseml.sparsification.modifier_lr.LearningRateModifier , sparseml.pytorch.optim.modifier.ScheduledUpdateModifier

Modifier to set the learning rate to specific values at certain points in the training process between set epochs. Any time an update point is reached, the LR is updated for the parameters in the optimizer. Builds on top of the builtin LR schedulers in PyTorch.

Sample yaml:
!LearningRateModifier
start_epoch: 0.0
end_epoch: 10.0
lr_class: ExponentialLR
lr_kwargs:
gamma: 0.95
init_lr: 0.01
log_types: __ALL__
constant_logging: True
Parameters
  • lr_class – The name of the lr scheduler class to use: [StepLR, MultiStepLR, ExponentialLR, CosineAnnealingWarmRestarts]

  • lr_kwargs – The dictionary of keyword arguments to pass to the constructor for the lr_class

  • init_lr – The initial learning rate to use once this modifier starts

  • start_epoch – The epoch to start the modifier at (set to -1.0 so it starts immediately)

  • end_epoch – The epoch to end the modifier at, (set to -1.0 so it doesn’t end)

  • update_frequency – unused and should not be set

  • log_types – The loggers to allow the learning rate to be logged to, default is __ALL__

  • constant_logging – True to constantly log on every step, False to only log on an LR change and min once per epoch, default False

constant_logging

True to constantly log on every step, False to only log on an LR change, default False

Type

return

log_update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Check whether to log an update for the learning rate of the modifier. If constant logging is enabled, then will always log; otherwise, checks for a change in the LR before logging

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Calls into the lr scheduler to step given the epoch. Additionally, will first set the lr to the init_lr if not set yet

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

validate ( ) [source]

Validate the values of the params for the current instance are valid

class sparseml.pytorch.optim.modifier_lr. SetLearningRateModifier ( learning_rate : Optional [ float ] , param_groups : Optional [ List [ int ] ] = None , start_epoch : float = - 1.0 , end_epoch : float = - 1.0 , log_types : Union [ str , List [ str ] ] = '__ALL__' , constant_logging : bool = False ) [source]

Bases: sparseml.sparsification.modifier_lr.SetLearningRateModifier , sparseml.pytorch.optim.modifier.ScheduledModifier

Modifier to set the learning rate to a specific value at a certain point in the training process. Once that point is reached, will update the optimizer’s params with the learning rate.

Sample yaml:
!SetLearningRateModifier
start_epoch: 0.0
learning_rate: 0.001
log_types: __ALL__
constant_logging: True
Parameters
  • learning_rate – The learning rate to use once this modifier starts

  • start_epoch – The epoch to start the modifier at (set to -1.0 so it starts immediately)

  • end_epoch – unused and should not be set

  • log_types – The loggers to allow the learning rate to be logged to, default is __ALL__

  • constant_logging – True to constantly log on every step, False to only log on an LR change and min once per epoch, default False

applied_learning_rate
constant_logging

True to constantly log on every step, False to only log on an LR change, default False

Type

return

log_update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Check whether to log an update for the learning rate of the modifier. If constant logging is enabled, then will always log; otherwise, checks for a change in the LR before logging

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

param_groups

The param group indices to set the lr for within the optimizer, if not set will set the lr for all param groups

Type

return

update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Check whether to update the learning rate for the optimizer or not

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

sparseml.pytorch.optim.modifier_params module

Modifier for changing the state of a modules params while training according to certain update formulas or patterns.

class sparseml.pytorch.optim.modifier_params. GradualParamModifier ( params : Union [ str , List [ str ] ] , init_val : Any , final_val : Any , start_epoch : float , end_epoch : float , update_frequency : float , inter_func : str = 'linear' , params_strict : bool = True ) [source]

Bases: sparseml.pytorch.optim.modifier.ScheduledUpdateModifier

Modifier to set the param values for a given list of parameter regex patterns from a start value through an end value and using an interpolation function for updates in between. To set all parameters in the given module, set to the ALL_TOKEN string: __ALL__

Sample YAML:
!GradualParamModifier
params: [“re:.*bias”]
init_val: [0.0, 0.0, …]
final_val: [1.0, 1.0, …]
inter_func: linear
params_strict: False
start_epoch: 0.0
end_epoch: 10.0
update_frequency: 1.0
final_val

The final value to set for the given param in the given layers at end_epoch

Type

return

init_val

The initial value to set for the given param in the given layers at start_epoch

Type

return

initialize ( module : torch.nn.modules.module.Module , epoch : float = 0 , loggers : Optional [ List [ sparseml.pytorch.utils.logger.BaseLogger ] ] = None , ** kwargs ) [source]

Grab the layers params to control the values for within the given module

Parameters
  • module – the PyTorch model/module to modify

  • epoch – The epoch to initialize the modifier and module at. Defaults to 0 (start of the training process)

  • loggers – Optional list of loggers to log the modification process to

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

inter_func

the type of interpolation function to use: [linear, cubic, inverse_cubic]; default is linear

Type

return

params

A list of full parameter names or regex patterns of names to apply pruning to. Regex patterns must be specified with the prefix ‘re:’. __ALL__ will match to all parameters.

Type

return

params_strict

True if every regex pattern in params must match at least one parameter name in the module; False if missing params are ok and will not raise an error

Type

return

update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Updates the modules layers params to the interpolated value based on given settings and current epoch.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

validate ( ) [source]

Validate the values of the params for the current instance are valid

class sparseml.pytorch.optim.modifier_params. SetParamModifier ( params : Union [ str , List [ str ] ] , val : Any , params_strict : bool = True , start_epoch : float = 0.0 , end_epoch : float = - 1.0 ) [source]

Bases: sparseml.pytorch.optim.modifier.ScheduledModifier

Modifier to set the param values for a given list of parameter name regex patterns. To set all parameters in the given module, set to the ALL_TOKEN string: __ALL__

Sample yaml:
!SetParamModifier:
params: [“re:.*bias”]
val: [0.1, 0.1, …]
params_strict: False
start_epoch: 0
Parameters
  • params – A list of full parameter names or regex patterns of names to apply pruning to. Regex patterns must be specified with the prefix ‘re:’. __ALL__ will match to all parameters.

  • val – The value to set for the given param in the given layers at start_epoch

  • params_strict – True if every regex pattern in params must match at least one parameter name in the module, False if missing params are ok and will not raise an error

  • start_epoch – The epoch to start the modifier at (set to -1.0 so it starts immediately)

  • end_epoch – unused and should not be passed

initialize ( module : torch.nn.modules.module.Module , epoch : float = 0 , loggers : Optional [ List [ sparseml.pytorch.utils.logger.BaseLogger ] ] = None , ** kwargs ) [source]

Grab the layers params to control the values for within the given module

Parameters
  • module – the PyTorch model/module to modify

  • epoch – The epoch to initialize the modifier and module at. Defaults to 0 (start of the training process)

  • loggers – Optional list of loggers to log the modification process to

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

params

A list of full parameter names or regex patterns of names to apply pruning to. Regex patterns must be specified with the prefix ‘re:’. __ALL__ will match to all parameters.

Type

return

params_strict

True if every regex pattern in params must match at least one parameter name in the module, False if missing params are ok and will not raise an error

Type

return

update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

If start_pending(), updates the modules layers params to the value based on given settings.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

val

The value to set for the given param in the given layers at start_epoch

Type

return

class sparseml.pytorch.optim.modifier_params. TrainableParamsModifier ( params : Union [ str , List [ str ] ] , trainable : bool , params_strict : bool = True , start_epoch : float = - 1.0 , end_epoch : float = - 1.0 ) [source]

Bases: sparseml.sparsification.modifier_params.TrainableParamsModifier , sparseml.pytorch.optim.modifier.ScheduledModifier

Modifier to control the params for a given list of parameter regex patterns. If end_epoch is supplied and greater than 0, then it will revert to the trainable settings before the modifier. To set all params in the given layers, set to the ALL_TOKEN string: __ALL__. To set all layers in the given module, set to the ALL_TOKEN string: __ALL__

Sample yaml:
!TrainableParamsModifier:
params: [“conv_net.conv1.weight”]
trainable: True
params_strict: False
start_epoch: 0
end_epoch: 10
Parameters
  • params – A list of full parameter names or regex patterns of names to apply pruning to. Regex patterns must be specified with the prefix ‘re:’. __ALL__ will match to all parameters.

  • trainable – True if the param(s) should be made trainable, False to make them non-trainable

  • params_strict – True if every regex pattern in params must match at least one parameter name in the module, False if missing params are ok and will not raise an error

  • start_epoch – The epoch to start the modifier at (set to -1.0 so it starts immediately)

  • end_epoch – The epoch to end the modifier at (set to -1.0 so it never ends), if > 0 then will revert to the original value for the params after this epoch

initialize ( module : torch.nn.modules.module.Module , epoch : float = 0 , loggers : Optional [ List [ sparseml.pytorch.utils.logger.BaseLogger ] ] = None , ** kwargs ) [source]

Grab the layers params to control trainable or not for within the given module

Parameters
  • module – the PyTorch model/module to modify

  • epoch – The epoch to initialize the modifier and module at. Defaults to 0 (start of the training process)

  • loggers – Optional list of loggers to log the modification process to

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

If start_pending(), updates the modules layers params to be trainable or not depending on given settings. If end_pending(), updates the modules layers params to their original trainable state.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

sparseml.pytorch.optim.modifier_pruning module

Modifiers for inducing / enforcing kernel sparsity (model pruning) on models while training.

class sparseml.pytorch.optim.modifier_pruning. ConstantPruningModifier ( params : Union [ str , List [ str ] ] , start_epoch : float = - 1.0 , end_epoch : float = - 1.0 , update_frequency : float = - 1.0 , log_types : Union [ str , List [ str ] ] = '__ALL__' ) [source]

Bases: sparseml.pytorch.optim.modifier_pruning._PruningParamsModifier , sparseml.sparsification.modifier_pruning.ConstantPruningModifier

Holds the sparsity level and shape for a given parameter(s) constant while training. Useful for transfer learning use cases.

Sample yaml:
!ConstantPruningModifier
start_epoch: 0.0
end_epoch: 10.0
params: [‘re:.*weight’]
log_types: __ALL__
Parameters
  • start_epoch – The epoch to start the modifier at

  • end_epoch – The epoch to end the modifier at

  • update_frequency – Ignored for this modifier

  • params – A list of full parameter names or regex patterns of names to apply pruning to. Regex patterns must be specified with the prefix ‘re:’. __ALL__ will match to all parameters. __ALL_PRUNABLE__ will match to all ConvNd and Linear layers’ weights

  • log_types – The loggers to allow the learning rate to be logged to, default is __ALL__

static from_sparse_model ( model : torch.nn.modules.module.Module ) List [ sparseml.pytorch.optim.modifier.ScheduledModifier ] [source]

Create constant ks modifiers for all prunable params in the given model (conv, linear) that have been artificially sparsified (sparsity > 40%). Useful for transfer learning from a pruned model.

Parameters

model – the model to create constant ks modifiers for

Returns

the list of created constant ks modifiers

class sparseml.pytorch.optim.modifier_pruning. GMPruningModifier ( init_sparsity : float , final_sparsity : Union [ float , Dict [ float , List [ str ] ] ] , start_epoch : float , end_epoch : float , update_frequency : float , params : Union [ str , List [ str ] ] , leave_enabled : bool = True , inter_func : str = 'cubic' , phased : bool = False , log_types : Union [ str , List [ str ] ] = '__ALL__' , mask_type : Union [ str , List [ int ] , sparseml.pytorch.optim.mask_creator_pruning.PruningMaskCreator ] = 'unstructured' , global_sparsity : bool = False , score_type : Union [ str , sparseml.pytorch.utils.mfac_helpers.MFACOptions ] = 'magnitude' ) [source]

Bases: sparseml.pytorch.optim.modifier_pruning._PruningParamsModifier , sparseml.sparsification.modifier_pruning.GMPruningModifier

Gradually applies kernel sparsity to a given parameter or parameters from init_sparsity until final_sparsity is reached over a given amount of time and applied with an interpolated function for each step taken.

Applies based on magnitude pruning unless otherwise specified by mask_type.
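For intuition, a hedged sketch of a cubic ramp of the applied sparsity between init_sparsity and final_sparsity over [start_epoch, end_epoch] (illustrative only, not the library's exact interpolation helper):

def interpolated_sparsity(init_sparsity, final_sparsity, start_epoch, end_epoch, epoch):
    # clamp progress through the pruning window to [0, 1]
    t = min(max((epoch - start_epoch) / (end_epoch - start_epoch), 0.0), 1.0)
    # this cubic ramps sparsity quickly at first, then flattens toward final_sparsity
    t_cubic = 1.0 - (1.0 - t) ** 3
    return init_sparsity + (final_sparsity - init_sparsity) * t_cubic

print(round(interpolated_sparsity(0.05, 0.8, 0.0, 10.0, 5.0), 3))  # 0.706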

Sample yaml:
!GMPruningModifier
init_sparsity: 0.05
final_sparsity: 0.8
start_epoch: 0.0
end_epoch: 10.0
update_frequency: 1.0
params: [“re:.*weight”]
leave_enabled: True
inter_func: cubic
log_types: __ALL__
mask_type: unstructured
global_sparsity: False
score_type: magnitude
Parameters
  • init_sparsity – the initial sparsity for the param to start with at start_epoch

  • final_sparsity – the final sparsity for the param to end with at end_epoch. Can also be a Dict of final sparsity values to a list of parameters to apply them to. If given a Dict, then params must be set to [] and the params to be pruned will be read from the final_sparsity Dict

  • start_epoch – The epoch to start the modifier at

  • end_epoch – The epoch to end the modifier at

  • update_frequency – The number of epochs or fraction of epochs to update at between start and end

  • params – A list of full parameter names or regex patterns of names to apply pruning to. Regex patterns must be specified with the prefix ‘re:’. __ALL__ will match to all parameters. __ALL_PRUNABLE__ will match to all ConvNd and Linear layers’ weights. If a sparsity to param mapping is defined by final_sparsity, then params should be set to []

  • leave_enabled – True to continue masking the weights after end_epoch, False to stop masking. Should be set to False if exporting the result immediately after or doing some other prune

  • inter_func – the type of interpolation function to use: [linear, cubic, inverse_cubic]

  • phased – True to enable a phased approach where pruning will turn on and off with the update_frequency. Starts with pruning on at start_epoch, off at start_epoch + update_frequency, and so on.

  • log_types – The loggers to allow the learning rate to be logged to, default is __ALL__

  • mask_type – String to define type of sparsity (options: [‘unstructured’, ‘channel’, ‘filter’]), List to define block shape of a parameters in and out channels, or a SparsityMaskCreator object. default is ‘unstructured’

  • global_sparsity – set True to enable global pruning. if False, pruning will be layer-wise. Default is False

  • score_type – Method used to score parameters for masking, i.e. ‘magnitude’, ‘movement’. Default is ‘magnitude’

applied_sparsity
final_sparsity

the final sparsity for the param to end with at end_epoch

Type

return

global_sparsity
mask_type

the mask type used

Type

return

optimizer_post_step ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Reapply the mask after the optimizer step in case the optimizer has momentum that may have moved weights from 0. Not applied for movement pruning to allow weight reintroduction

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

optimizer_pre_step ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Update mask movement scores with gradients right before optimizer step is applied. Called here in case gradients are changed between the backwards pass and step such as in grad norm clipping.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

params

A list of full parameter names or regex patterns of names to apply pruning to. Regex patterns must be specified with the prefix ‘re:’. __ALL__ will match to all parameters. __ALL_PRUNABLE__ will match to all ConvNd and Linear layers’ weights

Type

return

phased

True to enable a phased approach where pruning will turn on and off with the update_frequency. Starts with pruning on at start_epoch, off at start_epoch + update_frequency, and so on.

Type

return

score_type
class sparseml.pytorch.optim.modifier_pruning. GlobalMagnitudePruningModifier ( init_sparsity : float , final_sparsity : float , start_epoch : float , end_epoch : float , update_frequency : float , params : Union [ str , List [ str ] ] = '__ALL_PRUNABLE__' , leave_enabled : bool = True , inter_func : str = 'cubic' , phased : bool = False , log_types : Union [ str , List [ str ] ] = '__ALL__' , mask_type : Union [ str , List [ int ] , sparseml.pytorch.optim.mask_creator_pruning.PruningMaskCreator ] = 'unstructured' , score_type : Union [ str , sparseml.pytorch.utils.mfac_helpers.MFACOptions ] = 'magnitude' ) [source]

Bases: sparseml.pytorch.optim.modifier_pruning.GMPruningModifier

Gradually applies kernel sparsity to a given parameter or parameters from init_sparsity until final_sparsity is reached over a given amount of time and applied with an interpolated function for each step taken.

Uses magnitude pruning over the global scope of all given parameters to gradually mask parameter values. Pruning is unstructured by default, structure can be specified by mask_type.

Sample yaml:
!GlobalMagnitudePruningModifier
init_sparsity: 0.05
final_sparsity: 0.8
start_epoch: 0.0
end_epoch: 10.0
update_frequency: 1.0
params: __ALL_PRUNABLE__
leave_enabled: True
inter_func: cubic
log_types: __ALL__
mask_type: unstructured
score_type: magnitude
Parameters
  • init_sparsity – the initial sparsity for the param to start with at start_epoch

  • final_sparsity – the final sparsity for the param to end with at end_epoch

  • start_epoch – The epoch to start the modifier at

  • end_epoch – The epoch to end the modifier at

  • update_frequency – The number of epochs or fraction of epochs to update at between start and end

  • params – A list of full parameter names or regex patterns of names to apply pruning to. Regex patterns must be specified with the prefix ‘re:’. __ALL__ will match to all parameters. __ALL_PRUNABLE__ will match to all ConvNd and Linear layers’ weights. Default is __ALL_PRUNABLE__

  • leave_enabled – True to continue masking the weights after end_epoch, False to stop masking. Should be set to False if exporting the result immediately after or doing some other prune

  • inter_func – the type of interpolation function to use: [linear, cubic, inverse_cubic]

  • phased – True to enable a phased approach where pruning will turn on and off with the update_frequency. Starts with pruning on at start_epoch, off at start_epoch + update_frequency, and so on.

  • log_types – The loggers to allow the learning rate to be logged to, default is __ALL__

  • mask_type – String to define type of sparsity (options: [‘unstructured’, ‘channel’, ‘filter’]), List to define block shape of a parameters in and out channels, or a SparsityMaskCreator object. default is ‘unstructured’

  • score_type – Method used to score parameters for masking, i.e. ‘magnitude’, ‘movement’. Default is ‘magnitude’

global_sparsity
class sparseml.pytorch.optim.modifier_pruning. LayerPruningModifier ( layers : Union [ str , List [ str ] ] , start_epoch : float = - 1.0 , end_epoch : float = - 1.0 , update_frequency : float = - 1.0 , log_types : Optional [ Union [ str , List [ str ] ] ] = None ) [source]

Bases: sparseml.pytorch.optim.modifier.ScheduledUpdateModifier

Class for pruning away layers within a module (replaces with sparseml.pytorch.nn.Identity).
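Conceptually, each named layer is swapped for a pass-through identity module. A minimal hedged sketch of that replacement using plain PyTorch (torch.nn.Identity stands in for sparseml.pytorch.nn.Identity here, and the helper below is illustrative, not the modifier's own code):

import torch
from torch import nn

def replace_with_identity(module: nn.Module, layer_name: str) -> None:
    # resolve the parent of the named submodule, then swap the child in place
    parent_name, _, child_name = layer_name.rpartition(".")
    parent = module.get_submodule(parent_name) if parent_name else module
    setattr(parent, child_name, nn.Identity())

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
replace_with_identity(model, "1")  # "prune" away the ReLU at index '1'
print(model)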

Sample yaml:
!LayerPruningModifier
layers: [‘bert.encoder.layer.6’, ‘bert.encoder.layer.7’]

Parameters
  • layers – A list of full layer names to apply pruning to. __ALL__ will match to all layers. __ALL_PRUNABLE__ will match to all ConvNd and Linear layers

  • start_epoch – The epoch the modifier will prune the layers away at

  • end_epoch – The epoch, if set and positive, the modifier will reintroduce the pruned layers at

  • update_frequency – Unused for this modifier

  • log_types – The loggers to allow the learning rate to be logged to, default is __ALL__

finalize ( module : Optional [ torch.nn.modules.module.Module ] = None , reset_loggers : bool = True , ** kwargs ) [source]

Cleans up any remaining hooks

Parameters
  • module – The model/module to finalize the modifier for. Marked optional so state can still be cleaned up on delete, but generally should always be passed in.

  • reset_loggers – True to remove any currently attached loggers (default), False to keep the loggers attached.

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

initialize ( module : torch.nn.modules.module.Module , epoch : float = 0 , loggers : Optional [ List [ sparseml.pytorch.utils.logger.BaseLogger ] ] = None , ** kwargs ) [source]

Grab the layers and apply if epoch in range to control pruning for.

Parameters
  • module – the PyTorch model/module to modify

  • epoch – The epoch to initialize the modifier and module at. Defaults to 0 (start of the training process)

  • loggers – Optional list of loggers to log the modification process to

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

layers

the layers to prune from the module

Type

return

log_update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Check whether to log an update for the state of the modifier.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Update to enable and disable the layers when chosen.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

class sparseml.pytorch.optim.modifier_pruning. MFACGlobalPruningModifier ( init_sparsity : float , final_sparsity : Union [ float , Dict [ float , List [ str ] ] ] , start_epoch : float , end_epoch : float , update_frequency : float , params : Union [ str , List [ str ] ] , leave_enabled : bool = True , inter_func : str = 'cubic' , phased : bool = False , log_types : Union [ str , List [ str ] ] = '__ALL__' , mask_type : Union [ str , List [ int ] , sparseml.pytorch.optim.mask_creator_pruning.PruningMaskCreator ] = 'unstructured' , mfac_options : Optional [ Dict [ str , Any ] ] = None ) [source]

Bases: sparseml.pytorch.optim.modifier_pruning.MFACPruningModifier

Gradually applies kernel sparsity to a given parameter or parameters from init_sparsity until final_sparsity is reached over a given amount of time and applied with an interpolated function for each step taken.

Uses the Matrix-Free Approximate Curvature (M-FAC) algorithm for solving for optimal pruning updates by estimating the inverse Hessian matrix to the loss over time under the Optimal Brain Surgeon (OBS) framework. A link to the paper will be included here in an upcoming update.

Sample yaml:
!MFACPruningModifier
init_sparsity: 0.05
final_sparsity: 0.8
start_epoch: 0.0
end_epoch: 10.0
update_frequency: 1.0
params: [“re:.*weight”]
leave_enabled: True
inter_func: cubic
log_types: __ALL__
mask_type: unstructured
mfac_options:
num_grads: {0.0: 64, 0.5: 128, 0.75: 256, 0.85: 512}
fisher_block_size: 10000
available_gpus: [“cuda:0”]
Parameters
  • init_sparsity – the initial sparsity for the param to start with at start_epoch

  • final_sparsity – the final sparsity for the param to end with at end_epoch. Can also be a Dict of final sparsity values to a list of parameters to apply them to. If given a Dict, then params must be set to [] and the params to be pruned will be read from the final_sparsity Dict

  • start_epoch – The epoch to start the modifier at

  • end_epoch – The epoch to end the modifier at

  • update_frequency – The number of epochs or fraction of epochs to update at between start and end

  • params – A list of full parameter names or regex patterns of names to apply pruning to. Regex patterns must be specified with the prefix ‘re:’. __ALL__ will match to all parameters. __ALL_PRUNABLE__ will match to all ConvNd and Linear layers’ weights. If a sparsity to param mapping is defined by final_sparsity, then params should be set to []

  • leave_enabled – True to continue masking the weights after end_epoch, False to stop masking. Should be set to False if exporting the result immediately after or doing some other prune

  • inter_func – the type of interpolation function to use: [linear, cubic, inverse_cubic]

  • phased – True to enable a phased approach where pruning will turn on and off with the update_frequency. Starts with pruning on at start_epoch, off at start_epoch + update_frequency, and so on.

  • log_types – The loggers to allow the learning rate to be logged to, default is __ALL__

  • mask_type – String to define type of sparsity (options: [‘unstructured’, ‘channel’, ‘filter’]), List to define block shape of a parameters in and out channels, or a SparsityMaskCreator object. default is ‘unstructured’

  • mfac_options – Dictionary of key words specifying arguments for the M-FAC pruning run. num_grads controls the number of gradient samples that are kept, fisher_block_size specifies the block size to break the M-FAC computation into (default is 2000, use None for no blocks), available_gpus specifies a list of device ids that can be used for computation. For a full list of options, see the MFACOptions dataclass documentation. Default configuration uses CPU for computation without blocked computation

global_sparsity
class sparseml.pytorch.optim.modifier_pruning. MFACPruningModifier ( init_sparsity : float , final_sparsity : Union [ float , Dict [ float , List [ str ] ] ] , start_epoch : float , end_epoch : float , update_frequency : float , params : Union [ str , List [ str ] ] , leave_enabled : bool = True , inter_func : str = 'cubic' , phased : bool = False , log_types : Union [ str , List [ str ] ] = '__ALL__' , mask_type : Union [ str , List [ int ] , sparseml.pytorch.optim.mask_creator_pruning.PruningMaskCreator ] = 'unstructured' , global_sparsity : bool = False , mfac_options : Optional [ Dict [ str , Any ] ] = None ) [source]

Bases: sparseml.pytorch.optim.modifier_pruning.GMPruningModifier

Gradually applies kernel sparsity to a given parameter or parameters from init_sparsity until final_sparsity is reached over a given amount of time and applied with an interpolated function for each step taken.

Uses the Matrix-Free Approximate Curvature (M-FAC) algorithm for solving for optimal pruning updates by estimating the inverse Hessian matrix to the loss over time under the Optimal Brain Surgeon (OBS) framework. A link to the paper will be included here in an upcoming update.

Sample yaml:
!MFACPruningModifier
init_sparsity: 0.05
final_sparsity: 0.8
start_epoch: 0.0
end_epoch: 10.0
update_frequency: 1.0
params: [“re:.*weight”]
leave_enabled: True
inter_func: cubic
log_types: __ALL__
mask_type: unstructured
mfac_options:
num_grads: {0.0: 64, 0.5: 128, 0.75: 256, 0.85: 512}
fisher_block_size: 10000
available_gpus: [“cuda:0”]
Parameters
  • init_sparsity – the initial sparsity for the param to start with at start_epoch

  • final_sparsity – the final sparsity for the param to end with at end_epoch. Can also be a Dict of final sparsity values to a list of parameters to apply them to. If given a Dict, then params must be set to [] and the params to be pruned will be read from the final_sparsity Dict

  • start_epoch – The epoch to start the modifier at

  • end_epoch – The epoch to end the modifier at

  • update_frequency – The number of epochs or fraction of epochs to update at between start and end

  • params – A list of full parameter names or regex patterns of names to apply pruning to. Regex patterns must be specified with the prefix ‘re:’. __ALL__ will match to all parameters. __ALL_PRUNABLE__ will match to all ConvNd and Linear layers’ weights. If a sparsity to param mapping is defined by final_sparsity, then params should be set to []

  • leave_enabled – True to continue masking the weights after end_epoch, False to stop masking. Should be set to False if exporting the result immediately after or doing some other prune

  • inter_func – the type of interpolation function to use: [linear, cubic, inverse_cubic]

  • phased – True to enable a phased approach where pruning will turn on and off with the update_frequency. Starts with pruning on at start_epoch, off at start_epoch + update_frequency, and so on.

  • log_types – The loggers to allow the learning rate to be logged to, default is __ALL__

  • mask_type – String to define type of sparsity (options: [‘unstructured’, ‘channel’, ‘filter’]), List to define block shape of a parameters in and out channels, or a SparsityMaskCreator object. default is ‘unstructured’

  • global_sparsity – set True to enable global pruning. if False, pruning will be layer-wise. Default is False

  • mfac_options – Dictionary of key words specifying arguments for the M-FAC pruning run. num_grads controls the number of gradient samples that are kept, fisher_block_size specifies the block size to break the M-FAC computation into (default is 2000, use None for no blocks), available_gpus specifies a list of device ids that can be used for computation. For a full list of options, see the MFACOptions dataclass documentation. Default configuration uses CPU for computation without blocked computation

mfac_options
score_type
class sparseml.pytorch.optim.modifier_pruning. MagnitudePruningModifier ( init_sparsity : float , final_sparsity : Union [ float , Dict [ float , List [ str ] ] ] , start_epoch : float , end_epoch : float , update_frequency : float , params : Union [ str , List [ str ] ] , leave_enabled : bool = True , inter_func : str = 'cubic' , phased : bool = False , log_types : Union [ str , List [ str ] ] = '__ALL__' , mask_type : Union [ str , List [ int ] , sparseml.pytorch.optim.mask_creator_pruning.PruningMaskCreator ] = 'unstructured' ) [source]

Bases: sparseml.pytorch.optim.modifier_pruning.GMPruningModifier

Gradually applies kernel sparsity to a given parameter or parameters from init_sparsity until final_sparsity is reached over a given amount of time and applied with an interpolated function for each step taken.

Uses magnitude pruning to gradually mask parameter values. Pruning is unstructured by default, structure can be specified by mask_type.

Sample yaml:
!MagnitudePruningModifier
init_sparsity: 0.05
final_sparsity: 0.8
start_epoch: 0.0
end_epoch: 10.0
update_frequency: 1.0
params: [“re:.*weight”]
leave_enabled: True
inter_func: cubic
log_types: __ALL__
mask_type: unstructured
Parameters
  • init_sparsity – the initial sparsity for the param to start with at start_epoch

  • final_sparsity – the final sparsity for the param to end with at end_epoch. Can also be a Dict of final sparsity values to a list of parameters to apply them to. If given a Dict, then params must be set to [] and the params to be pruned will be read from the final_sparsity Dict

  • start_epoch – The epoch to start the modifier at

  • end_epoch – The epoch to end the modifier at

  • update_frequency – The number of epochs or fraction of epochs to update at between start and end

  • params – A list of full parameter names or regex patterns of names to apply pruning to. Regex patterns must be specified with the prefix ‘re:’. __ALL__ will match to all parameters. __ALL_PRUNABLE__ will match to all ConvNd and Linear layers’ weights. If a sparsity to param mapping is defined by final_sparsity, then params should be set to []

  • leave_enabled – True to continue masking the weights after end_epoch, False to stop masking. Should be set to False if exporting the result immediately after or doing some other prune

  • inter_func – the type of interpolation function to use: [linear, cubic, inverse_cubic]

  • phased – True to enable a phased approach where pruning will turn on and off with the update_frequency. Starts with pruning on at start_epoch, off at start_epoch + update_frequency, and so on.

  • log_types – The loggers to allow the learning rate to be logged to, default is __ALL__

  • mask_type – String to define type of sparsity (options: [‘unstructured’, ‘channel’, ‘filter’]), List to define block shape of a parameters in and out channels, or a SparsityMaskCreator object. default is ‘unstructured’

global_sparsity
score_type
class sparseml.pytorch.optim.modifier_pruning. MovementPruningModifier ( init_sparsity : float , final_sparsity : Union [ float , Dict [ float , List [ str ] ] ] , start_epoch : float , end_epoch : float , update_frequency : float , params : Union [ str , List [ str ] ] , leave_enabled : bool = True , inter_func : str = 'cubic' , phased : bool = False , log_types : Union [ str , List [ str ] ] = '__ALL__' , mask_type : Union [ str , List [ int ] , sparseml.pytorch.optim.mask_creator_pruning.PruningMaskCreator ] = 'unstructured' ) [source]

Bases: sparseml.pytorch.optim.modifier_pruning.GMPruningModifier

Gradually applies kernel sparsity to a given parameter or parameters from init_sparsity until final_sparsity is reached over a given amount of time and applied with an interpolated function for each step taken.

Uses movement pruning to gradually mask parameter values. Movement pruning was introduced here: https://arxiv.org/abs/2005.07683. Pruning is unstructured by default; structure can be specified by mask_type.

Sample yaml:
!MovementPruningModifier
init_sparsity: 0.05
final_sparsity: 0.8
start_epoch: 0.0
end_epoch: 10.0
update_frequency: 1.0
params: [“re:.*weight”]
leave_enabled: True
inter_func: cubic
log_types: __ALL__
mask_type: unstructured
Parameters
  • init_sparsity – the initial sparsity for the param to start with at start_epoch

  • final_sparsity – the final sparsity for the param to end with at end_epoch. Can also be a Dict of final sparsity values to a list of parameters to apply them to. If given a Dict, then params must be set to [] and the params to be pruned will be read from the final_sparsity Dict

  • start_epoch – The epoch to start the modifier at

  • end_epoch – The epoch to end the modifier at

  • update_frequency – The number of epochs or fraction of epochs to update at between start and end

  • params – A list of full parameter names or regex patterns of names to apply pruning to. Regex patterns must be specified with the prefix ‘re:’. __ALL__ will match to all parameters. __ALL_PRUNABLE__ will match to all ConvNd and Linear layers’ weights. If a sparsity to param mapping is defined by final_sparsity, then params should be set to []

  • leave_enabled – True to continue masking the weights after end_epoch, False to stop masking. Should be set to False if exporting the result immediately after or doing some other prune

  • inter_func – the type of interpolation function to use: [linear, cubic, inverse_cubic]

  • phased – True to enable a phased approach where pruning will turn on and off with the update_frequency. Starts with pruning on at start_epoch, off at start_epoch + update_frequency, and so on.

  • log_types – The loggers to allow the learning rate to be logged to, default is __ALL__

  • mask_type – String to define type of sparsity (options: [‘unstructured’, ‘channel’, ‘filter’]), List to define block shape of a parameters in and out channels, or a SparsityMaskCreator object. default is ‘unstructured’

global_sparsity
score_type

sparseml.pytorch.optim.modifier_quantization module

Modifier for models through quantization aware training.

PyTorch version must support quantization (>=1.2, ONNX export support introduced in 1.7)

class sparseml.pytorch.optim.modifier_quantization. QuantizationModifier ( start_epoch : float = - 1.0 , submodules : Optional [ List [ str ] ] = None , model_fuse_fn_name : Optional [ str ] = None , disable_quantization_observer_epoch : Union [ None , float ] = None , freeze_bn_stats_epoch : Union [ None , float ] = None , end_epoch : float = - 1 , model_fuse_fn_kwargs : Optional [ Dict [ str , Any ] ] = None , quantize_embeddings : bool = True ) [source]

Bases: sparseml.pytorch.optim.modifier.ScheduledModifier

Enables quantization aware training (QAT) for a given module or its submodules. After the start epoch, the specified module(s)’ forward pass will emulate quantized execution, and the modifier will be enabled until training is completed.

Sample yaml:
!QuantizationModifier
start_epoch: 0.0
submodules: [‘blocks.0’, ‘blocks.2’]
model_fuse_fn_name: ‘fuse_module’
disable_quantization_observer_epoch: 2.0
freeze_bn_stats_epoch: 3.0
Parameters
  • start_epoch – The epoch to start the modifier at

  • submodules – List of submodule names to perform QAT on. Leave None to quantize entire model. Default is None

  • model_fuse_fn_name – Name of model function to fuse the model in place prior to performing QAT. Set as ‘no_fuse’ to skip module fusing. Leave None to use the default function sparseml.pytorch.utils.fuse_module_conv_bn_relus . Default is None

  • disable_quantization_observer_epoch – Epoch to disable updates to the module’s quantization observers. After this point, quantized weights and zero points will not be updated. Leave None to not disable observers during QAT. Default is None

  • freeze_bn_stats_epoch – Epoch to stop the tracking of batch norm stats. Leave None to not stop tracking batch norm stats during QAT. Default is None

  • end_epoch – Disabled, setting to anything other than -1 will raise an exception. For compatibility with YAML serialization only.

  • model_fuse_fn_kwargs – dictionary of keyword argument values to be passed to the model fusing function

  • quantize_embeddings – if True, will perform QAT on torch.nn.Embedding layers using sparseml.pytorch.utils.quantization.prepare_embeddings_qat to fake quantize embedding weights. Default is True. Models without embedding layers will be unaffected

disable_quantization_observer_epoch

Epoch to disable updates to the module’s quantization observers. After this point, quantized weights and zero points will not be updated. When None, observers are never disabled during QAT

Type

return

finalize ( module : Optional [ torch.nn.modules.module.Module ] = None , reset_loggers : bool = True , ** kwargs ) [source]

Cleans up any state

Parameters
  • module – The model/module to finalize the modifier for. Marked optional so state can still be cleaned up on delete, but generally should always be passed in.

  • reset_loggers – True to remove any currently attached loggers (default), False to keep the loggers attached.

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

freeze_bn_stats_epoch

Epoch to stop the tracking of batch norm stats. When None, batch norm stats are tracked for all of training

Type

return

initialize ( module : torch.nn.modules.module.Module , epoch : float = 0 , loggers : Optional [ List [ sparseml.pytorch.utils.logger.BaseLogger ] ] = None , ** kwargs ) [source]

Grab the module / submodule to perform QAT on

Parameters
  • module – the PyTorch model/module to modify

  • epoch – The epoch to initialize the modifier and module at. Defaults to 0 (start of the training process)

  • loggers – Optional list of loggers to log the modification process to

  • kwargs – Optional kwargs to support specific arguments for individual modifiers.

model_fuse_fn_name

Name of model function to fuse the model in place prior to performing QAT. None to use the default function sparseml.pytorch.utils.fuse_module_conv_bn_relus .

Type

return

quantize_embeddings

if True, will perform QAT on torch.nn.Embedding layers using sparseml.pytorch.utils.quantization.prepare_embeddings_qat to fake quantize embedding weights

Type

return

submodules

List of submodule names to perform QAT on. None quantizes the entire model

Type

return

update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

If start_pending(), fuses the model, sets the model quantization config, and calls torch.quantization.prepare_qat on the model to begin QAT. If end_pending(), updates the modules layers params to their original trainable state.

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

update_ready ( epoch : float , steps_per_epoch : int ) bool [source]
Parameters
  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

Returns

True if the modifier is pending an update and update() should be called

sparseml.pytorch.optim.modifier_regularizer module

Modifier for changing parameters for regularization

class sparseml.pytorch.optim.modifier_regularizer. SetWeightDecayModifier ( weight_decay : float , start_epoch : float = - 1.0 , param_groups : Optional [ List [ int ] ] = None , end_epoch : float = - 1.0 , log_types : Union [ str , List [ str ] ] = '__ALL__' , constant_logging : bool = False ) [source]

Bases: sparseml.pytorch.optim.modifier.ScheduledModifier

Modifies the weight decay (L2 penalty) applied by an optimizer during training
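At start_epoch the effect is equivalent to setting weight_decay on the targeted optimizer param groups directly. A minimal hedged sketch of that manual change (not the modifier's internal code):

import torch
from torch import nn

model = nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

def set_weight_decay(optimizer, weight_decay, param_groups=None):
    # apply to all param groups when param_groups is None, else only the given indices
    indices = range(len(optimizer.param_groups)) if param_groups is None else param_groups
    for idx in indices:
        optimizer.param_groups[idx]["weight_decay"] = weight_decay

set_weight_decay(optimizer, 0.0, param_groups=[0])
print(optimizer.param_groups[0]["weight_decay"])  # 0.0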

Sample yaml:
!SetWeightDecayModifier
start_epoch: 0.0
weight_decay: 0.0
param_groups: [0]
log_types: __ALL__
Parameters
  • weight_decay – weight decay (L2 penalty) value to set for the given optimizer

  • start_epoch – The epoch to start the modifier at

  • param_groups – The indices of param groups in the optimizer to be modified. If None, all param groups will be modified. Default is None

  • end_epoch – unused and should not be set

  • log_types – The loggers to allow the learning rate to be logged to, default is __ALL__

  • constant_logging – True to constantly log on every step, False to only log on an LR change and min once per epoch, default False

constant_logging

True to constantly log on every step, False to only log on an LR change, default False

Type

return

log_update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

Check whether to log an update for the weight decay of the modifier. If constant logging is enabled, then will always log; otherwise, only logs after this modifier makes a change to the weight decay

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

param_groups

The indices of param groups in the optimizer to be modified. If None, all param groups will be modified.

Type

return

update ( module : torch.nn.modules.module.Module , optimizer : torch.optim.optimizer.Optimizer , epoch : float , steps_per_epoch : int ) [source]

If start_pending(), updates the optimizer’s weight decay according to the parameters of this modifier

Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • epoch – current epoch and progress within the current epoch

  • steps_per_epoch – number of steps taken within each epoch (calculate batch number using this and epoch)

weight_decay

weight decay (L2 penalty) value to set for the given optimizer

Type

return

sparseml.pytorch.optim.optimizer module

Optimizer wrapper for enforcing Modifiers on the training process of a Module.

class sparseml.pytorch.optim.optimizer. ScheduledOptimizer ( optimizer : torch.optim.optimizer.Optimizer , module : torch.nn.modules.module.Module , manager : sparseml.pytorch.optim.manager.ScheduledModifierManager , steps_per_epoch : int , loggers : Optional [ List [ sparseml.pytorch.utils.logger.BaseLogger ] ] = None ) [source]

Bases: torch.optim.optimizer.Optimizer

An optimizer wrapper to handle applying modifiers according to their schedule to both the passed in optimizer and the module.

Overrides the step() function so that this method can call before and after on the modifiers to apply appropriate modifications to both the optimizer and the module.

The epoch_start and epoch_end are based on how many steps have been taken along with the steps_per_epoch.

Lifecycle:
- training cycle
- zero_grad
- loss_update
- modifiers.loss_update
- step
- modifiers.update
- modifiers.optimizer_pre_step
- optimizer.step
- modifiers.optimizer_post_step
Parameters
  • module – module to modify

  • optimizer – optimizer to modify

  • manager – the manager or list of managers used to apply modifications

  • steps_per_epoch – the number of steps or batches in each epoch; not strictly required and can be set to -1. Used to calculate decimals within the epoch; when not used, this can result in irregularities

  • loggers – loggers to log important info to within the modifiers; ex tensorboard or to the console

adjust_current_step ( epoch : int , step : int ) [source]

Adjust the current step for the manager’s schedule to the given epoch and step.

Parameters
  • epoch – the epoch to set the current global step to match

  • step – the step (batch) within the epoch to set the current global step to match

property learning_rate

convenience function to get the first learning rate for any of the param groups in the optimizer

Type

return

load_manager_state_dict ( state_dict ) [source]
loss_update ( loss : torch.Tensor ) torch.Tensor [source]

Optional call to update modifiers based on the calculated loss. Not needed unless one or more of the modifiers is using the loss to make a modification or is modifying the loss itself.

Parameters

loss – the calculated loss after running a forward pass and loss_fn

Returns

the modified loss tensor

property manager

The ScheduledModifierManager for this optimizer

Type

return

manager_state_dict ( ) [source]
step ( closure = None ) [source]

Called to perform a step on the optimizer as normal. Updates the current epoch based on the step count. Calls into modifiers before the step happens. Calls into modifiers after the step happens.

Parameters

closure – optional closure passed into the contained optimizer for the step
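Putting the lifecycle above together, a minimal hedged training-loop sketch. Only the constructor and methods documented above are relied on; ScheduledModifierManager.from_yaml and the "recipe.yaml" path are assumptions used for illustration.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from sparseml.pytorch.optim.manager import ScheduledModifierManager
from sparseml.pytorch.optim.optimizer import ScheduledOptimizer

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
base_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(0, 4, (64,))), batch_size=8
)
steps_per_epoch = len(loader)

manager = ScheduledModifierManager.from_yaml("recipe.yaml")  # assumed recipe path
optimizer = ScheduledOptimizer(base_optimizer, model, manager, steps_per_epoch)

for epoch in range(3):
    for x, y in loader:
        base_optimizer.zero_grad()  # zero grads on the wrapped optimizer
        loss = nn.functional.cross_entropy(model(x), y)
        loss = optimizer.loss_update(loss)  # let modifiers track/adjust the loss
        loss.backward()
        optimizer.step()  # modifier updates run before and after the wrapped step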

sparseml.pytorch.optim.sensitivity_as module

Sensitivity analysis implementations for increasing activation sparsity by using FATReLU

class sparseml.pytorch.optim.sensitivity_as. ASLayerTracker ( layer : torch.nn.modules.module.Module , track_input : bool = False , track_output : bool = False , input_func : Union [ None , Callable ] = None , output_func : Union [ None , Callable ] = None ) [source]

Bases: object

An implementation for tracking activation sparsity properties for a module.

Parameters
  • layer – the module to track activation sparsity for

  • track_input – track the input sparsity for the module

  • track_output – track the output sparsity for the module

  • input_func – the function to call on input to the layer and receives the input tensor

  • output_func – the function to call on output to the layer and receives the output tensor
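A hedged usage sketch follows; the exact structure of tracked_output is not documented here, so it is simply printed, and the fraction_zero helper is an illustrative sparsity measure rather than part of the library.

import torch
from torch import nn

from sparseml.pytorch.optim.sensitivity_as import ASLayerTracker

def fraction_zero(tensor: torch.Tensor) -> float:
    # simple activation-sparsity measure: fraction of exactly-zero values
    return float((tensor == 0).float().mean())

relu = nn.ReLU()
model = nn.Sequential(nn.Linear(16, 32), relu, nn.Linear(32, 4))

tracker = ASLayerTracker(relu, track_output=True, output_func=fraction_zero)
tracker.enable()
model(torch.randn(8, 16))
print(tracker.tracked_output)  # whatever the tracker accumulated via output_func
tracker.disable()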

clear ( ) [source]

Clear out current results for the model

disable ( ) [source]

Disable the forward hooks for the layer

enable ( ) [source]

Enable the forward hooks to the layer

property tracked_input

the current tracked input results

Type

return

property tracked_output

the current tracked output results

Type

return

class sparseml.pytorch.optim.sensitivity_as. LayerBoostResults ( name : str , threshold : float , boosted_as : torch.Tensor , boosted_loss : sparseml.pytorch.utils.module.ModuleRunResults , baseline_as : torch.Tensor , baseline_loss : sparseml.pytorch.utils.module.ModuleRunResults ) [source]

Bases: object

Results for a specific threshold set in a FATReLU layer.

Parameters
  • name – the name of the layer the results are for

  • threshold – the threshold used in the FATReLU layer

  • boosted_as – the measured activation sparsity after threshold is applied

  • boosted_loss – the measured loss after threshold is applied

  • baseline_as – the measured activation sparsity before threshold is applied

  • baseline_loss – the measured loss before threshold is applied

property baseline_as

the measured activation sparsity before threshold is applied

Type

return

property baseline_loss

the measured loss before threshold is applied

Type

return

property boosted_as

the measured activation sparsity after threshold is applied

Type

return

property boosted_loss

the measured loss after threshold is applied

Type

return

property name

the name of the layer the results are for

Type

return

property threshold

the threshold used in the FATReLU layer

Type

return

class sparseml.pytorch.optim.sensitivity_as. ModuleASOneShootBooster ( module : torch.nn.modules.module.Module , device : str , dataset : torch.utils.data.dataset.Dataset , batch_size : int , loss : sparseml.pytorch.utils.loss.LossWrapper , data_loader_kwargs : Dict ) [source]

Bases: object

Implementation class for boosting the activation sparsity in a given module using FATReLUs. Programmatically goes through and figures out the best thresholds to limit loss based on provided parameters.

Parameters
  • module – the module to boost

  • device – the device to run the analysis on; ex [cpu, cuda, cuda:1]

  • dataset – the dataset used to evaluate the boosting on

  • batch_size – the batch size to run through the module in test mode

  • loss – the loss function to use for calculations

  • data_loader_kwargs – any keyword arguments to supply to the DataLoader constructor

run_layers ( layers : List [ str ] , max_target_metric_loss : float , metric_key : str , metric_increases : bool , precision : float = 0.001 ) Dict [ str , sparseml.pytorch.optim.sensitivity_as.LayerBoostResults ] [source]

Run the booster for the specified layers.

Parameters
  • layers – names of the layers to run boosting on

  • max_target_metric_loss – the max loss in the target metric that can happen while boosting

  • metric_key – the name of the metric to evaluate while boosting; ex: [__loss__, top1acc, top5acc]. Must exist in the LossWrapper

  • metric_increases – True if a larger metric value indicates a worse result (such as CrossEntropyLoss), False if a smaller value indicates a worse result (such as accuracy)

  • precision – the precision to search the thresholds to. Larger values give less precise results but take less time to run

Returns

The results for the boosting
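A hedged sketch of boosting a single FATReLU layer. The tiny model and random dataset are throwaway stand-ins; the FATReLU import path, its default constructor, and the LossWrapper construction are assumptions rather than taken from this listing:

    import torch
    from torch.utils.data import TensorDataset
    from sparseml.pytorch.nn import FATReLU  # assumed import path and default constructor
    from sparseml.pytorch.optim.sensitivity_as import ModuleASOneShootBooster
    from sparseml.pytorch.utils.loss import LossWrapper

    # Throwaway model and dataset purely for illustration; "1" names the FATReLU below.
    model = torch.nn.Sequential(torch.nn.Linear(32, 16), FATReLU(), torch.nn.Linear(16, 10))
    dataset = TensorDataset(torch.randn(256, 32), torch.randint(0, 10, (256,)))

    booster = ModuleASOneShootBooster(
        module=model,
        device="cpu",
        dataset=dataset,
        batch_size=64,
        loss=LossWrapper(torch.nn.CrossEntropyLoss()),  # assumed LossWrapper construction
        data_loader_kwargs={},
    )

    results = booster.run_layers(
        layers=["1"],
        max_target_metric_loss=0.05,  # tolerate at most this much metric degradation
        metric_key="__loss__",
        metric_increases=True,        # a larger loss value means a worse result
    )
    for name, res in results.items():  # Dict[str, LayerBoostResults]
        print(name, res.threshold, res.baseline_as, res.boosted_as)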

sparseml.pytorch.optim.sensitivity_lr module

Sensitivity analysis implementations for learning rate on Modules against loss funcs.

sparseml.pytorch.optim.sensitivity_lr. default_exponential_check_lrs ( init_lr : float = 1e-06 , final_lr : float = 0.5 , lr_mult : float = 1.1 ) Tuple [ float , ] [source]

Get the default learning rates to check between init_lr and final_lr.

Parameters
  • init_lr – the initial learning rate in the returned list

  • final_lr – the final learning rate in the returned list

  • lr_mult – the multiplier increase for each step between init_lr and final_lr

Returns

the list of created lrs that increase exponentially between init_lr and final_lr according to lr_mult
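For example, the default ladder starts at init_lr and multiplies by lr_mult until final_lr is reached:

    from sparseml.pytorch.optim.sensitivity_lr import default_exponential_check_lrs

    lrs = default_exponential_check_lrs(init_lr=1e-6, final_lr=0.5, lr_mult=1.1)
    print(lrs[:3])  # roughly (1e-06, 1.1e-06, 1.21e-06)
    print(lrs[-1])  # 0.5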

sparseml.pytorch.optim.sensitivity_lr. lr_loss_sensitivity ( module : torch.nn.modules.module.Module , data : torch.utils.data.dataloader.DataLoader , loss : Union [ sparseml.pytorch.utils.loss.LossWrapper , Callable [ [ Any , Any ] , torch.Tensor ] ] , optim : torch.optim.optimizer.Optimizer , device : str , steps_per_measurement : int , check_lrs : Union [ List [ float ] , Tuple [ float , ] ] = default_exponential_check_lrs() , loss_key : str = '__loss__' , trainer_run_funcs : Optional [ sparseml.pytorch.utils.module.ModuleRunFuncs ] = None , trainer_loggers : Optional [ List [ sparseml.pytorch.utils.logger.BaseLogger ] ] = None , show_progress : bool = True ) sparseml.optim.sensitivity.LRLossSensitivityAnalysis [source]

Implementation for handling running sensitivity analysis for learning rates on modules.

Parameters
  • module – the module to run the learning rate sensitivity analysis over; it is expected to already be on the correct device

  • data – the data to run through the module for calculating the sensitivity analysis

  • loss – the loss function to use for the sensitivity analysis

  • optim – the optimizer to run the sensitivity analysis with

  • device – the device to run the analysis on; ex: cpu, cuda. The module must already be on that device; this is used to place the data on that same device.

  • steps_per_measurement – the number of batches to run through for the analysis at each LR

  • check_lrs – the learning rates to check for analysis (will sort them small to large before running)

  • loss_key – the key for the loss function to track in the returned dict

  • trainer_run_funcs – override functions for ModuleTrainer class

  • trainer_loggers – loggers to log data to while running the analysis

  • show_progress – track progress of the runs if True

Returns

a list of tuples containing the analyzed learning rate at index 0 and the ModuleRunResults at index 1, where ModuleRunResults is a collection of all the batch results run through the module at that LR
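A hedged end-to-end sketch; the tiny model and random data are throwaway stand-ins, and the plain callable loss is assumed to receive (predictions, labels):

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from sparseml.pytorch.optim.sensitivity_lr import (
        default_exponential_check_lrs,
        lr_loss_sensitivity,
    )

    # Throwaway model and data purely for illustration.
    model = torch.nn.Sequential(torch.nn.Linear(32, 10))
    data = DataLoader(
        TensorDataset(torch.randn(512, 32), torch.randint(0, 10, (512,))),
        batch_size=64,
        shuffle=True,
    )
    optim = torch.optim.SGD(model.parameters(), lr=1e-6)

    analysis = lr_loss_sensitivity(
        module=model,
        data=data,
        loss=torch.nn.CrossEntropyLoss(),  # callable form; assumed to receive (predictions, labels)
        optim=optim,
        device="cpu",
        steps_per_measurement=5,
        check_lrs=default_exponential_check_lrs(),
        show_progress=False,
    )
    # `analysis` is an LRLossSensitivityAnalysis; inspect or serialize it as needed.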

sparseml.pytorch.optim.sensitivity_pruning module

Sensitivity analysis implementations for kernel sparsity on Modules against loss funcs.

sparseml.pytorch.optim.sensitivity_pruning. model_prunability_magnitude ( module : torch.nn.modules.module.Module ) [source]

Calculate the approximate sensitivity for an overall model. The values are not scaled to any range, so they must be taken in context with other known models.

Parameters

module – the model to calculate the sensitivity for

Returns

the approximated sensitivity
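For example (the torchvision model is used only as an illustrative stand-in):

    from torchvision.models import resnet18  # any model works; used only for illustration
    from sparseml.pytorch.optim.sensitivity_pruning import model_prunability_magnitude

    score = model_prunability_magnitude(resnet18())
    print(score)  # unscaled; only meaningful relative to scores of other models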

sparseml.pytorch.optim.sensitivity_pruning. pruning_loss_sens_magnitude ( module : torch.nn.modules.module.Module , sparsity_levels : Union [ List [ float ] , Tuple [ float , ] ] = (0.0, 0.01, 0.02, …, 0.99) ) sparseml.optim.sensitivity.PruningLossSensitivityAnalysis [source]

Approximated kernel sparsity (pruning) loss analysis for a given model. Returns the results for each prunable param (conv, linear) in the model.

Parameters
  • module – the model to calculate the sparse sensitivity analysis for

  • sparsity_levels – the sparsity levels to calculate the loss at for each param

Returns

the analysis results for the model
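A minimal sketch; the model is a throwaway stand-in and the default sparsity levels are used:

    import torch
    from sparseml.pytorch.optim.sensitivity_pruning import pruning_loss_sens_magnitude

    # Throwaway model with two prunable (conv) params.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU(), torch.nn.Conv2d(16, 32, 3)
    )
    analysis = pruning_loss_sens_magnitude(model)
    # `analysis` is a PruningLossSensitivityAnalysis with one entry per prunable
    # param (conv, linear); inspect or serialize it as needed.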

sparseml.pytorch.optim.sensitivity_pruning. pruning_loss_sens_one_shot ( module : torch.nn.modules.module.Module , data : torch.utils.data.dataloader.DataLoader , loss : Union [ sparseml.pytorch.utils.loss.LossWrapper , Callable [ [ Any , Any ] , torch.Tensor ] ] , device : str , steps_per_measurement : int , sparsity_levels : List [ int ] = (0.0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.99) , loss_key : str = '__loss__' , tester_run_funcs : Optional [ sparseml.pytorch.utils.module.ModuleRunFuncs ] = None , tester_loggers : Optional [ List [ sparseml.pytorch.utils.logger.BaseLogger ] ] = None , show_progress : bool = True ) sparseml.optim.sensitivity.PruningLossSensitivityAnalysis [source]

Run a one-shot sensitivity analysis for kernel sparsity. It does not retrain; instead it puts the model into eval mode. It moves layer by layer to calculate the sensitivity analysis for each one, resetting the previously run layers afterward. Note that, by default, the data is cached, so data loading is not parallel and the first run can take longer; subsequent sparsity checks for layers and levels will be much faster.

Parameters
  • module – the module to run the kernel sparsity sensitivity analysis over; all prunable layers will be extracted from it

  • data – the data to run through the module for calculating the sensitivity analysis

  • loss – the loss function to use for the sensitivity analysis

  • device – the device to run the analysis on; ex: cpu, cuda

  • steps_per_measurement – the number of samples or items to take for each measurement at each sparsity level

  • sparsity_levels – the sparsity levels to check for each layer to calculate sensitivity

  • loss_key – the key for the loss function to track in the returned dict

  • tester_run_funcs – override functions to use in the ModuleTester that runs

  • tester_loggers – loggers to log data to while running the analysis

  • show_progress – track progress of the runs if True

Returns

the sensitivity results for every layer that is prunable
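A hedged one-shot sketch with a throwaway model and random data; the plain callable loss is assumed to receive (predictions, labels):

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from sparseml.pytorch.optim.sensitivity_pruning import pruning_loss_sens_one_shot

    # Throwaway model and data purely for illustration.
    model = torch.nn.Sequential(torch.nn.Linear(32, 16), torch.nn.ReLU(), torch.nn.Linear(16, 10))
    data = DataLoader(
        TensorDataset(torch.randn(512, 32), torch.randint(0, 10, (512,))),
        batch_size=64,
    )

    analysis = pruning_loss_sens_one_shot(
        module=model,
        data=data,
        loss=torch.nn.CrossEntropyLoss(),  # callable form; assumed to receive (predictions, labels)
        device="cpu",
        steps_per_measurement=5,
        sparsity_levels=(0.0, 0.4, 0.8, 0.9, 0.95, 0.99),
        show_progress=False,
    )
    # `analysis` is a PruningLossSensitivityAnalysis covering every prunable layer.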

Module contents

Recalibration code for the PyTorch framework. Handles things like model pruning and increasing activation sparsity.