sparseml.onnx.optim package

Submodules

sparseml.onnx.optim.analyzer_model module

Code related to monitoring, analyzing, and reporting info for models in ONNX. Records things like FLOPS, input and output shapes, kernel shapes, etc.

class sparseml.onnx.optim.analyzer_model. ModelAnalyzer ( model : Optional [ Union [ onnx.onnx_ml_pb2.ModelProto , str ] ] , nodes : Optional [ List [ sparseml.onnx.optim.analyzer_model.NodeAnalyzer ] ] = None ) [source]

Bases: object

Analyze a model to get information for every node in the model, including params, prunable status, FLOPs, etc.

Parameters
  • model – the path to the ONNX model file or the loaded onnx.ModelProto, can also be set to None if nodes are supplied

  • nodes – the analyzed nodes to create the analyzer with, generally None and model should be passed to create a new one

dict ( ) Dict [ str , Any ] [source]
Returns

dictionary representation of the current instance

static from_dict ( dictionary : Dict [ str , Any ] ) [source]
Parameters

dictionary – the dictionary to create an analysis object from

Returns

the ModelAnalyzer instance created from the dictionary

get_node ( id_ : str ) Union [ None , sparseml.onnx.optim.analyzer_model.NodeAnalyzer ] [source]

Get the NodeAnalyzer for the node matching the given id

Parameters

id_ – the id to get a node for

Returns

the NodeAnalyzer that matches the id, or None if not found

static load_json ( path : str ) [source]
Parameters

path – the path to load a previous analysis from

Returns

the ModelAnalyzer instance from the json

property nodes

list of analyzers for each node in the model graph

Type

return

save_json ( path : str ) [source]
Parameters

path – the path to save the json file at representing the analyzed results

class sparseml.onnx.optim.analyzer_model. NodeAnalyzer ( model : Optional [ onnx.onnx_ml_pb2.ModelProto ] , node : Optional [ Any ] , node_shape : Optional [ sparseml.onnx.utils.helpers.NodeShape ] = None , ** kwargs ) [source]

Bases: object

Analyzer instance for an individual node in a model

Parameters
  • model – the loaded onnx.ModelProto, can also be set to None if a node’s kwargs are supplied

  • node – the individual node in model, can also be set to None if a node’s kwargs are supplied

  • node_shape – the node’s NodeShape object

  • kwargs – additional kwargs to pass to the node

property attributes

any extra attributes for the node such as padding, stride, etc

Type

return

property bias_name

name of the bias for the node if applicable

Type

return

property bias_shape

the shape of the bias for the node if applicable

Type

return

dict ( ) Dict [ str , Any ] [source]
Returns

dictionary representation of the current instance

property flops

number of flops to run the node

Type

return

property id_

id of the onnx node (first output id)

Type

return

property input_names

the names of the inputs to the node

Type

return

property input_shapes

shapes for the inputs to the node

Type

return

property op_type

the operator type for the onnx node

Type

return

property output_names

the names of the outputs to the node

Type

return

property output_shapes

shapes for the outputs to the node

Type

return

property params

number of params in the node

Type

return

property prunable

True if the node is prunable (conv, gemm, etc), False otherwise

Type

return

property prunable_equation_sensitivity

approximated sensitivity for the layer towards pruning based on the layer structure and params

Type

return

property prunable_params

number of prunable params in the node

Type

return

property prunable_params_zeroed

number of prunable params set to zero in the node

Type

return

property weight_name

the name of the weight for the node if applicable

Type

return

property weight_shape

the shape of the weight for the node if applicable

Type

return

sparseml.onnx.optim.sensitivity_pruning module

Sensitivity analysis implementations for kernel sparsity on models against loss functions.

class sparseml.onnx.optim.sensitivity_pruning. PruningLossSensitivityAnalysis [source]

Bases: object

Analysis result for how kernel sparsity (pruning) affects the loss of a given model. Contains layer by layer results.

add_result ( id_ : Optional [ str ] , name : str , index : int , sparsity : float , measurement : float , baseline : bool ) [source]

Add a result to the sensitivity analysis for a specific param

Parameters
  • id_ – the identifier to add the result for

  • name – the readable name to add the result for

  • index – the index of the param as found in the model parameters

  • sparsity – the sparsity to add the result for

  • measurement – the loss measurement to add the result for

  • baseline – True if this is a baseline measurement, False otherwise

dict ( ) Dict [ str , Any ] [source]
Returns

dictionary representation of the current instance

static from_dict ( dictionary : Dict [ str , Any ] ) [source]
Parameters

dictionary – the dictionary to create an analysis object from

Returns

the PruningLossSensitivityAnalysis instance created from the dictionary

get_result ( id_or_name : str ) sparseml.optim.sensitivity.PruningSensitivityResult [source]

Get a result from the sensitivity analysis for a specific param

Parameters

id_or_name – the id or name to get the result for

Returns

the loss sensitivity results for the given id or name

static load_json ( path : str ) [source]
Parameters

path – the path to load a previous analysis from

Returns

the PruningLossSensitivityAnalysis instance from the json

plot ( path : Optional [ str ] , plot_integral : bool , normalize : bool = True , title : Optional [ str ] = None ) Union [ Tuple [ matplotlib.figure.Figure , matplotlib.axes._axes.Axes ] , Tuple [ None , None ] ] [source]
Parameters
  • path – the path to save an img version of the chart, None to display the plot

  • plot_integral – True to plot the calculated loss integrals for each layer, False to plot the averages

  • normalize – normalize the values to a unit distribution (0 mean, 1 std)

  • title – the title to put on the chart

Returns

the created figure and axes if path is None, otherwise (None, None)

print_res ( ) [source]

Print the recorded sensitivity results

property results

the individual results for the analysis

Type

return

property results_model

the overall results for the model

Type

return

save_json ( path : str ) [source]
Parameters

path – the path to save the json file at representing the layer sensitivities

class sparseml.onnx.optim.sensitivity_pruning. PruningPerfSensitivityAnalysis ( num_cores : int , batch_size : int ) [source]

Bases: object

Analysis result for how kernel sparsity (pruning) affects the performance (inference timing) of a given model. Contains layer by layer results.

Parameters
  • num_cores – number of physical cpu cores the analysis was run on

  • batch_size – the input batch size the analysis was run for

add_model_result ( sparsity : float , measurement : float , baseline : bool ) [source]

Add a result to the sensitivity analysis for the overall model

Parameters
  • sparsity – the sparsity to add the result for

  • measurement – resulting timing in seconds for the given sparsity for the measurement

  • baseline – True if this is a baseline measurement, False otherwise

add_result ( id_ : Optional [ str ] , name : str , index : int , sparsity : float , measurement : float , baseline : bool ) [source]

Add a result to the sensitivity analysis for a specific param

Parameters
  • id_ – the identifier to add the result for

  • name – the readable name to add the result for

  • index – the index of the param as found in the model parameters

  • sparsity – the sparsity to add the result for

  • measurement – resulting timing in seconds for the given sparsity for the measurement

  • baseline – True if this is a baseline measurement, False otherwise

property batch_size

the input batch size the analysis was run for

Type

return

dict ( ) Dict [ str , Any ] [source]
Returns

dictionary representation of the current instance

static from_dict ( dictionary : Dict [ str , Any ] ) [source]
Parameters

dictionary – the dictionary to create an analysis object from

Returns

the PruningPerfSensitivityAnalysis instance created from the dictionary

get_result ( id_or_name : str ) sparseml.optim.sensitivity.PruningSensitivityResult [source]

Get a result from the sensitivity analysis for a specific param

Parameters

id_or_name – the id or name to get the result for

Returns

the performance sensitivity results for the given id or name

static load_json ( path : str ) [source]
Parameters

path – the path to load a previous analysis from

Returns

the PruningPerfSensitivityAnalysis instance from the json

property num_cores

number of physical cpu cores the analysis was run on

Type

return

plot ( path : Optional [ str ] , title : Optional [ str ] = None ) Union [ Tuple [ matplotlib.figure.Figure , matplotlib.axes._axes.Axes ] , Tuple [ None , None ] ] [source]
Parameters
  • path – the path to save an img version of the chart, None to display the plot

  • title – the title to put on the chart

Returns

the created figure and axes if path is None, otherwise (None, None)

print_res ( ) [source]

Print the recorded sensitivity results

property results

the individual results for the analysis

Type

return

property results_model

the overall results for the model

Type

return

save_json ( path : str ) [source]
Parameters

path – the path to save the json file at representing the layer sensitivities

class sparseml.onnx.optim.sensitivity_pruning. PruningSensitivityResult ( id_ : str , name : str , index : int , baseline_measurement_index : int = - 1 , baseline_measurement_key : Optional [ str ] = None , sparse_measurements : Optional [ Dict [ float , List [ float ] ] ] = None ) [source]

Bases: object

A sensitivity result for a given node/param in a model. Ex: loss sensitivity or perf sensitivity

Parameters
  • id_ – id for the node / param

  • name – human readable name for the node / param

  • index – index order for when the node / param is used in the model

  • baseline_measurement_index – index for where the baseline measurement is stored in the sparse_measurements, if any

  • baseline_measurement_key – key for where the baseline measurement is stored in the sparse_measurements, if any

  • sparse_measurements – the sparse measurements to prepopulate with, if any

add_measurement ( sparsity : float , loss : float , baseline : bool ) [source]

Add a sparse measurement to the result

Parameters
  • sparsity – the sparsity the measurement was performed at

  • loss – resulting loss for the given sparsity for the measurement

  • baseline – True if this is a baseline measurement, False otherwise

property averages

average values of loss for each level recorded

Type

return

property baseline_average

the baseline average time to compare to for the result

Type

return

property baseline_measurement_index

index for where the baseline measurement is stored in the sparse_measurements, if any

Type

return

property baseline_measurement_key

key for where the baseline measurement is stored in the sparse_measurements, if any

Type

return

dict ( ) Dict [ str , Any ] [source]
Returns

dictionary representation of the current instance

static from_dict ( dictionary : Dict [ str , Any ] ) [source]

Create a new loss sensitivity result from a dictionary of values. Expected to match the format as given in the dict() call.

Parameters

dictionary – the dictionary to create a result out of

Returns

the created PruningSensitivityResult

property has_baseline

True if the result has a baseline measurement in the sparse_measurements, False otherwise

Type

return

property id_

id for the node / param

Type

return

property index

index order for when the node / param is used in the model

Type

return

property name

human readable name for the node / param

Type

return

property sparse_average

average loss across all levels recorded

Type

return

sparse_comparison ( compare_index : int = - 1 ) [source]

Compare the baseline average to a sparse average value through the difference: sparse - baseline

If compare_index is not given then will compare with the sparsity closest to 90%. 90% is used as a reasonable achievable baseline to keep from introducing too much noise at the extremes of the tests.

If not has_baseline, then will compare against the first index.

Parameters

compare_index – the index to compare against the baseline with, if not supplied will compare against the sparsity measurement closest to 90%

Returns

a comparison of the sparse average with the baseline (sparse - baseline)

property sparse_integral

integrated loss across all levels recorded

Type

return

property sparse_measurements

the sparse measurements

Type

return

sparseml.onnx.optim.sensitivity_pruning. pruning_loss_sens_approx ( input_shape : Union [ None , List [ int ] , List [ List [ int ] ] ] , output_shape : Union [ None , List [ int ] ] , params : int , apply_shape_change_mult : bool = True ) float [source]

Approximate the pruning sensitivity of a neural network layer based on the params and metadata for the given layer

Parameters
  • input_shape – the input shape to the layer

  • output_shape – the output shape from the layer

  • params – the number of params in the layer

  • apply_shape_change_mult – True to adjust the sensitivity based on a weight derived from a change in input to output shape (any change is considered to be more sensitive), False to not apply

Returns

the approximated pruning sensitivity for the layer’s settings

sparseml.onnx.optim.sensitivity_pruning. pruning_loss_sens_magnitude ( model : Union [ str , onnx.onnx_ml_pb2.ModelProto ] , sparsity_levels : Union [ List [ float ] , Tuple [ float , ] ] = (0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99) , show_progress : bool = True ) sparseml.optim.sensitivity.PruningLossSensitivityAnalysis [source]

Approximated kernel sparsity (pruning) loss analysis for a given model. Returns the results for each prunable param (conv, linear) in the model.

Parameters
  • model – the loaded model or a file path to the onnx model to calculate the sparse sensitivity analysis for

  • sparsity_levels – the sparsity levels to calculate the loss at for each param

  • show_progress – True to log the progress with a tqdm bar, False otherwise

Returns

the analysis results for the model

sparseml.onnx.optim.sensitivity_pruning. pruning_loss_sens_magnitude_iter ( model : Union [ str , onnx.onnx_ml_pb2.ModelProto ] , sparsity_levels : Union [ List [ float ] , Tuple [ float , ] ] = (0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99) ) Generator [ Tuple [ sparseml.optim.sensitivity.PruningLossSensitivityAnalysis , sparseml.onnx.optim.sensitivity_pruning.KSSensitivityProgress ] , None , None ] [source]

Approximated kernel sparsity (pruning) loss analysis for a given model. Iteratively builds a KSLossSensitivityAnalysis object and yields an updated version after each layer is run. The final result is the complete analysis object.

Parameters
  • model – the loaded model or a file path to the onnx model to calculate the sparse sensitivity analysis for

  • sparsity_levels – the sparsity levels to calculate the loss at for each param

Returns

the analysis results for the model with an additional layer at each iteration along with a float representing the iteration progress

sparseml.onnx.optim.sensitivity_pruning. pruning_loss_sens_one_shot ( model : Union [ str , onnx.onnx_ml_pb2.ModelProto ] , data : sparseml.onnx.utils.data.DataLoader , batch_size : int , steps_per_measurement : int , sparsity_levels : List [ float ] = (0.0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.99) , show_progress : bool = True , use_deepsparse_inference : bool = False ) sparseml.optim.sensitivity.PruningLossSensitivityAnalysis [source]

Run a one shot sensitivity analysis for kernel sparsity. It does not retrain. It moves layer by layer to calculate the sensitivity analysis for each layer and resets the previously run layers. The loss is calculated by taking the KL divergence of the pruned values from the baseline.

Parameters
  • model – the loaded model or a file path to the onnx model to calculate the sparse sensitivity analysis for

  • data – the data to run through the model

  • batch_size – the batch size the data is created for

  • steps_per_measurement – number of steps (batches) to run through the model for each sparsity level on each node

  • sparsity_levels – the sparsity levels to calculate the loss at for each param

  • show_progress – True to log the progress with a tqdm bar, False otherwise

  • use_deepsparse_inference – True to use the DeepSparse inference engine to run the analysis, False to use onnxruntime

Returns

the sensitivity results for every node that is prunable

sparseml.onnx.optim.sensitivity_pruning. pruning_loss_sens_one_shot_iter ( model : Union [ str , onnx.onnx_ml_pb2.ModelProto ] , data : sparseml.onnx.utils.data.DataLoader , batch_size : int , steps_per_measurement : int , sparsity_levels : List [ float ] = (0.0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.99) , use_deepsparse_inference : bool = False ) Generator [ Tuple [ sparseml.optim.sensitivity.PruningLossSensitivityAnalysis , sparseml.onnx.optim.sensitivity_pruning.KSSensitivityProgress ] , None , None ] [source]

Run a one shot sensitivity analysis for kernel sparsity. It does not retrain. It moves layer by layer to calculate the sensitivity analysis for each layer and resets the previously run layers, updating and yielding the PruningLossSensitivityAnalysis at each layer. The loss is calculated by taking the KL divergence of the pruned values from the baseline.

Parameters
  • model – the loaded model or a file path to the onnx model to calculate the sparse sensitivity analysis for

  • data – the data to run through the model

  • batch_size – the batch size the data is created for

  • steps_per_measurement – number of steps (batches) to run through the model for each sparsity level on each node

  • sparsity_levels – the sparsity levels to calculate the loss at for each param

  • use_deepsparse_inference – True to use the DeepSparse inference engine to run the analysis, False to use onnxruntime

Returns

the sensitivity results for every node that is prunable, yielding an update at each layer along with the iteration progress

sparseml.onnx.optim.sensitivity_pruning. pruning_perf_sens_one_shot ( model : Union [ str , onnx.onnx_ml_pb2.ModelProto ] , data : sparseml.onnx.utils.data.DataLoader , batch_size : int , num_cores : Optional [ int ] = None , iterations_per_check : int = 10 , warmup_iterations_per_check : int = 5 , sparsity_levels : List [ float ] = (0.0, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99) , show_progress : bool = True , wait_between_iters : bool = False ) sparseml.optim.sensitivity.PruningPerfSensitivityAnalysis [source]

Run a one shot sensitivity analysis for kernel sparsity. Runs a baseline and then sets the sparsity for each layer to a given range of values as defined in sparsity_levels to measure their performance for pruning.

Parameters
  • model – the loaded model or a file path to the onnx model to calculate the sparse sensitivity analysis for

  • data – the data to run through the model

  • batch_size – the batch size to create the model for in the Neural Magic DeepSparse engine

  • num_cores – number of physical cores to run on. Default is the maximum available

  • iterations_per_check – number of iterations to run for perf details

  • warmup_iterations_per_check – number of iterations to run before perf details

  • sparsity_levels – the sparsity levels to measure the performance at for each param

  • show_progress – True to log the progress with a tqdm bar, False otherwise

  • wait_between_iters – if True, will sleep the thread 0.25s between analysis benchmark iterations to allow for other processes to run.

Returns

the sensitivity results for every node that is prunable

sparseml.onnx.optim.sensitivity_pruning. pruning_perf_sens_one_shot_iter ( model : Union [ str , onnx.onnx_ml_pb2.ModelProto ] , data : sparseml.onnx.utils.data.DataLoader , batch_size : int , num_cores : Optional [ int ] = None , iterations_per_check : int = 10 , warmup_iterations_per_check : int = 5 , sparsity_levels : List [ float ] = (0.0, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99) , optimization_level : int = 0 , iters_sleep_time : float = - 1 ) Generator [ Tuple [ sparseml.optim.sensitivity.PruningPerfSensitivityAnalysis , sparseml.onnx.optim.sensitivity_pruning.KSSensitivityProgress ] , None , None ] [source]

Run a one shot sensitivity analysis for kernel sparsity. Runs a baseline and then sets the sparsity for each layer to a given range of values as defined in sparsity_levels to measure their performance for pruning. Yields the current KSPerfSensitivityAnalysis after each sparsity level is run.

Parameters
  • model – the loaded model or a file path to the onnx model to calculate the sparse sensitivity analysis for

  • data – the data to run through the model

  • batch_size – the batch size to create the model for in the Neural Magic DeepSparse engine

  • num_cores – number of physical cores to run on. Default is the maximum number of cores available

  • iterations_per_check – number of iterations to run for perf details

  • warmup_iterations_per_check – number of iterations to run before perf details

  • sparsity_levels – the sparsity levels to measure the performance at for each param

  • optimization_level – the optimization level to pass to the DeepSparse inference engine for how much to optimize the model. Valid values are either 0 for minimal optimizations or 1 for maximal.

  • iters_sleep_time – the time to sleep the thread between analysis benchmark iterations to allow for other processes to run.

Returns

the sensitivity results for every node that is prunable, yielding an update at each layer along with the iteration progress

Module contents

Recalibration code for the ONNX framework. Handles things like model pruning.