sparseml.onnx.optim package¶
Subpackages¶
Submodules¶
sparseml.onnx.optim.analyzer_model module¶
Code related to monitoring, analyzing, and reporting info for models in ONNX. Records information such as FLOPs, input and output shapes, kernel shapes, etc.
-
class
sparseml.onnx.optim.analyzer_model.
ModelAnalyzer
(model: Optional[Union[onnx.onnx_ml_pb2.ModelProto, str]], nodes: Optional[List[sparseml.onnx.optim.analyzer_model.NodeAnalyzer]] = None)[source]¶ Bases:
object
Analyze a model to get information for every node in the model, including params, prunability, FLOPs, etc.
- Parameters
model – the path to the ONNX model file or the loaded onnx.ModelProto, can also be set to None if nodes are supplied
nodes – the analyzed nodes to create the analyzer with, generally None and model should be passed to create a new one
-
static
from_dict
(dictionary: Dict[str, Any])[source]¶ - Parameters
dictionary – the dictionary to create an analysis object from
- Returns
the ModelAnalyzer instance created from the dictionary
-
get_node
(id_: str) → Union[None, sparseml.onnx.optim.analyzer_model.NodeAnalyzer][source]¶ Get the NodeAnalyzer for the node matching the given id
- Parameters
id_ – the id to get a node for
- Returns
the NodeAnalyzer that matches the id, if not found None
-
static
load_json
(path: str)[source]¶ - Parameters
path – the path to load a previous analysis from
- Returns
the ModelAnalyzer instance from the json
-
property
nodes
¶ list of analyzers for each node in the model graph
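from_dict and load_json above imply a dictionary/JSON round trip for persisting an analysis. A minimal stdlib sketch of that pattern (TinyAnalyzer and its fields are illustrative stand-ins, not the sparseml classes):

```python
import json
import os
import tempfile

class TinyAnalyzer:
    """Illustrative stand-in for ModelAnalyzer's persistence pattern:
    an object built from per-node dicts that round-trips through JSON."""

    def __init__(self, nodes):
        self.nodes = nodes  # list of per-node info dicts

    def dict(self):
        return {"nodes": self.nodes}

    @staticmethod
    def from_dict(dictionary):
        # mirror of ModelAnalyzer.from_dict: rebuild the analyzer from a dict
        return TinyAnalyzer(dictionary["nodes"])

    def save_json(self, path):
        with open(path, "w") as file:
            json.dump(self.dict(), file)

    @staticmethod
    def load_json(path):
        # mirror of ModelAnalyzer.load_json: load a previous analysis
        with open(path) as file:
            return TinyAnalyzer.from_dict(json.load(file))

analyzer = TinyAnalyzer([{"id": "conv1_out", "op_type": "Conv", "params": 432}])
path = os.path.join(tempfile.mkdtemp(), "analysis.json")
analyzer.save_json(path)
restored = TinyAnalyzer.load_json(path)
```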
-
class
sparseml.onnx.optim.analyzer_model.
NodeAnalyzer
(model: Optional[onnx.onnx_ml_pb2.ModelProto], node: Optional[Any], node_shape: Optional[sparseml.onnx.utils.helpers.NodeShape] = None, **kwargs)[source]¶ Bases:
object
Analyzer instance for an individual node in a model
- Parameters
model – the loaded onnx.ModelProto, can also be set to None if a node’s kwargs are supplied
node – the individual node in model, can also be set to None if a node’s kwargs are supplied
node_shape – the node’s NodeShape object
kwargs – additional kwargs to pass to the node
-
property
attributes
¶ any extra attributes for the node such as padding, stride, etc
-
property
bias_name
¶ name of the bias for the node if applicable
-
property
bias_shape
¶ the shape of the bias for the node if applicable
-
property
flops
¶ number of flops to run the node
-
property
id_
¶ id of the onnx node (first output id)
-
property
input_names
¶ the names of the inputs to the node
-
property
input_shapes
¶ shapes for the inputs to the node
-
property
op_type
¶ the operator type for the onnx node
-
property
output_names
¶ the names of the outputs to the node
-
property
output_shapes
¶ shapes for the outputs to the node
-
property
params
¶ number of params in the node
-
property
prunable
¶ True if the node is prunable (conv, gemm, etc), False otherwise
-
property
prunable_equation_sensitivity
¶ approximated sensitivity for the layer towards pruning based on the layer structure and params
-
property
prunable_params
¶ number of prunable params in the node
-
property
prunable_params_zeroed
¶ number of prunable params set to zero in the node
-
property
weight_name
¶ the name of the weight for the node if applicable
-
property
weight_shape
¶ the shape of the weight for the node if applicable
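The params and flops properties above count weights and operations per node. A common counting convention for Gemm (fully connected) and Conv nodes can be sketched in plain Python; the exact convention sparseml uses may differ, and these helper names are illustrative:

```python
def gemm_stats(in_features, out_features, has_bias=True):
    """params and FLOPs for a Gemm (fully connected) node, counting a
    multiply and an add per weight, plus one add per bias element."""
    bias = out_features if has_bias else 0
    params = in_features * out_features + bias
    flops = 2 * in_features * out_features + bias
    return params, flops

def conv2d_stats(in_ch, out_ch, kernel_h, kernel_w, out_h, out_w, has_bias=True):
    """params and FLOPs for a 2D Conv node producing an out_h x out_w map."""
    bias = out_ch if has_bias else 0
    params = in_ch * out_ch * kernel_h * kernel_w + bias
    flops_per_output = 2 * in_ch * kernel_h * kernel_w  # multiply + add per tap
    outputs = out_ch * out_h * out_w
    flops = outputs * flops_per_output + (outputs if has_bias else 0)
    return params, flops

# e.g. a 512 -> 1000 classifier head
params, flops = gemm_stats(512, 1000)  # 513000 params, 1025000 FLOPs
```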
sparseml.onnx.optim.sensitivity_pruning module¶
Sensitivity analysis implementations for kernel sparsity on models against loss functions.
-
class
sparseml.onnx.optim.sensitivity_pruning.
PruningLossSensitivityAnalysis
[source]¶ Bases:
object
Analysis result for how kernel sparsity (pruning) affects the loss of a given model. Contains layer by layer results.
-
add_result
(id_: Optional[str], name: str, index: int, sparsity: float, measurement: float, baseline: bool)[source]¶ Add a result to the sensitivity analysis for a specific param
- Parameters
id_ – the identifier to add the result for
name – the readable name to add the result for
index – the index of the param as found in the model parameters
sparsity – the sparsity to add the result for
measurement – the loss measurement to add the result for
baseline – True if this is a baseline measurement, False otherwise
-
static
from_dict
(dictionary: Dict[str, Any])[source]¶ - Parameters
dictionary – the dictionary to create an analysis object from
- Returns
the PruningLossSensitivityAnalysis instance created from the dictionary
-
get_result
(id_or_name: str) → sparseml.optim.sensitivity.PruningSensitivityResult[source]¶ Get a result from the sensitivity analysis for a specific param
- Parameters
id_or_name – the id or name to get the result for
- Returns
the loss sensitivity results for the given id or name
-
static
load_json
(path: str)[source]¶ - Parameters
path – the path to load a previous analysis from
- Returns
the PruningLossSensitivityAnalysis instance loaded from the json
-
plot
(path: Optional[str], plot_integral: bool, normalize: bool = True, title: Optional[str] = None) → Union[Tuple[matplotlib.figure.Figure, matplotlib.axes._axes.Axes], Tuple[None, None]][source]¶ - Parameters
path – the path to save an image version of the chart, None to display the plot
plot_integral – True to plot the calculated loss integrals for each layer, False to plot the averages
normalize – normalize the values to a unit distribution (0 mean, 1 std)
title – the title to put on the chart
- Returns
the created figure and axes if path is None, otherwise (None, None)
-
property
results
¶ the individual results for the analysis
-
property
results_model
¶ the overall results for the model
-
class
sparseml.onnx.optim.sensitivity_pruning.
PruningPerfSensitivityAnalysis
(num_cores: int, batch_size: int)[source]¶ Bases:
object
Analysis result for how kernel sparsity (pruning) affects the performance of a given model. Contains layer by layer results.
- Parameters
num_cores – number of physical cpu cores the analysis was run on
batch_size – the input batch size the analysis was run for
-
add_model_result
(sparsity: float, measurement: float, baseline: bool)[source]¶ Add a result to the sensitivity analysis for the overall model
- Parameters
sparsity – the sparsity to add the result for
measurement – resulting timing in seconds for the given sparsity for the measurement
baseline – True if this is a baseline measurement, False otherwise
-
add_result
(id_: Optional[str], name: str, index: int, sparsity: float, measurement: float, baseline: bool)[source]¶ Add a result to the sensitivity analysis for a specific param
- Parameters
id_ – the identifier to add the result for
name – the readable name to add the result for
index – the index of the param as found in the model parameters
sparsity – the sparsity to add the result for
measurement – resulting timing in seconds for the given sparsity for the measurement
baseline – True if this is a baseline measurement, False otherwise
-
property
batch_size
¶ the input batch size the analysis was run for
-
static
from_dict
(dictionary: Dict[str, Any])[source]¶ - Parameters
dictionary – the dictionary to create an analysis object from
- Returns
the PruningPerfSensitivityAnalysis instance created from the dictionary
-
get_result
(id_or_name: str) → sparseml.optim.sensitivity.PruningSensitivityResult[source]¶ Get a result from the sensitivity analysis for a specific param
- Parameters
id_or_name – the id or name to get the result for
- Returns
the loss sensitivity results for the given id or name
-
static
load_json
(path: str)[source]¶ - Parameters
path – the path to load a previous analysis from
- Returns
the PruningPerfSensitivityAnalysis instance loaded from the json
-
property
num_cores
¶ number of physical cpu cores the analysis was run on
-
plot
(path: Optional[str], title: Optional[str] = None) → Union[Tuple[matplotlib.figure.Figure, matplotlib.axes._axes.Axes], Tuple[None, None]][source]¶ - Parameters
path – the path to save an image version of the chart, None to display the plot
title – the title to put on the chart
- Returns
the created figure and axes if path is None, otherwise (None, None)
-
property
results
¶ the individual results for the analysis
-
property
results_model
¶ the overall results for the model
-
class
sparseml.onnx.optim.sensitivity_pruning.
PruningSensitivityResult
(id_: str, name: str, index: int, baseline_measurement_index: int = - 1, baseline_measurement_key: Optional[str] = None, sparse_measurements: Optional[Dict[float, List[float]]] = None)[source]¶ Bases:
object
A sensitivity result for a given node/param in a model. Ex: loss sensitivity or perf sensitivity
- Parameters
id_ – id for the node / param
name – human readable name for the node / param
index – index order for when the node / param is used in the model
baseline_measurement_index – index for where the baseline measurement is stored in the sparse_measurements, if any
baseline_measurement_key – key for where the baseline measurement is stored in the sparse_measurements, if any
sparse_measurements – the sparse measurements to prepopulate with, if any
-
add_measurement
(sparsity: float, loss: float, baseline: bool)[source]¶ Add a sparse measurement to the result
- Parameters
sparsity – the sparsity the measurement was performed at
loss – resulting loss for the given sparsity for the measurement
baseline – True if this is a baseline measurement, False otherwise
-
property
averages
¶ average values of loss for each level recorded
-
property
baseline_average
¶ the baseline average time to compare to for the result
-
property
baseline_measurement_index
¶ index for where the baseline measurement is stored in the sparse_measurements, if any
-
property
baseline_measurement_key
¶ key for where the baseline measurement is stored in the sparse_measurements, if any
-
static
from_dict
(dictionary: Dict[str, Any])[source]¶ Create a new loss sensitivity result from a dictionary of values. Expected to match the format as given in the dict() call.
- Parameters
dictionary – the dictionary to create a result out of
- Returns
the created PruningSensitivityResult
-
property
has_baseline
¶ True if the result has a baseline measurement in the sparse_measurements, False otherwise
-
property
id_
¶ id for the node / param
-
property
index
¶ index order for when the node / param is used in the model
-
property
name
¶ human readable name for the node / param
-
property
sparse_average
¶ average loss across all levels recorded
-
sparse_comparison
(compare_index: int = - 1)[source]¶ Compare the baseline average to a sparse average value through the difference: sparse - baseline
If compare_index is not given, the comparison uses the sparsity measurement closest to 90%; 90% is used as a reasonably achievable level that avoids introducing too much noise at the extremes of the tests.
If has_baseline is False, the comparison is made against the first index instead.
- Parameters
compare_index – the index to compare against the baseline with, if not supplied will compare against the sparsity measurement closest to 90%
- Returns
a comparison of the sparse average with the baseline (sparse - baseline)
-
property
sparse_integral
¶ integrated loss across all levels recorded
-
property
sparse_measurements
¶ the sparse measurements
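The averages, sparse_integral, and sparse_comparison members above reduce the raw sparse_measurements to summary numbers. A standalone sketch of plausible implementations (trapezoidal integration and these function signatures are assumptions for illustration; sparseml's exact math may differ):

```python
def averages(sparse_measurements):
    """mean measurement at each sparsity level"""
    return {s: sum(vals) / len(vals) for s, vals in sparse_measurements.items()}

def sparse_integral(sparse_measurements):
    """integrate the average measurement over sparsity (trapezoidal rule)"""
    points = sorted(averages(sparse_measurements).items())
    return sum(
        0.5 * (v0 + v1) * (s1 - s0)
        for (s0, v0), (s1, v1) in zip(points, points[1:])
    )

def sparse_comparison(sparse_measurements, baseline_average, target=0.9):
    """difference (sparse - baseline) at the sparsity level closest to target"""
    avgs = averages(sparse_measurements)
    closest = min(avgs, key=lambda s: abs(s - target))
    return avgs[closest] - baseline_average

measurements = {0.0: [1.0], 0.5: [1.1, 1.3], 0.9: [2.0]}
delta = sparse_comparison(measurements, baseline_average=1.0)  # 2.0 - 1.0 = 1.0
```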
-
sparseml.onnx.optim.sensitivity_pruning.
pruning_loss_sens_approx
(input_shape: Union[None, List[int], List[List[int]]], output_shape: Union[None, List[int]], params: int, apply_shape_change_mult: bool = True) → float[source]¶ Approximate the pruning sensitivity of a neural network layer based on the params and metadata for the given layer
- Parameters
input_shape – the input shape to the layer
output_shape – the output shape from the layer
params – the number of params in the layer
apply_shape_change_mult – True to adjust the sensitivity based on a weight derived from a change in input to output shape (any change is considered to be more sensitive), False to not apply
- Returns
the approximated pruning sensitivity for the layer’s settings
-
sparseml.onnx.optim.sensitivity_pruning.
pruning_loss_sens_magnitude
(model: Union[str, onnx.onnx_ml_pb2.ModelProto], sparsity_levels: Union[List[float], Tuple[float, …]] = (0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99), show_progress: bool = True) → sparseml.optim.sensitivity.PruningLossSensitivityAnalysis[source]¶ Approximated kernel sparsity (pruning) loss analysis for a given model. Returns the results for each prunable param (conv, linear) in the model.
- Parameters
model – the loaded model or a file path to the onnx model to calculate the sparse sensitivity analysis for
sparsity_levels – the sparsity levels at which to calculate the loss for each param
show_progress – True to log the progress with a tqdm bar, False otherwise
- Returns
the analysis results for the model
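pruning_loss_sens_magnitude scores each param without running data by looking at weight magnitudes. The core mechanic, zeroing the smallest-magnitude fraction of weights at each sparsity level, can be sketched with stdlib Python (an illustration of the idea, not the sparseml implementation):

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    pruned, zeroed = [], 0
    for w in weights:
        if abs(w) <= threshold and zeroed < n_prune:
            pruned.append(0.0)  # prune: this weight is below the cutoff
            zeroed += 1
        else:
            pruned.append(w)    # keep: large-magnitude weights survive
    return pruned

weights = [0.5, -0.1, 0.9, 0.05, -0.7, 0.2]
half_pruned = magnitude_prune(weights, 0.5)  # zeroes -0.1, 0.05, and 0.2
```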
-
sparseml.onnx.optim.sensitivity_pruning.
pruning_loss_sens_magnitude_iter
(model: Union[str, onnx.onnx_ml_pb2.ModelProto], sparsity_levels: Union[List[float], Tuple[float, …]] = (0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99)) → Generator[Tuple[sparseml.optim.sensitivity.PruningLossSensitivityAnalysis, sparseml.onnx.optim.sensitivity_pruning.KSSensitivityProgress], None, None][source]¶ Approximated kernel sparsity (pruning) loss analysis for a given model. Iteratively builds a PruningLossSensitivityAnalysis object and yields an updated version after each layer is run. The final result is the complete analysis object.
- Parameters
model – the loaded model or a file path to the onnx model to calculate the sparse sensitivity analysis for
sparsity_levels – the sparsity levels at which to calculate the loss for each param
- Returns
the analysis results for the model with an additional layer at each iteration along with a float representing the iteration progress
-
sparseml.onnx.optim.sensitivity_pruning.
pruning_loss_sens_one_shot
(model: Union[str, onnx.onnx_ml_pb2.ModelProto], data: sparseml.onnx.utils.data.DataLoader, batch_size: int, steps_per_measurement: int, sparsity_levels: List[float] = (0.0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.99), show_progress: bool = True, use_deepsparse_inference: bool = False) → sparseml.optim.sensitivity.PruningLossSensitivityAnalysis[source]¶ Run a one shot sensitivity analysis for kernel sparsity. It does not retrain. Moves layer by layer to calculate the sensitivity analysis for each and resets the previously run layers. The loss is calculated by taking the kl_divergence of pruned values from the baseline.
- Parameters
model – the loaded model or a file path to the onnx model to calculate the sparse sensitivity analysis for
data – the data to run through the model
batch_size – the batch size the data is created for
steps_per_measurement – number of steps (batches) to run through the model for each sparsity level on each node
sparsity_levels – the sparsity levels at which to calculate the loss for each param
show_progress – True to log the progress with a tqdm bar, False otherwise
use_deepsparse_inference – True to use the DeepSparse inference engine to run the analysis, False to use onnxruntime
- Returns
the sensitivity results for every node that is prunable
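The one shot loss analysis above measures loss as the kl_divergence of the pruned model's outputs from the baseline outputs. That measurement can be sketched for discrete output distributions (the softmax inputs below are made-up values standing in for model outputs):

```python
import math

def softmax(logits):
    """turn raw model outputs into a probability distribution"""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) for two discrete distributions; eps guards log(0)"""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

baseline_out = softmax([2.0, 1.0, 0.1])  # made-up baseline model outputs
pruned_out = softmax([1.8, 1.1, 0.2])    # made-up outputs after pruning a layer
loss = kl_divergence(baseline_out, pruned_out)  # small positive number
```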
-
sparseml.onnx.optim.sensitivity_pruning.
pruning_loss_sens_one_shot_iter
(model: Union[str, onnx.onnx_ml_pb2.ModelProto], data: sparseml.onnx.utils.data.DataLoader, batch_size: int, steps_per_measurement: int, sparsity_levels: List[float] = (0.0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.99), use_deepsparse_inference: bool = False) → Generator[Tuple[sparseml.optim.sensitivity.PruningLossSensitivityAnalysis, sparseml.onnx.optim.sensitivity_pruning.KSSensitivityProgress], None, None][source]¶ Run a one shot sensitivity analysis for kernel sparsity. It does not retrain. Moves layer by layer to calculate the sensitivity analysis for each and resets the previously run layers. Updates and yields the PruningLossSensitivityAnalysis at each layer. The loss is calculated by taking the kl_divergence of pruned values from the baseline.
- Parameters
model – the loaded model or a file path to the onnx model to calculate the sparse sensitivity analysis for
data – the data to run through the model
batch_size – the batch size the data is created for
steps_per_measurement – number of steps (batches) to run through the model for each sparsity level on each node
sparsity_levels – the sparsity levels at which to calculate the loss for each param
use_deepsparse_inference – True to use the DeepSparse inference engine to run the analysis, False to use onnxruntime
- Returns
the sensitivity results for every node that is prunable; yields an update at each layer along with the iteration progress
-
sparseml.onnx.optim.sensitivity_pruning.
pruning_perf_sens_one_shot
(model: Union[str, onnx.onnx_ml_pb2.ModelProto], data: sparseml.onnx.utils.data.DataLoader, batch_size: int, num_cores: Optional[int] = None, iterations_per_check: int = 10, warmup_iterations_per_check: int = 5, sparsity_levels: List[float] = (0.0, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99), show_progress: bool = True, wait_between_iters: bool = False) → sparseml.optim.sensitivity.PruningPerfSensitivityAnalysis[source]¶ Run a one shot sensitivity analysis for kernel sparsity. Runs a baseline and then sets the sparsity for each layer to a given range of values as defined in sparsity_levels to measure their performance for pruning.
- Parameters
model – the loaded model or a file path to the onnx model to calculate the sparse sensitivity analysis for
data – the data to run through the model
batch_size – the batch size to create the model with in the Neural Magic inference engine
num_cores – number of physical cores to run on. Default is the maximum available
iterations_per_check – number of iterations to run for perf details
warmup_iterations_per_check – number of iterations to run before perf details
sparsity_levels – the sparsity levels at which to measure the performance for each param
show_progress – True to log the progress with a tqdm bar, False otherwise
wait_between_iters – if True, will sleep the thread 0.25s between analysis benchmark iterations to allow for other processes to run.
- Returns
the sensitivity results for every node that is prunable
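The warmup_iterations_per_check / iterations_per_check pattern above is standard benchmarking: discard warmup runs, then average timed runs. A stdlib sketch of that measurement loop (the dummy work function stands in for an inference call; the helper name is illustrative):

```python
import time

def timed_measurement(run_once, iterations_per_check=10, warmup_iterations_per_check=5):
    """Average per-iteration wall time, discarding warmup iterations."""
    for _ in range(warmup_iterations_per_check):
        run_once()  # warm caches etc. without recording the time
    start = time.perf_counter()
    for _ in range(iterations_per_check):
        run_once()
    return (time.perf_counter() - start) / iterations_per_check

# dummy stand-in for a single inference call
work = lambda: sum(i * i for i in range(10_000))
seconds_per_iter = timed_measurement(work)
```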
-
sparseml.onnx.optim.sensitivity_pruning.
pruning_perf_sens_one_shot_iter
(model: Union[str, onnx.onnx_ml_pb2.ModelProto], data: sparseml.onnx.utils.data.DataLoader, batch_size: int, num_cores: Optional[int] = None, iterations_per_check: int = 10, warmup_iterations_per_check: int = 5, sparsity_levels: List[float] = (0.0, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99), optimization_level: int = 0, iters_sleep_time: float = - 1) → Generator[Tuple[sparseml.optim.sensitivity.PruningPerfSensitivityAnalysis, sparseml.onnx.optim.sensitivity_pruning.KSSensitivityProgress], None, None][source]¶ Run a one shot sensitivity analysis for kernel sparsity. Runs a baseline and then sets the sparsity for each layer to a given range of values as defined in sparsity_levels to measure their performance for pruning. Yields the current PruningPerfSensitivityAnalysis after each sparsity level is run.
- Parameters
model – the loaded model or a file path to the onnx model to calculate the sparse sensitivity analysis for
data – the data to run through the model
batch_size – the batch size to create the model with in the Neural Magic inference engine
num_cores – number of physical cores to run on. Default is the maximum number of cores available
iterations_per_check – number of iterations to run for perf details
warmup_iterations_per_check – number of iterations to run before perf details
sparsity_levels – the sparsity levels at which to measure the performance for each param
optimization_level – the optimization level to pass to the DeepSparse inference engine for how much to optimize the model. Valid values are either 0 for minimal optimizations or 1 for maximal.
iters_sleep_time – the time to sleep the thread between analysis benchmark iterations to allow for other processes to run.
- Returns
the sensitivity results for every node that is prunable; yields an update at each layer along with the iteration progress
Module contents¶
Recalibration code for the ONNX framework. Handles things like model pruning.