sparseml.pytorch.utils.quantization package

Submodules

sparseml.pytorch.utils.quantization.helpers module

Helper functions for performing quantization aware training with PyTorch

class sparseml.pytorch.utils.quantization.helpers.QATWrapper(forward_fn: Callable[[Any], Any], num_inputs: int = 1, kwarg_input_names: Optional[List[str]] = None, num_outputs: int = 1, input_qconfigs: Union[torch.quantization.qconfig.QConfig, str, List[torch.quantization.qconfig.QConfig]] = 'asymmetric', output_qconfigs: Union[torch.quantization.qconfig.QConfig, str, List[torch.quantization.qconfig.QConfig]] = 'asymmetric')[source]

Bases: torch.nn.modules.module.Module

Wraps inputs and outputs of a Module or function with QuantStubs for Quantization-Aware-Training (QAT)

Parameters
  • forward_fn – function to be wrapped, should generally accept and return torch Tensor(s)

  • num_inputs – number of inputs of the forward function to add a QuantStub to. Will wrap the first num_inputs ordered inputs of the function. Default is 1

  • kwarg_input_names – list of names of keyword arguments to the forward pass that should be wrapped with a fake quantize operation. Defaults to empty

  • num_outputs – number of outputs of the forward function to add a QuantStub to. Will wrap the first num_outputs ordered outputs of the function. Default is 1. Will also add a DeQuantStub for FP32 conversion if torch.quantization.convert is invoked

  • input_qconfigs – QConfig to use for calibrating the input QuantStubs. Can be a single QConfig that will be copied to each QuantStub or a list of one QConfig for each input. Instead of QConfig objects, the string ‘asymmetric’ or ‘symmetric’ may be used to select the default UINT8 asymmetric or symmetric quantization, respectively

  • output_qconfigs – QConfig to use for calibrating the output QuantStubs. Can be a single QConfig that will be copied to each QuantStub or a list of one QConfig for each output. Instead of QConfig objects, the string ‘asymmetric’ or ‘symmetric’ may be used to select the default UINT8 asymmetric or symmetric quantization, respectively
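
A minimal usage sketch; the wrapped function below is illustrative, not part of this API:

    import torch
    from sparseml.pytorch.utils.quantization.helpers import QATWrapper

    # hypothetical elementwise op that QAT tracing cannot handle natively
    def shifted_relu(x):
        return torch.relu(x - 0.1)

    # fake-quantize the single input and single output during QAT
    wrapper = QATWrapper(
        forward_fn=shifted_relu,
        num_inputs=1,
        num_outputs=1,
        input_qconfigs="asymmetric",
        output_qconfigs="symmetric",
    )
    out = wrapper(torch.randn(2, 8))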

configure_qconfig()[source]

Sets the qconfigs of the quant stubs to the pre-initialized QConfigs

forward(*args, **kwargs) → Any[source]
Parameters
  • args – arguments to forward function; the first num_inputs of these args will be wrapped by a QuantStub

  • kwargs – keyword arguments to pass to the wrapped forward function

Returns

outputs of the forward function with a QuantStub applied to the first num_outputs outputs

static from_module(module: torch.nn.modules.module.Module) → sparseml.pytorch.utils.quantization.helpers.QATWrapper[source]
Parameters

module – torch Module to create a QATWrapper for

Returns

QATWrapper object created using the given Module as the forward function. Any other named parameters of the QATWrapper constructor will be looked up from matching attributes of the given Module, if present

training: bool
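
A sketch of from_module with a hypothetical custom module; constructor kwargs for the wrapper are discovered from matching attributes on the module when present:

    import torch
    from sparseml.pytorch.utils.quantization.helpers import QATWrapper

    class Clamp6(torch.nn.Module):
        def forward(self, x):
            return x.clamp(0.0, 6.0)

    # the module's forward becomes the wrapped forward function
    wrapped = QATWrapper.from_module(Clamp6())
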
sparseml.pytorch.utils.quantization.helpers.add_quant_dequant(module, name=None, parent_module=None)[source]

Wraps all Conv and Linear submodules that have a qconfig in a QuantWrapper

Parameters
  • module – the module to modify

  • name – name of the module to modify; default is None

  • parent_module – parent module containing the module to modify; default is None

Returns

the modified module
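
A minimal sketch of the call pattern, assuming the standard torch.quantization qconfig helpers:

    import torch
    from sparseml.pytorch.utils.quantization.helpers import add_quant_dequant

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
    model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
    torch.quantization.propagate_qconfig_(model)  # give each submodule a qconfig
    model = add_quant_dequant(model)  # Conv submodules now sit inside QuantWrappers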

sparseml.pytorch.utils.quantization.helpers.configure_module_default_qconfigs(module: torch.nn.modules.module.Module)[source]

If any submodule of the given module has a configure_qconfig function, configure_qconfig will be called on that submodule to set its qconfig(s) to their defaults

Parameters

module – module to set qconfigs for
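
For example, with a model containing a QATWrapper, which exposes configure_qconfig (a sketch):

    import torch
    from sparseml.pytorch.utils.quantization.helpers import (
        QATWrapper,
        configure_module_default_qconfigs,
    )

    model = torch.nn.Sequential(QATWrapper(torch.sigmoid))
    configure_module_default_qconfigs(model)  # calls configure_qconfig on the wrapper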

sparseml.pytorch.utils.quantization.helpers.configure_module_qat_wrappers(module: torch.nn.modules.module.Module)[source]

If any submodule of the given module has the attribute wrap_qat == True, it will be replaced by a QATWrapper created from it by QATWrapper.from_module. Any other named kwargs for the QATWrapper constructor must be contained in a dictionary under an attribute named qat_wrapper_kwargs

Parameters

module – module to potentially wrap the submodules of
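
A sketch of the wrap_qat flag described above; the Scale module is hypothetical:

    import torch
    from sparseml.pytorch.utils.quantization.helpers import (
        configure_module_qat_wrappers,
    )

    class Scale(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.wrap_qat = True  # mark this submodule for QATWrapper replacement
            self.qat_wrapper_kwargs = {"num_inputs": 1, "num_outputs": 1}

        def forward(self, x):
            return x * 0.5

    model = torch.nn.Sequential(torch.nn.Linear(8, 8), Scale())
    configure_module_qat_wrappers(model)  # Scale is now wrapped in a QATWrapper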

sparseml.pytorch.utils.quantization.helpers.fuse_module_conv_bn_relus(module: torch.nn.modules.module.Module, inplace: bool = True, override_bn_subclasses_forward: Union[bool, str] = True) → torch.nn.modules.module.Module[source]

Performs fusion of Conv2d, BatchNorm2d, and ReLU layers found in the given module. To be fused, these layers must appear sequentially in module.named_modules() and be in the same submodule. Fuses either Conv2d -> BatchNorm2d, Conv2d -> ReLU, or Conv2d -> BatchNorm2d -> ReLU blocks

If this function does not fuse the model in the desired way, implement an in place fusing function for the model.

Parameters
  • module – the module to fuse

  • inplace – set True to perform fusions in-place. Default is True

  • override_bn_subclasses_forward – if True, modules that are subclasses of BatchNorm2d will be modified to be BatchNorm2d but with the forward pass and state variables copied from the subclass. This is so these BN modules can pass PyTorch type checking when fusing. Set to “override-only” to overwrite only the parameters, not the forward pass. Default is True

Returns

the fused module
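
A sketch on a standard Conv2d -> BatchNorm2d -> ReLU stack:

    import torch
    from sparseml.pytorch.utils.quantization.helpers import fuse_module_conv_bn_relus

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 16, 3),
        torch.nn.BatchNorm2d(16),
        torch.nn.ReLU(),
    )
    fused = fuse_module_conv_bn_relus(model, inplace=True)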

sparseml.pytorch.utils.quantization.helpers.get_qat_qconfig(symmetric_activations: bool = False, symmetric_weights: bool = True) → torch.quantization.qconfig.QConfig[source]
Parameters
  • symmetric_activations – if True, activations will have a symmetric UINT8 quantization range with zero point set to 128. Otherwise activations will use asymmetric quantization with any zero point. Default is False

  • symmetric_weights – if True, weights will have a symmetric INT8 quantization range with zero point set to 0. Otherwise weights will use asymmetric quantization with any zero point. Default is True

Returns

A QAT fake quantization config; with the defaults, weights use symmetric quantization and activations use asymmetric quantization. The difference between this and torch.quantization.default_qat_qconfig is that the activation observer will not have reduce_range enabled.
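
For example, assigning the returned qconfig before preparing a model for QAT (a sketch):

    import torch
    from sparseml.pytorch.utils.quantization.helpers import get_qat_qconfig

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
    model.qconfig = get_qat_qconfig()  # asymmetric activations, symmetric weights
    torch.quantization.prepare_qat(model, inplace=True)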

sparseml.pytorch.utils.quantization.helpers.prepare_embeddings_qat(module: torch.nn.modules.module.Module, qconfig: Optional[torch.quantization.qconfig.QConfig] = None)[source]

Adds a fake quantize call to the weights of any Embedding modules in the given module

Parameters
  • module – module to run QAT for the embeddings of

  • qconfig – qconfig to generate the fake quantize ops from. Default uses INT8 asymmetric range
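
A minimal sketch on a model with an Embedding layer:

    import torch
    from sparseml.pytorch.utils.quantization.helpers import prepare_embeddings_qat

    model = torch.nn.Sequential(torch.nn.Embedding(1000, 64), torch.nn.Linear(64, 2))
    prepare_embeddings_qat(model)  # Embedding weights now run through fake quantize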

sparseml.pytorch.utils.quantization.quantize_qat_export module

Helper functions for parsing an exported PyTorch model trained with quantization aware training.

class sparseml.pytorch.utils.quantization.quantize_qat_export.QuantizationParams(scale, zero_point, target)

Bases: tuple

property scale

Alias for field number 0

property target

Alias for field number 2

property zero_point

Alias for field number 1

sparseml.pytorch.utils.quantization.quantize_qat_export.get_quantization_params(model: Union[onnx.onnx_ml_pb2.ModelProto, sparseml.onnx.utils.graph_editor.ONNXGraph], node: onnx.onnx_ml_pb2.NodeProto, include_target: bool = False) → sparseml.pytorch.utils.quantization.quantize_qat_export.QuantizationParams[source]
Parameters
  • model – ONNX model to read from or ONNXGraph object

  • node – A QuantizeLinear or DequantizeLinear Node

  • include_target – Set True to include the quantization target. If False, the target value will be returned as None. Default is False

Returns

QuantizationParams object with scale and zero point; the quantization target will be included if it is an initializer, otherwise target will be None
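
A sketch of reading the parameters from the first QuantizeLinear node of an exported model; the file path is illustrative:

    import onnx
    from sparseml.pytorch.utils.quantization.quantize_qat_export import (
        get_quantization_params,
    )

    model = onnx.load("model.onnx")
    quant_node = next(n for n in model.graph.node if n.op_type == "QuantizeLinear")
    params = get_quantization_params(model, quant_node, include_target=True)
    print(params.scale, params.zero_point)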

sparseml.pytorch.utils.quantization.quantize_qat_export.quantize_torch_qat_export(model: Union[onnx.onnx_ml_pb2.ModelProto, str], output_file_path: Optional[str] = None, inplace: bool = True) → onnx.onnx_ml_pb2.ModelProto[source]
Parameters
  • model – The model to convert, or a file path to it

  • output_file_path – File path to save the converted model to

  • inplace – If True, performs the conversion of the model in place. Default is True

Returns

the converted model. A model exported from a torch QAT session is converted from a QAT graph with fake quantize ops surrounding operations to a quantized graph with quantized operations. To be converted, quantized Conv and FC operations must have their inputs and outputs surrounded by fake quantize ops
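
A sketch of converting an exported QAT model; file paths are illustrative:

    from sparseml.pytorch.utils.quantization.quantize_qat_export import (
        quantize_torch_qat_export,
    )

    quantized = quantize_torch_qat_export(
        "model-qat.onnx",
        output_file_path="model-quantized.onnx",
    )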

sparseml.pytorch.utils.quantization.quantize_qat_export.skip_onnx_input_quantize(model: Union[onnx.onnx_ml_pb2.ModelProto, str], output_file_path: Optional[str] = None)[source]

If the given model has a single FP32 input that feeds into a QuantizeLinear node, the input will be changed to uint8 and the QuantizeLinear node will be deleted. This enables quantized graphs to take quantized inputs instead of floats.

If no optimization is made, a RuntimeError will be raised.

Parameters
  • model – The model to convert, or a file path to it

  • output_file_path – File path to save the converted model to
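
A sketch of applying the optimization, guarding against the RuntimeError raised when the input pattern does not match; file paths are illustrative:

    from sparseml.pytorch.utils.quantization.quantize_qat_export import (
        skip_onnx_input_quantize,
    )

    try:
        skip_onnx_input_quantize("model-quantized.onnx", "model-uint8-input.onnx")
    except RuntimeError:
        pass  # model does not have a single FP32 input feeding a QuantizeLinear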

Module contents

Tools for quantizing and exporting PyTorch models