sparseml.onnx.optim.quantization package¶
Submodules¶
sparseml.onnx.optim.quantization.calibration module¶
Provides a class for performing quantization calibration on an Onnx model.
class sparseml.onnx.optim.quantization.calibration.CalibrationSession(onnx_file: str, calibrate_op_types: Iterable[str] = ('Conv', 'MatMul', 'Gemm'), exclude_nodes: Optional[List[str]] = None, include_nodes: Optional[List[str]] = None, augmented_model_path: Optional[str] = None, static: bool = True)[source]¶
Bases: object
Class for performing quantization calibration on an Onnx model.
- Parameters
onnx_file – File path to saved Onnx model to calibrate
calibrate_op_types – List of Onnx ops names to calibrate and quantize within the model. Currently Onnx only supports quantizing ‘Conv’ and ‘MatMul’ ops.
exclude_nodes – List of operator names that should not be quantized
include_nodes – List of operator names to force to be quantized
augmented_model_path – file path to save augmented model to for verification
static – True to use static quantization. Default is True
add_reduce_to_node_output(node: onnx.onnx_ml_pb2.NodeProto, output_edge: str, op_type: str) → Tuple[onnx.onnx_ml_pb2.NodeProto, onnx.onnx_ml_pb2.ValueInfoProto][source]¶
- Parameters
node – the node to add the reduce op to
output_edge – the output of node to generate reduce op for
op_type – the reduce operation name
- Returns
a tuple of the reduce operation node and its output
generate_augmented_model() → onnx.onnx_ml_pb2.ModelProto[source]¶
- Returns
A new Onnx model with ReduceMin and ReduceMax nodes added to all quantizable nodes in the original model, with their outputs stored as part of the graph output.
get_quantization_params_dict() → Dict[str, List[Union[int, float]]][source]¶
- Returns
A dictionary of quantization parameters based on the original model and calibrated quantization thresholds from runs of the process_batch function. The format of the dictionary will be: {“param_name”: [zero_point, scale]}
property model¶
The loaded model; if optimization has run, this will be the optimized version.
property model_augmented¶
The augmented model; if optimization has run, this will be the optimized version.
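Example: a minimal calibration sketch. The input name, batch shape, and the exact signature of process_batch (referenced above) are assumptions for illustration; only the constructor arguments and get_quantization_params_dict are documented here.

import numpy as np
from sparseml.onnx.optim.quantization.calibration import CalibrationSession

session = CalibrationSession(
    onnx_file="model.onnx",  # hypothetical path to the FP32 model to calibrate
    augmented_model_path="model-augmented.onnx",  # where the ReduceMin/ReduceMax copy is saved
    static=True,
)

# Feed a few representative batches so min/max activation thresholds are collected;
# the input name, shape, and process_batch signature are assumed here
for _ in range(10):
    batch = {"input": np.random.randn(1, 3, 224, 224).astype(np.float32)}
    session.process_batch(batch)

# {"param_name": [zero_point, scale]} for every calibrated output
quantization_params = session.get_quantization_params_dict()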
sparseml.onnx.optim.quantization.quantize module¶
class sparseml.onnx.optim.quantization.quantize.ONNXQuantizer(model, per_channel, mode, static, fuse_dynamic_quant, weight_qType, input_qType, quantization_params, nodes_to_quantize, nodes_to_exclude)[source]¶
Bases: object
class sparseml.onnx.optim.quantization.quantize.QuantizationMode[source]¶
Bases: object
IntegerOps = 0¶
QLinearOps = 1¶
class sparseml.onnx.optim.quantization.quantize.QuantizedInitializer(name, initializer, rmins, rmaxs, zero_points, scales, data=[], quantized_data=[], axis=None, qType=2)[source]¶
Bases: object
Represents a linearly quantized weight input from ONNX operators
class sparseml.onnx.optim.quantization.quantize.QuantizedValue(name, new_quantized_name, scale_name, zero_point_name, quantized_value_type, axis=None, qType=2)[source]¶
Bases: object
Represents a linearly quantized value (input/output/initializer)
class sparseml.onnx.optim.quantization.quantize.QuantizedValueType[source]¶
Bases: object
Initializer = 1¶
Input = 0¶
sparseml.onnx.optim.quantization.quantize.check_opset_version(org_model, force_fusions)[source]¶
Check the opset version of the original model and set the quantized model's opset version and fuse_dynamic_quant accordingly. If the opset version is less than 10, set the quantized model's opset version to 10. If the opset version is 10, quantize without using the DynamicQuantizeLinear operator. If the opset version is 11, quantize using the DynamicQuantizeLinear operator.
- Returns
fuse_dynamic_quant boolean value.
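A short illustrative call, assuming a model loaded with onnx.load (the file path is hypothetical):

import onnx
from sparseml.onnx.optim.quantization.quantize import check_opset_version

model = onnx.load("model.onnx")  # hypothetical FP32 model
fuse_dynamic_quant = check_opset_version(model, force_fusions=False)
# True only when the opset supports the DynamicQuantizeLinear operator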
sparseml.onnx.optim.quantization.quantize.quantize(model, per_channel=False, nbits=8, quantization_mode=0, static=False, force_fusions=False, symmetric_activation=False, symmetric_weight=False, quantization_params=None, nodes_to_quantize=None, nodes_to_exclude=None)[source]¶
Given an onnx model, create a quantized onnx model and save it into a file
- Parameters
model – ModelProto to quantize
per_channel – quantize weights per channel
nbits – number of bits to represent quantized data. Currently only supporting 8-bit types
quantization_mode – Can be one of the QuantizationMode types. IntegerOps: the function will use integer ops. Only ConvInteger and MatMulInteger ops are supported now. QLinearOps: the function will use QLinear ops. Only QLinearConv and QLinearMatMul ops are supported now.
static – True: the inputs/activations are quantized using static scale and zero point values specified through quantization_params. False: the inputs/activations are quantized using dynamic scale and zero point values computed while running the model.
force_fusions – True: fuses nodes added for dynamic quantization. False: no fusion is applied for nodes which are added for dynamic quantization. Should only be used in cases where backends want to apply special fusion routines.
symmetric_activation – True: activations are quantized into signed integers. False: activations are quantized into unsigned integers.
symmetric_weight – True: weights are quantized into signed integers. False: weights are quantized into unsigned integers.
quantization_params – Dictionary to specify the zero point and scale values for inputs to conv and matmul nodes. Should be specified when static is set to True. The quantization_params should be specified in the following format: {"input_name": [zero_point, scale]}. zero_point should be of type np.uint8 and scale should be of type np.float32. Example: {'resnet_model/Relu_1:0': [np.uint8(0), np.float32(0.019539741799235344)], 'resnet_model/Relu_2:0': [np.uint8(0), np.float32(0.011359662748873234)]}
nodes_to_quantize – List of node names to quantize. When this list is not None, only the nodes in this list are quantized. Example: ['Conv__224', 'Conv__252']
nodes_to_exclude – List of node names to exclude. The nodes in this list will be excluded from quantization when it is not None.
- Returns
ModelProto with quantization
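Example: a minimal dynamic-quantization call using the documented defaults. The model and output paths are illustrative; QuantizationMode.IntegerOps and the return type are documented above.

import onnx
from sparseml.onnx.optim.quantization.quantize import QuantizationMode, quantize

model = onnx.load("model.onnx")  # hypothetical FP32 model
quantized_model = quantize(
    model,
    per_channel=False,
    nbits=8,
    quantization_mode=QuantizationMode.IntegerOps,  # ConvInteger / MatMulInteger ops
    static=False,  # dynamic scale and zero point computed at runtime
)
onnx.save(quantized_model, "model-quant.onnx")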
sparseml.onnx.optim.quantization.quantize.quantize_data(data, quantize_range, qType)[source]¶
- Parameters
data – data to quantize
quantize_range – list of data to weight pack.
qType – data type to quantize to. Supported types UINT8 and INT8
- Returns
minimum, maximum, zero point, scale, and quantized weights
To pack weights, we compute a linear transformation: when the data type is uint8, from [rmin, rmax] -> [0, 2^b - 1], and when the data type is int8, from [-m, m] -> [-(2^{b-1} - 1), 2^{b-1} - 1] where m = max(abs(rmin), abs(rmax)).
We also add the necessary intermediate nodes to transform the quantized weight back to the full weight using the equation r = S(q - z), where r is the real original value, q is the quantized value, S is the scale, and z is the zero point.
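The following worked example illustrates the uint8 mapping described above with b = 8; it is a sketch of the math, not the exact rounding used by quantize_data, and all values are made up.

import numpy as np

rmin, rmax, b = -1.5, 3.0, 8
scale = (rmax - rmin) / (2 ** b - 1)      # S, approximately 0.01765
zero_point = int(round(-rmin / scale))    # z = 85, so rmin maps to q = 0

data = np.array([-1.5, 0.0, 3.0], dtype=np.float32)
q = np.clip(np.round(data / scale) + zero_point, 0, 255).astype(np.uint8)
r = scale * (q.astype(np.float32) - zero_point)  # r = S(q - z), recovers the originals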
sparseml.onnx.optim.quantization.quantize_model_post_training module¶
Provides a wrapper function for calibrating and quantizing an Onnx model
sparseml.onnx.optim.quantization.quantize_model_post_training.quantize_model_post_training(onnx_file: str, data_loader: sparseml.onnx.utils.data.DataLoader, output_model_path: Optional[str] = None, calibrate_op_types: Iterable[str] = ('Conv', 'MatMul', 'Gemm'), exclude_nodes: Optional[List[str]] = None, include_nodes: Optional[List[str]] = None, augmented_model_path: Optional[str] = None, static: bool = True, symmetric_weight: bool = False, force_fusions: bool = False, show_progress: bool = True, run_extra_opt: bool = True) → Union[None, onnx.onnx_ml_pb2.ModelProto][source]¶
Wrapper function for calibrating and quantizing an Onnx model
- Parameters
onnx_file – File path to saved Onnx model to calibrate and quantize
data_loader – Iterable of lists of model inputs or a file path to a directory of numpy arrays. If the model has multiple inputs and an .npz file is provided, the function will try to extract each input from the .npz file by name. If the names do not match, the function will try to extract the inputs in order. Will raise an exception if the number of inputs does not match the number of arrays in the .npz file.
output_model_path – File path where the quantized model should be saved. If not provided, the quantized Onnx model object will be returned instead.
calibrate_op_types – List of Onnx ops names to calibrate and quantize within the model. Currently Onnx only supports quantizing ‘Conv’ and ‘MatMul’ ops.
exclude_nodes – List of operator names that should not be quantized
include_nodes – List of operator names to force to be quantized
augmented_model_path – file path to save augmented model to for verification
static – True to use static quantization. Default is True
symmetric_weight – True to use symmetric weight quantization. Default is False
force_fusions – True to force fusions in quantization. Default is False
show_progress – If true, will display a tqdm progress bar during calibration. Default is True
run_extra_opt – If true, will run additional optimizations on the quantized model. Currently the only optimization is quantizing identity relu outputs in ResNet blocks
- Returns
None or quantized onnx model object if output_model_path is not provided
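Example: a minimal end-to-end post-training quantization sketch. The DataLoader constructor arguments and all file paths below are assumptions for illustration; only the quantize_model_post_training parameters are documented above.

from sparseml.onnx.optim.quantization.quantize_model_post_training import quantize_model_post_training
from sparseml.onnx.utils.data import DataLoader

# Constructor arguments here are assumed; see sparseml.onnx.utils.data.DataLoader for the real API
data_loader = DataLoader("calibration-data/*.npz", None, batch_size=1)

quantize_model_post_training(
    onnx_file="model.onnx",                # hypothetical FP32 model to calibrate and quantize
    data_loader=data_loader,
    output_model_path="model-quant.onnx",  # saved here instead of being returned
    static=True,
    show_progress=True,
)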
Module contents¶
Post training quantization tools for quantizing and calibrating onnx models.