The DefaultQuantization algorithm is designed to perform fast and, in many cases, accurate 8-bit quantization of neural networks (NNs).
The algorithm consists of three methods that are sequentially applied to a model:
The algorithm uses a two-stage statistics collection procedure in which the model is inferred over the calibration subset, so the wall-clock time of quantization depends mainly on the size of that subset.
The algorithm accepts all the parameters introduced by the three algorithms that it relies on. These parameters should be described in the corresponding section of the configuration file (see example below):
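As an illustration, a minimal sketch of such a section is given here as a Python dictionary that is printed as JSON. The enclosing "compression" and "algorithms" keys, the "name" and "params" fields, and the chosen values are assumptions about the configuration layout rather than something stated in this section; only "preset" and "stat_subset_size" come from the parameter descriptions in this document.

```python
import json

# Hypothetical minimal section for DefaultQuantization.
# The wrapper keys ("compression", "algorithms", "name", "params") are assumed.
config = {
    "compression": {
        "algorithms": [
            {
                "name": "DefaultQuantization",
                "params": {
                    "preset": "performance",   # symmetric weights and activations
                    "stat_subset_size": 300,   # number of calibration samples
                },
            }
        ]
    }
}

# Serialize the dictionary in the form it would take in a JSON configuration file.
print(json.dumps(config, indent=4))
```

Keeping the section as a dictionary makes it easy to change one parameter at a time and write the result back to a file with `json.dump`.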
The parameters of the DefaultQuantization algorithm can be roughly divided into two groups: mandatory and optional.

- "preset" - preset which controls the quantization mode (symmetric or asymmetric). It can take two values:
    - "performance" (default) - symmetric quantization of weights and activations. This is the most performant mode across all hardware.
    - "mixed" - symmetric quantization of weights and asymmetric quantization of activations. This mode can be useful for quantizing NNs that have both negative and positive input values in quantizing operations, e.g. non-ReLU-based CNNs.
- "stat_subset_size" - size of the subset used to calculate the activation statistics for quantization. The whole dataset is used if this parameter is not specified. We recommend using at least 300 samples.

All other options can be considered an advanced mode and require deep knowledge of the quantization process. Below is an overall description of all possible parameters:

- "ignored" - NN subgraphs which should be excluded from the optimization process:
    - "scope" - list of particular nodes to exclude
    - "operations" - list of operation types to exclude (expressed in OpenVINO IR notation). This list consists of the following tuples:
        - "type" - type of the ignored operation
        - "attributes" - if attributes are defined, they are taken into account when deciding which operations to ignore. They are defined by a dictionary of "<NAME>": "<VALUE>" pairs.
- "weights" - this section manually defines the quantization scheme for weights and the way to estimate the quantization range for them. It is worth noting that changing the quantization scheme may make it impossible to infer the resulting model on existing hardware.
    - "bits" - bit-width, default is 8
    - "mode" - quantization mode (symmetric or asymmetric)
    - "level_low" - minimum level in the integer range to which we quantize; the default is 0 for the unsigned range and -2^(bits-1) for the signed range
    - "level_high" - maximum level in the integer range to which we quantize; the default is 2^bits-1 for the unsigned range and 2^(bits-1)-1 for the signed range
    - "granularity" - quantization scale granularity, which can take the following two values:
        - "pertensor" (default) - per-tensor quantization with one scale factor and zero-point
        - "perchannel" - per-channel quantization with per-channel scale factor and zero-point
    - "range_estimator" - this section describes the parameters of the range estimator that is used in the MinMaxQuantization method to get the quantization ranges and filter outliers based on the collected statistics. These are the parameters that the user can vary to get better accuracy results:
        - "max" - parameters to estimate the top border of the quantizing floating-point range:
            - "type" - type of the estimator:
                - "max" (default) - estimates the maximum in the quantizing set of values
                - "quantile" - estimates the quantile in the quantizing set of values
            - "outlier_prob" - outlier probability used in the "quantile" estimator
        - "min" - parameters to estimate the bottom border of the quantizing floating-point range:
            - "type" - type of the estimator:
                - "min" (default) - estimates the minimum in the quantizing set of values
                - "quantile" - estimates the quantile in the quantizing set of values
            - "outlier_prob" - outlier probability used in the "quantile" estimator
- "activations" - this section manually defines the quantization scheme for activations and the way to estimate the quantization range for them. Again, changing the quantization scheme may make it impossible to infer the resulting model on existing hardware.
    - "bits" - bit-width, default is 8
    - "mode" - quantization mode (symmetric or asymmetric)
    - "level_low" - minimum level in the integer range to which we quantize; the default is 0 for the unsigned range and -2^(bits-1) for the signed range
    - "level_high" - maximum level in the integer range to which we quantize; the default is 2^bits-1 for the unsigned range and 2^(bits-1)-1 for the signed range
    - "granularity" - quantization scale granularity, which can take the following two values:
        - "pertensor" (default) - per-tensor quantization with one scale factor and zero-point
        - "perchannel" - per-channel quantization with per-channel scale factor and zero-point
    - "range_estimator" - this section describes the parameters of the range estimator that is used in the MinMaxQuantization method to get the quantization ranges and filter outliers based on the collected statistics. These are the parameters that the user can vary to get better accuracy results:
        - "preset" - preset that defines the same estimator for both the top and bottom borders of the quantizing floating-point range. The possible value is "quantile".
        - "max" - parameters to estimate the top border of the quantizing floating-point range:
            - "aggregator" - type of the function used to aggregate the statistics obtained with the estimator over the calibration dataset to get the value of the top border:
                - "mean" (default) - aggregates the mean value
                - "max" - aggregates the max value
                - "min" - aggregates the min value
                - "median" - aggregates the median value
                - "mean_no_outliers" - aggregates the mean value after removal of extreme quantiles
                - "median_no_outliers" - aggregates the median value after removal of extreme quantiles
                - "hl_estimator" - Hodges-Lehmann filter based aggregator
            - "type" - type of the estimator:
                - "max" (default) - estimates the maximum in the quantizing set of values
                - "quantile" - estimates the quantile in the quantizing set of values
            - "outlier_prob" - outlier probability used in the "quantile" estimator
        - "min" - parameters to estimate the bottom border of the quantizing floating-point range:
            - "type" - type of the estimator:
                - "min" (default) - estimates the minimum in the quantizing set of values
                - "quantile" - estimates the quantile in the quantizing set of values
            - "outlier_prob" - outlier probability used in the "quantile" estimator

Below is a fragment of the configuration file that shows the overall structure of parameters for this algorithm.
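A sketch of such a fragment, under stated assumptions, is built here as a Python dictionary and printed as JSON. The "name" and "params" wrapper fields, the placeholders in angle brackets, the choice of a signed range for weights, and every numeric value are illustrative assumptions, not recommended settings. The helper `default_levels` is hypothetical and only evaluates the default "level_low" and "level_high" formulas quoted in the parameter list.

```python
import json


def default_levels(bits: int = 8, signed: bool = True) -> tuple:
    """Default (level_low, level_high) from the formulas in the parameter list."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# Signed 8-bit range for weights (assumed here for the symmetric mode).
w_low, w_high = default_levels(bits=8, signed=True)

# Hypothetical fragment showing how the documented parameters nest.
default_quantization = {
    "name": "DefaultQuantization",  # assumed wrapper field
    "params": {                     # assumed wrapper field
        "preset": "mixed",
        "stat_subset_size": 300,
        "ignored": {
            "scope": ["<NODE_NAME>"],
            "operations": [
                {"type": "<OPERATION_TYPE>", "attributes": {"<NAME>": "<VALUE>"}},
            ],
        },
        "weights": {
            "bits": 8,
            "mode": "symmetric",
            "level_low": w_low,
            "level_high": w_high,
            "granularity": "perchannel",
            "range_estimator": {
                "max": {"type": "quantile", "outlier_prob": 0.0001},
            },
        },
        "activations": {
            "bits": 8,
            "mode": "asymmetric",
            "granularity": "pertensor",
            "range_estimator": {
                "min": {"type": "quantile", "outlier_prob": 0.0001},
                "max": {
                    "aggregator": "mean",
                    "type": "quantile",
                    "outlier_prob": 0.0001,
                },
            },
        },
    },
}

print(json.dumps(default_quantization, indent=4))
```

Because changing the quantization scheme can make the model impossible to infer on existing hardware, a fragment like this is best treated as a map of where each parameter lives rather than as a set of values to copy.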