# TunableQuantization Algorithm

## Algorithm overview

This is a tunable variant of MinMaxQuantization and is usually used as part of a pipeline together with auxiliary algorithms. The default recommended pipeline consists of the same algorithms as **DefaultQuantization**, with MinMaxQuantization replaced by this one.
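For illustration, such a pipeline might be described as the following list of algorithm configurations. This is a minimal sketch: the auxiliary algorithm names (`ActivationChannelAlignment`, `FastBiasCorrection`) are assumed here based on the usual composition of **DefaultQuantization**, and the plain-dictionary representation is illustrative rather than the exact configuration format.

```python
# Sketch of the recommended pipeline: the same algorithms as DefaultQuantization,
# with MinMaxQuantization replaced by TunableQuantization.
# Auxiliary algorithm names are assumptions based on DefaultQuantization's usual composition.
algorithms = [
    {
        "name": "ActivationChannelAlignment",  # assumed auxiliary algorithm
        "params": {"stat_subset_size": 300},
    },
    {
        # MinMaxQuantization is replaced by its tunable variant.
        "name": "TunableQuantization",
        "params": {
            "stat_subset_size": 300,
            "preset": "performance",
            "tuning_scope": ["layer"],
        },
    },
    {
        "name": "FastBiasCorrection",  # assumed auxiliary algorithm
        "params": {"stat_subset_size": 300},
    },
]
```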

## Parameters

The algorithm accepts the following parameters:

"name": "TunableQuantization",
"params": {
"stat_subset_size": 300, // Size of subset to calculate activations statistics that can be used
// for quantization parameters calculation.
/* A preset is a collection of quantization algorithm parameters that will specify to the algorithm
to improve which metric the algorithm needs to concentrate. Each quantization algorithm supports
[performance, accuracy] presets */
"preset": "performance",
"tuning_scope": ["layer"], // List of quantization parameters that will be tuned,
// available options: [bits, mode, granularity, layer, range_estimator]
"estimator_tuning_scope": ["preset", "aggregator", "type", "outlier_prob"], // List of range_estimator parameters that will be tuned,
// available options: [preset, aggregator, type, outlier_prob]
"outlier_prob_choices": [1e-3, 1e-4, 1e-5] // List of outlier_prob values to use when tuning outlier_prob parameter
}

`tuning_scope` determines which quantization configurations are returned to the optimizer as viable options. It can be a list containing any of the following values: `bits`, `mode`, `granularity`, `layer`, and `range_estimator`.

Quantization configurations are derived by first building a list of all quantization configurations supported by the target hardware and then filtering it using the base configuration (taken either from the preset or from the previous best result) and `tuning_scope`. Filtering keeps only those configurations that differ from the base configuration solely in the values of the parameters listed in `tuning_scope`, as sketched below.
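The following is a minimal Python sketch of this filtering step. The plain-dictionary configuration representation and the function name `filter_configurations` are illustrative assumptions, not the actual POT internals.

```python
from typing import Any, Dict, List

def filter_configurations(
    hardware_configs: List[Dict[str, Any]],
    base_config: Dict[str, Any],
    tuning_scope: List[str],
) -> List[Dict[str, Any]]:
    """Keep only configurations that differ from the base configuration
    exclusively in the parameters listed in tuning_scope."""
    viable = []
    for config in hardware_configs:
        # Every parameter outside tuning_scope must match the base configuration.
        fixed_keys = [k for k in config if k not in tuning_scope]
        if all(config[k] == base_config.get(k) for k in fixed_keys):
            viable.append(config)
    return viable

# Illustrative use: with tuning_scope = ["bits"], only configurations that
# match the base configuration in every parameter except "bits" survive.
base = {"bits": 8, "mode": "symmetric", "granularity": "pertensor"}
hardware = [
    {"bits": 8, "mode": "symmetric", "granularity": "pertensor"},   # kept: equals the base
    {"bits": 4, "mode": "symmetric", "granularity": "pertensor"},   # kept: differs only in bits
    {"bits": 8, "mode": "asymmetric", "granularity": "pertensor"},  # dropped: mode differs
]
print(filter_configurations(hardware, base, tuning_scope=["bits"]))
```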

Whether the preset or the previous best result is used as the base configuration depends on the optimizer's `trials_load_method`: