# TunableQuantization Algorithm

## Algorithm overview

This is a tunable variant of MinMaxQuantization and is usually used as part of a pipeline together with auxiliary algorithms. The default recommended pipeline consists of the same algorithms as **DefaultQuantization**, with MinMaxQuantization replaced by this one.
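For illustration, such a pipeline might be described as the following list of algorithm configurations. This is a minimal sketch: the auxiliary algorithm names (`ActivationChannelAlignment`, `FastBiasCorrection`) are assumed here based on the usual composition of **DefaultQuantization**, and the plain-dictionary representation is illustrative rather than the exact configuration format.

```python
# Sketch of the recommended pipeline: the same algorithms as DefaultQuantization,
# with MinMaxQuantization replaced by TunableQuantization.
# Auxiliary algorithm names are assumptions based on DefaultQuantization's usual composition.
algorithms = [
    {
        "name": "ActivationChannelAlignment",  # assumed auxiliary algorithm
        "params": {"stat_subset_size": 300},
    },
    {
        # MinMaxQuantization is replaced by its tunable variant.
        "name": "TunableQuantization",
        "params": {
            "stat_subset_size": 300,
            "preset": "performance",
            "tuning_scope": ["layer"],
        },
    },
    {
        "name": "FastBiasCorrection",  # assumed auxiliary algorithm
        "params": {"stat_subset_size": 300},
    },
]
```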

## Parameters

The algorithm accepts the following parameters:

"name": "TunableQuantization",
"params": {
"stat_subset_size": 300, // Size of subset to calculate activations statistics that can be used
// for quantization parameters calculation.
/* A preset is a collection of quantization algorithm parameters that will specify to the algorithm
to improve which metric the algorithm needs to concentrate. Each quantization algorithm supports
[performance, accuracy] presets */
"preset": "performance",
"tuning_scope": ["layer"], // List of quantization parameters that will be tuned,
// available options: [bits, mode, granularity, layer, range_estimator]
"estimator_tuning_scope": ["preset", "aggregator", "type", "outlier_prob"], // List of range_estimator parameters that will be tuned,
// available options: [preset, aggregator, type, outlier_prob]
"outlier_prob_choices": [1e-3, 1e-4, 1e-5] // List of outlier_prob values to use when tuning outlier_prob parameter
}

`tuning_scope` determines which quantization configurations are returned to the optimizer as viable options. It can be a list containing any of the following values: `bits`, `mode`, `granularity`, `layer`, and `range_estimator`.

Quantization configurations are derived by first building a list of all quantization configurations supported by the target hardware and then filtering it using the base configuration (taken either from the preset or from the previous best result) and `tuning_scope`. Filtering keeps only those configurations that differ from the base configuration solely in the values of the parameters listed in `tuning_scope`, as sketched below.
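The following is a minimal Python sketch of this filtering step. The plain-dictionary configuration representation and the function name `filter_configurations` are illustrative assumptions, not the actual POT internals.

```python
from typing import Any, Dict, List

def filter_configurations(
    hardware_configs: List[Dict[str, Any]],
    base_config: Dict[str, Any],
    tuning_scope: List[str],
) -> List[Dict[str, Any]]:
    """Keep only configurations that differ from the base configuration
    exclusively in the parameters listed in tuning_scope."""
    viable = []
    for config in hardware_configs:
        # Every parameter outside tuning_scope must match the base configuration.
        fixed_keys = [k for k in config if k not in tuning_scope]
        if all(config[k] == base_config.get(k) for k in fixed_keys):
            viable.append(config)
    return viable

# Illustrative use: with tuning_scope = ["bits"], only configurations that
# match the base configuration in every parameter except "bits" survive.
base = {"bits": 8, "mode": "symmetric", "granularity": "pertensor"}
hardware = [
    {"bits": 8, "mode": "symmetric", "granularity": "pertensor"},   # kept: equals the base
    {"bits": 4, "mode": "symmetric", "granularity": "pertensor"},   # kept: differs only in bits
    {"bits": 8, "mode": "asymmetric", "granularity": "pertensor"},  # dropped: mode differs
]
print(filter_configurations(hardware, base, tuning_scope=["bits"]))
```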

Whether the preset or the previous best result is used as the base configuration depends on the optimizer's `trials_load_method`: