Quantizing with accuracy control

Introduction

This is an advanced quantization flow that applies 8-bit quantization to a model while keeping an accuracy metric under control. It does so by keeping the most impactful operations within the model in the original precision. The flow is based on the Basic 8-bit quantization flow and differs from it in the following ways:

  • Besides the calibration dataset, a validation dataset is required to compute the accuracy metric. In the simplest case, both can refer to the same data.

  • A validation function, which computes the accuracy metric, is required. It can be a function that is already available in the source framework or a custom function.

  • Since accuracy validation is run several times during the quantization process, quantization with accuracy control can take more time than the Basic 8-bit quantization flow.

  • The resulting model can provide a smaller performance improvement than the model produced by the Basic 8-bit quantization flow, because some of the operations are kept in the original precision.

Note

Currently, this flow is available only for models in OpenVINO representation.

The steps for quantization with accuracy control are described below.

Prepare datasets

This step is similar to the Basic 8-bit quantization flow. The only difference is that two datasets, calibration and validation, are required.

import nncf
import torch

calibration_loader = torch.utils.data.DataLoader(...)

def transform_fn(data_item):
    images, _ = data_item
    return images

calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
# In the simplest case, the validation dataset reuses the calibration data.
validation_dataset = nncf.Dataset(calibration_loader, transform_fn)

Prepare validation function

The validation function receives an openvino.runtime.CompiledModel object and the validation dataset, and returns the accuracy metric value. The following code snippet shows an example of a validation function for an OpenVINO model:

import numpy as np
import torch
import openvino
from sklearn.metrics import accuracy_score

def validate(model: openvino.runtime.CompiledModel,
             validation_loader: torch.utils.data.DataLoader) -> float:
    predictions = []
    references = []

    output = model.outputs[0]

    for images, target in validation_loader:
        pred = model(images)[output]
        predictions.append(np.argmax(pred, axis=1))
        references.append(target)

    predictions = np.concatenate(predictions, axis=0)
    references = np.concatenate(references, axis=0)
    # scikit-learn expects the ground truth first, then the predictions.
    return accuracy_score(references, predictions)
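
To show the contract of a validation function in isolation, the sketch below runs a dependency-free variant against a stand-in model. FakeCompiledModel, validate_plain, and the data are hypothetical; they only mimic the two things the snippet above relies on, namely that the model is callable on a batch of inputs and that its result is indexed by an output object. A real run would pass an openvino.runtime.CompiledModel instead.

```python
from typing import Iterable, Sequence, Tuple


class FakeCompiledModel:
    """Hypothetical stand-in: scores each input x as [1 - x, x]."""

    outputs = ["output0"]

    def __call__(self, batch: Sequence[float]) -> dict:
        return {"output0": [[1.0 - x, x] for x in batch]}


def validate_plain(
    model,
    validation_data: Iterable[Tuple[Sequence[float], Sequence[int]]],
) -> float:
    """Top-1 accuracy computed without NumPy or scikit-learn."""
    output = model.outputs[0]
    correct = 0
    total = 0
    for inputs, targets in validation_data:
        scores = model(inputs)[output]
        for row, target in zip(scores, targets):
            # Index of the largest score, i.e. a plain-Python argmax.
            predicted = max(range(len(row)), key=row.__getitem__)
            correct += int(predicted == target)
            total += 1
    return correct / total


# Two made-up batches of (inputs, labels).
batches = [([0.9, 0.2], [1, 0]), ([0.6, 0.7], [0, 1])]
accuracy = validate_plain(FakeCompiledModel(), batches)
print(accuracy)
```

Three of the four made-up samples are classified correctly, so the function returns a single float, which is the only value the accuracy control needs from it.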

Run quantization with accuracy control

Now you can run quantization with accuracy control. The following code snippet shows an example for an OpenVINO model:

model = ... # openvino.runtime.Model object

quantized_model = nncf.quantize_with_accuracy_control(model,
                        calibration_dataset=calibration_dataset,
                        validation_dataset=validation_dataset,
                        validation_fn=validate,
                        max_drop=0.01)

max_drop defines the accuracy drop threshold. The quantization process stops when the degradation of the accuracy metric on the validation dataset is less than max_drop.
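
How max_drop interacts with the operations kept in the original precision can be illustrated with a toy sketch. This is not the NNCF algorithm: the operation names, the per-operation accuracy costs, and the assumption that those costs simply add up are all hypothetical, and the drop is assumed to be an absolute difference in the metric. The sketch reverts the costliest operation first and repeats until the drop falls below max_drop.

```python
from typing import Dict, List, Tuple


def revert_until_within_drop(
    original_metric: float,
    op_costs: Dict[str, float],
    max_drop: float = 0.01,
) -> Tuple[List[str], float]:
    """Toy model of accuracy control (not the NNCF implementation).

    op_costs maps a hypothetical operation name to the accuracy lost by
    quantizing it. Costs are assumed to be additive, which real models
    do not guarantee.
    """
    still_quantized = dict(op_costs)
    reverted: List[str] = []

    # Under the additive assumption, the drop is the sum of remaining costs.
    while still_quantized and sum(still_quantized.values()) >= max_drop:
        worst = max(still_quantized, key=still_quantized.get)
        del still_quantized[worst]  # keep this operation in original precision
        reverted.append(worst)

    final_metric = original_metric - sum(still_quantized.values())
    return reverted, final_metric


costs = {"conv1": 0.020, "conv2": 0.004, "fc": 0.003}
reverted_ops, metric = revert_until_within_drop(0.760, costs, max_drop=0.01)
print(reverted_ops, round(metric, 3))
```

With these made-up costs, only conv1 has to be reverted: the remaining drop of about 0.007 is already below the 0.01 threshold, so the other two operations stay quantized.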

The nncf.quantize_with_accuracy_control() API supports all the parameters of the nncf.quantize() API. For example, you can use nncf.quantize_with_accuracy_control() to quantize a model with a custom configuration.
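
As a minimal sketch of such a custom configuration, the helper below passes two parameters of nncf.quantize(), preset and subset_size, alongside the accuracy-control arguments. The helper name and the chosen values are assumptions made for illustration; pick settings that suit your model.

```python
def quantize_with_custom_settings(model, calibration_dataset,
                                  validation_dataset, validation_fn):
    """Hypothetical helper: accuracy-controlled quantization with extra options."""
    # Imported inside the function so that defining the helper does not
    # require NNCF; calling it does.
    import nncf

    return nncf.quantize_with_accuracy_control(
        model,
        calibration_dataset=calibration_dataset,
        validation_dataset=validation_dataset,
        validation_fn=validation_fn,
        max_drop=0.01,
        # Parameters shared with nncf.quantize(); the values are arbitrary.
        preset=nncf.QuantizationPreset.MIXED,
        subset_size=300,
    )
```

It would be called with the objects prepared in the previous steps, for example quantize_with_custom_settings(model, calibration_dataset, validation_dataset, validate).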