Quantization-Aware Training with NNCF, using the PyTorch framework

This tutorial is also available as a Jupyter notebook that can be cloned directly from GitHub. See the installation guide for instructions to run this tutorial locally on Windows, Linux or macOS.


This notebook is based on ImageNet training in PyTorch.

The goal of this notebook is to demonstrate how to use the Neural Network Compression Framework (NNCF) 8-bit quantization to optimize a PyTorch model for inference with the OpenVINO Toolkit. The optimization process contains the following steps:

  • Transform the original FP32 model to INT8

  • Use fine-tuning to restore the accuracy

  • Export optimized and original models to ONNX and then to OpenVINO IR

  • Measure and compare the performance of models

For more advanced usage, please refer to these examples.

We selected the ResNet-18 model with the Tiny ImageNet-200 dataset. ResNet-18 is the smallest ResNet variant, with only 18 layers. Tiny ImageNet-200 is a subset of the larger ImageNet dataset with smaller images. The dataset will be downloaded in the notebook; using the smaller model and dataset shortens both training and download time. To see other ResNet models, visit PyTorch Hub.
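If you want to see which ResNet variants PyTorch Hub exposes, one quick option is to query it programmatically. This is an optional, illustrative snippet (it requires network access to fetch the hub configuration of the pytorch/vision repository):

import torch

entrypoints = torch.hub.list("pytorch/vision")
print([name for name in entrypoints if name.startswith("resnet")])
# e.g. ['resnet101', 'resnet152', 'resnet18', 'resnet34', 'resnet50']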

NOTE: This notebook requires a C++ compiler.

Imports and Settings

On Windows, add the required C++ directories to the system’s path.

Import NNCF and all auxiliary packages needed in this notebook. Set a name for the model, and the image width and height that will be used for the network. Also define the paths where the PyTorch, ONNX and OpenVINO IR versions of the models will be stored.

# On Windows, add the directory that contains cl.exe to the PATH to enable PyTorch to find the
# required C++ tools. This code assumes that Visual Studio 2019 is installed in the default
# directory. If you have a different C++ compiler, please add the correct path to os.environ["PATH"]
# directly. Note that the C++ Redistributable is not enough to run this notebook.

# Adding the path to os.environ["LIB"] is not always required - it depends on the system's configuration

import sys

if sys.platform == "win32":
    import distutils.command.build_ext
    import distutils.core  # used below to locate the library directories
    import os
    from pathlib import Path

    VS_INSTALL_DIR = r"C:/Program Files (x86)/Microsoft Visual Studio"
    cl_paths = sorted(list(Path(VS_INSTALL_DIR).glob("**/Hostx86/x64/cl.exe")))
    if len(cl_paths) == 0:
        raise ValueError(
            "Cannot find Visual Studio. This notebook requires a C++ compiler. If you installed "
            "a C++ compiler, please add the directory that contains cl.exe to `os.environ['PATH']`."
        )
    else:
        # If multiple versions of MSVC are installed, get the most recent version
        cl_path = cl_paths[-1]
        vs_dir = str(cl_path.parent)
        os.environ["PATH"] += f"{os.pathsep}{vs_dir}"
        # Code for finding the library dirs from
        # https://stackoverflow.com/questions/47423246/get-pythons-lib-path
        d = distutils.core.Distribution()
        b = distutils.command.build_ext.build_ext(d)
        b.finalize_options()
        os.environ["LIB"] = os.pathsep.join(b.library_dirs)
        print(f"Added {vs_dir} to PATH")
import sys
import time
import warnings  # to disable warnings on export to ONNX
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

import torch
import nncf  # Important - should be imported directly after torch

import torch.nn as nn
import torch.nn.parallel
import torch.optim
import torch.utils.data
import torch.utils.data.distributed
import torchvision.datasets as datasets
import torchvision.models as models
import torchvision.transforms as transforms
from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args
from openvino.inference_engine import IECore
from torch.jit import TracerWarning

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using {device} device")

MODEL_DIR = Path("model")
OUTPUT_DIR = Path("output")
DATA_DIR = Path("data")
BASE_MODEL_NAME = "resnet18"
image_size = 64

OUTPUT_DIR.mkdir(exist_ok=True)
MODEL_DIR.mkdir(exist_ok=True)
DATA_DIR.mkdir(exist_ok=True)

# Paths where PyTorch, ONNX and OpenVINO IR models will be stored
fp32_pth_path = Path(MODEL_DIR / (BASE_MODEL_NAME + "_fp32")).with_suffix(".pth")
fp32_onnx_path = Path(OUTPUT_DIR / (BASE_MODEL_NAME + "_fp32")).with_suffix(".onnx")
fp32_ir_path = fp32_onnx_path.with_suffix(".xml")
int8_onnx_path = Path(OUTPUT_DIR / (BASE_MODEL_NAME + "_int8")).with_suffix(".onnx")
int8_ir_path = int8_onnx_path.with_suffix(".xml")

# It's possible to train the FP32 model from scratch, but that might be slow,
# so the pre-trained weights are downloaded by default.
pretrained_on_tiny_imagenet = True
fp32_pth_url = "https://storage.openvinotoolkit.org/repositories/nncf/openvino_notebook_ckpts/302_resnet18_fp32.pth"
if pretrained_on_tiny_imagenet:
    urlretrieve(fp32_pth_url, fp32_pth_path)
/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/nncf/torch/__init__.py:23: UserWarning: NNCF provides best results with torch==1.9.1, while current torch version is 1.7.1+cpu - consider switching to torch==1.9.1
  warnings.warn("NNCF provides best results with torch=={bkc}, "
WARNING:nncf:Skip applying a patch to building extension with a reason: PyTorch version is not supported for this
Using cpu device

Download the Tiny ImageNet dataset:

  • 100k images of shape 3x64x64

  • 200 different classes: snake, spider, cat, truck, grasshopper, gull, etc.

def download_tiny_imagenet_200(
    data_dir,
    url="http://cs231n.stanford.edu/tiny-imagenet-200.zip",
    tarname="tiny-imagenet-200.zip",
):
    data_dir.mkdir(exist_ok=True)
    archive_path = data_dir / tarname
    urlretrieve(url, archive_path)
    with zipfile.ZipFile(archive_path, "r") as zip_ref:
        zip_ref.extractall(path=data_dir)
    print(f"Successfully downloaded and extracted dataset to: {data_dir}")


DATASET_DIR = DATA_DIR / "tiny-imagenet-200"
if not DATASET_DIR.exists():
    download_tiny_imagenet_200(DATA_DIR)
Successfully downloaded and extracted dataset to: data
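As an optional sanity check, the snippet below counts the class subdirectories in the extracted training split. It assumes the standard Tiny ImageNet layout, with one subdirectory per class under train:

# Count the class subdirectories in the extracted training split
train_class_dirs = [d for d in (DATASET_DIR / "train").iterdir() if d.is_dir()]
print(f"Found {len(train_class_dirs)} classes")  # expected: 200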

Pre-train Floating-Point Model

Using NNCF for model compression assumes that the user has a pre-trained model and a training pipeline.

Here we demonstrate one possible training pipeline: a ResNet-18 model pre-trained on 1000 classes from ImageNet is fine-tuned with 200 classes from Tiny-Imagenet.

Subsequently, the training and validation functions will be reused as is for quantization-aware training.

Train Function

def train(train_loader, model, criterion, optimizer, epoch):
    batch_time = AverageMeter("Time", ":3.3f")
    losses = AverageMeter("Loss", ":2.3f")
    top1 = AverageMeter("Acc@1", ":2.2f")
    top5 = AverageMeter("Acc@5", ":2.2f")
    progress = ProgressMeter(
        len(train_loader), [batch_time, losses, top1, top5], prefix="Epoch:[{}]".format(epoch)
    )

    # switch to train mode
    model.train()

    end = time.time()
    for i, (images, target) in enumerate(train_loader):
        images = images.to(device)
        target = target.to(device)

        # compute output
        output = model(images)
        loss = criterion(output, target)

        # measure accuracy and record loss
        acc1, acc5 = accuracy(output, target, topk=(1, 5))
        losses.update(loss.item(), images.size(0))
        top1.update(acc1[0], images.size(0))
        top5.update(acc5[0], images.size(0))

        # compute gradient and do opt step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        print_frequency = 50
        if i % print_frequency == 0:
            progress.display(i)

Validate Function

def validate(val_loader, model, criterion):
    batch_time = AverageMeter("Time", ":3.3f")
    losses = AverageMeter("Loss", ":2.3f")
    top1 = AverageMeter("Acc@1", ":2.2f")
    top5 = AverageMeter("Acc@5", ":2.2f")
    progress = ProgressMeter(len(val_loader), [batch_time, losses, top1, top5], prefix="Test: ")

    # switch to evaluate mode
    model.eval()

    with torch.no_grad():
        end = time.time()
        for i, (images, target) in enumerate(val_loader):
            images = images.to(device)
            target = target.to(device)

            # compute output
            output = model(images)
            loss = criterion(output, target)

            # measure accuracy and record loss
            acc1, acc5 = accuracy(output, target, topk=(1, 5))
            losses.update(loss.item(), images.size(0))
            top1.update(acc1[0], images.size(0))
            top5.update(acc5[0], images.size(0))

            # measure elapsed time
            batch_time.update(time.time() - end)
            end = time.time()

            print_frequency = 10
            if i % print_frequency == 0:
                progress.display(i)

        print(" * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}".format(top1=top1, top5=top5))
    return top1.avg

Helpers

class AverageMeter(object):
    """Computes and stores the average and current value"""

    def __init__(self, name, fmt=":f"):
        self.name = name
        self.fmt = fmt
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = "{name} {val" + self.fmt + "} ({avg" + self.fmt + "})"
        return fmtstr.format(**self.__dict__)


class ProgressMeter(object):
    def __init__(self, num_batches, meters, prefix=""):
        self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
        self.meters = meters
        self.prefix = prefix

    def display(self, batch):
        entries = [self.prefix + self.batch_fmtstr.format(batch)]
        entries += [str(meter) for meter in self.meters]
        print("\t".join(entries))

    def _get_batch_fmtstr(self, num_batches):
        num_digits = len(str(num_batches))
        fmt = "{:" + str(num_digits) + "d}"
        return "[" + fmt + "/" + fmt.format(num_batches) + "]"


def accuracy(output, target, topk=(1,)):
    """Computes the accuracy over the k top predictions for the specified values of k"""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)

        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))

        res = []
        for k in topk:
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res
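To make the top-k logic concrete, here is a tiny illustrative check of the accuracy helper on a toy batch; it is not part of the training pipeline:

# Toy batch: 2 samples, 3 classes. Sample 0 is predicted correctly at top-1;
# sample 1's true class only appears within the top-2 predictions.
logits = torch.tensor([[0.1, 0.9, 0.0], [0.8, 0.05, 0.15]])
targets = torch.tensor([1, 2])
top1, top2 = accuracy(logits, targets, topk=(1, 2))
print(top1.item(), top2.item())  # 50.0 100.0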

Get a Pre-trained FP32 Model

A pre-trained floating-point model is a prerequisite for quantization. It can be obtained by training from scratch with the code below, but this usually takes a lot of time. Therefore, we have already run this code and obtained good enough weights after 4 epochs (for the sake of simplicity, we did not tune until the best accuracy). By default, this notebook just loads these weights without launching training. To fine-tune the model yourself, starting from weights pre-trained on ImageNet, set pretrained_on_tiny_imagenet = False in the Imports and Settings section at the top of this notebook.

num_classes = 200  # 200 is for Tiny ImageNet, default is 1000 for ImageNet
init_lr = 1e-4
batch_size = 128
epochs = 4

model = models.resnet18(pretrained=not pretrained_on_tiny_imagenet)
# update the last FC layer for Tiny ImageNet number of classes
model.fc = nn.Linear(in_features=512, out_features=num_classes, bias=True)
model.to(device)

# Data loading code
train_dir = DATASET_DIR / "train"
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

dataset = datasets.ImageFolder(
    train_dir,
    transforms.Compose(
        [
            transforms.Resize(image_size),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
        ]
    ),
)
train_dataset, val_dataset = torch.utils.data.random_split(
    dataset, [80000, 20000], generator=torch.Generator().manual_seed(0)
)

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True, num_workers=4, pin_memory=True, sampler=None
)

val_loader = torch.utils.data.DataLoader(
    val_dataset, batch_size=batch_size, shuffle=False, num_workers=4, pin_memory=True
)

# define loss function (criterion) and optimizer
criterion = nn.CrossEntropyLoss().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=init_lr)
acc1 = 0
best_acc1 = 0
if pretrained_on_tiny_imagenet:
    #
    # ** WARNING: torch.load functionality uses Python's pickling module that
    # may be used to perform arbitrary code execution during unpickling. Only load data that you
    # trust.
    #
    checkpoint = torch.load(str(fp32_pth_path), map_location="cpu")
    model.load_state_dict(checkpoint["state_dict"], strict=True)
    best_acc1 = checkpoint["acc1"]
else:
    # Training loop
    for epoch in range(0, epochs):
        # run a single training epoch
        train(train_loader, model, criterion, optimizer, epoch)

        # evaluate on validation set
        acc1 = validate(val_loader, model, criterion)

        is_best = acc1 > best_acc1
        best_acc1 = max(acc1, best_acc1)

        if is_best:
            checkpoint = {"state_dict": model.state_dict(), "acc1": acc1}
            torch.save(checkpoint, fp32_pth_path)

print(f"Accuracy of FP32 model: {best_acc1:.3f}")
Accuracy of FP32 model: 53.235

Export the FP32 model to ONNX, which is supported by the OpenVINO™ Toolkit, so that it can be benchmarked against the INT8 model.

dummy_input = torch.randn(1, 3, image_size, image_size).to(device)

torch.onnx.export(model, dummy_input, fp32_onnx_path)
print(f"FP32 ONNX model was exported to {fp32_onnx_path}.")
FP32 ONNX model was exported to output/resnet18_fp32.onnx.
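Optionally, the exported file can be verified with the checker from the onnx package. This extra step is not required for the rest of the notebook and assumes the onnx package is installed:

import onnx

onnx_model = onnx.load(str(fp32_onnx_path))
onnx.checker.check_model(onnx_model)  # raises an exception if the model is malformed
print("The exported ONNX model is well-formed.")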

Create and Initialize Quantization

NNCF enables compression-aware training by integrating into regular training pipelines. The framework is designed so that modifications to your original training code are minor. Quantization is the simplest scenario and requires only 3 modifications.

  1. Configure NNCF parameters to specify compression

nncf_config_dict = {
    "input_info": {"sample_size": [1, 3, image_size, image_size]},
    "log_dir": str(OUTPUT_DIR),  # log directory for NNCF-specific logging outputs
    "compression": {
        "algorithm": "quantization",  # specify the algorithm here
    },
}
nncf_config = NNCFConfig.from_dict(nncf_config_dict)
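The compression section accepts additional options beyond the algorithm name. As an illustration, the sketch below also limits the number of samples used for quantization range initialization; these keys follow the NNCF JSON configuration schema, and the minimal config above is sufficient for this notebook:

# Optional, extended configuration sketch (not used further in this notebook)
extended_nncf_config_dict = {
    "input_info": {"sample_size": [1, 3, image_size, image_size]},
    "log_dir": str(OUTPUT_DIR),
    "compression": {
        "algorithm": "quantization",
        # use at most 256 samples from the data loader to initialize quantization ranges
        "initializer": {"range": {"num_init_samples": 256}},
    },
}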
  2. Provide a data loader to initialize the values of the quantization ranges and to determine, from statistics collected on a given number of samples, which activations should be signed or unsigned.

nncf_config = register_default_init_args(nncf_config, train_loader)
INFO:nncf:Please, provide execution parameters for optimal model initialization
  3. Create a wrapped model ready for compression fine-tuning from the pre-trained FP32 model and the configuration object.

compression_ctrl, model = create_compressed_model(model, nncf_config)
WARNING:nncf:Graphviz is not installed - only the .dot model visualization format will be used. Install pygraphviz into your Python environment and graphviz system-wide to enable PNG rendering.
INFO:nncf:Wrapping module ResNet/Conv2d[conv1] by ResNet/NNCFConv2d[conv1]
INFO:nncf:Wrapping module ResNet/Sequential[layer1]/BasicBlock[0]/Conv2d[conv1] by ResNet/Sequential[layer1]/BasicBlock[0]/NNCFConv2d[conv1]
INFO:nncf:Wrapping module ResNet/Sequential[layer1]/BasicBlock[0]/Conv2d[conv2] by ResNet/Sequential[layer1]/BasicBlock[0]/NNCFConv2d[conv2]
INFO:nncf:Wrapping module ResNet/Sequential[layer1]/BasicBlock[1]/Conv2d[conv1] by ResNet/Sequential[layer1]/BasicBlock[1]/NNCFConv2d[conv1]
INFO:nncf:Wrapping module ResNet/Sequential[layer1]/BasicBlock[1]/Conv2d[conv2] by ResNet/Sequential[layer1]/BasicBlock[1]/NNCFConv2d[conv2]
INFO:nncf:Wrapping module ResNet/Sequential[layer2]/BasicBlock[0]/Conv2d[conv1] by ResNet/Sequential[layer2]/BasicBlock[0]/NNCFConv2d[conv1]
INFO:nncf:Wrapping module ResNet/Sequential[layer2]/BasicBlock[0]/Conv2d[conv2] by ResNet/Sequential[layer2]/BasicBlock[0]/NNCFConv2d[conv2]
INFO:nncf:Wrapping module ResNet/Sequential[layer2]/BasicBlock[0]/Sequential[downsample]/Conv2d[0] by ResNet/Sequential[layer2]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]
INFO:nncf:Wrapping module ResNet/Sequential[layer2]/BasicBlock[1]/Conv2d[conv1] by ResNet/Sequential[layer2]/BasicBlock[1]/NNCFConv2d[conv1]
INFO:nncf:Wrapping module ResNet/Sequential[layer2]/BasicBlock[1]/Conv2d[conv2] by ResNet/Sequential[layer2]/BasicBlock[1]/NNCFConv2d[conv2]
INFO:nncf:Wrapping module ResNet/Sequential[layer3]/BasicBlock[0]/Conv2d[conv1] by ResNet/Sequential[layer3]/BasicBlock[0]/NNCFConv2d[conv1]
INFO:nncf:Wrapping module ResNet/Sequential[layer3]/BasicBlock[0]/Conv2d[conv2] by ResNet/Sequential[layer3]/BasicBlock[0]/NNCFConv2d[conv2]
INFO:nncf:Wrapping module ResNet/Sequential[layer3]/BasicBlock[0]/Sequential[downsample]/Conv2d[0] by ResNet/Sequential[layer3]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]
INFO:nncf:Wrapping module ResNet/Sequential[layer3]/BasicBlock[1]/Conv2d[conv1] by ResNet/Sequential[layer3]/BasicBlock[1]/NNCFConv2d[conv1]
INFO:nncf:Wrapping module ResNet/Sequential[layer3]/BasicBlock[1]/Conv2d[conv2] by ResNet/Sequential[layer3]/BasicBlock[1]/NNCFConv2d[conv2]
INFO:nncf:Wrapping module ResNet/Sequential[layer4]/BasicBlock[0]/Conv2d[conv1] by ResNet/Sequential[layer4]/BasicBlock[0]/NNCFConv2d[conv1]
INFO:nncf:Wrapping module ResNet/Sequential[layer4]/BasicBlock[0]/Conv2d[conv2] by ResNet/Sequential[layer4]/BasicBlock[0]/NNCFConv2d[conv2]
INFO:nncf:Wrapping module ResNet/Sequential[layer4]/BasicBlock[0]/Sequential[downsample]/Conv2d[0] by ResNet/Sequential[layer4]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]
INFO:nncf:Wrapping module ResNet/Sequential[layer4]/BasicBlock[1]/Conv2d[conv1] by ResNet/Sequential[layer4]/BasicBlock[1]/NNCFConv2d[conv1]
INFO:nncf:Wrapping module ResNet/Sequential[layer4]/BasicBlock[1]/Conv2d[conv2] by ResNet/Sequential[layer4]/BasicBlock[1]/NNCFConv2d[conv2]
INFO:nncf:Wrapping module ResNet/Linear[fc] by ResNet/NNCFLinear[fc]
WARNING:nncf:Enabling quantization range initialization with default parameters.
WARNING:nncf:NNCFNetwork(
  (nncf_module): ResNet(
    (conv1): NNCFConv2d(
      3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
      (pre_ops): ModuleDict()
      (post_ops): ModuleDict()
    )
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): BasicBlock(
        (conv1): NNCFConv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): NNCFConv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): NNCFConv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): NNCFConv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (layer2): Sequential(
      (0): BasicBlock(
        (conv1): NNCFConv2d(
          64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): NNCFConv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): NNCFConv2d(
            64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False
            (pre_ops): ModuleDict()
            (post_ops): ModuleDict()
          )
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): NNCFConv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): NNCFConv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (layer3): Sequential(
      (0): BasicBlock(
        (conv1): NNCFConv2d(
          128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): NNCFConv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): NNCFConv2d(
            128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False
            (pre_ops): ModuleDict()
            (post_ops): ModuleDict()
          )
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): NNCFConv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): NNCFConv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (layer4): Sequential(
      (0): BasicBlock(
        (conv1): NNCFConv2d(
          256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): NNCFConv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): NNCFConv2d(
            256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
            (pre_ops): ModuleDict()
            (post_ops): ModuleDict()
          )
          (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): NNCFConv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): NNCFConv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (pre_ops): ModuleDict()
          (post_ops): ModuleDict()
        )
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
    (fc): NNCFLinear(
      in_features=512, out_features=200, bias=True
      (pre_ops): ModuleDict()
      (post_ops): ModuleDict()
    )
  )
)
INFO:nncf:Collecting tensor statistics ████████          | 1 / 2
INFO:nncf:Collecting tensor statistics ████████████████  | 2 / 2
INFO:nncf:Set sign: True and scale: [2.6400, ] for TargetType.OPERATOR_POST_HOOK /nncf_model_input_0
INFO:nncf:Performing signed activation quantization for: TargetType.OPERATOR_POST_HOOK /nncf_model_input_0
INFO:nncf:Set sign: False and scale: [2.3831, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer1]/BasicBlock[0]/ReLU[relu]/relu__0
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer1]/BasicBlock[0]/ReLU[relu]/relu__0
INFO:nncf:Set sign: True and scale: [3.3059, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer1]/BasicBlock[0]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Performing signed activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer1]/BasicBlock[0]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Set sign: False and scale: [2.6874, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer1]/BasicBlock[1]/ReLU[relu]/relu__0
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer1]/BasicBlock[1]/ReLU[relu]/relu__0
INFO:nncf:Set sign: True and scale: [5.6456, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer1]/BasicBlock[1]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Performing signed activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer1]/BasicBlock[1]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Set sign: False and scale: [2.0028, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer2]/BasicBlock[0]/ReLU[relu]/relu__0
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer2]/BasicBlock[0]/ReLU[relu]/relu__0
INFO:nncf:Set sign: True and scale: [2.9754, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer2]/BasicBlock[0]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Performing signed activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer2]/BasicBlock[0]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Set sign: True and scale: [3.4110, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer2]/BasicBlock[0]/Sequential[downsample]/BatchNorm2d[1]/batch_norm_0
INFO:nncf:Performing signed activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer2]/BasicBlock[0]/Sequential[downsample]/BatchNorm2d[1]/batch_norm_0
INFO:nncf:Set sign: False and scale: [2.1791, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer2]/BasicBlock[1]/ReLU[relu]/relu__0
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer2]/BasicBlock[1]/ReLU[relu]/relu__0
INFO:nncf:Set sign: True and scale: [3.9956, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer2]/BasicBlock[1]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Performing signed activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer2]/BasicBlock[1]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Set sign: False and scale: [2.1768, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer3]/BasicBlock[0]/ReLU[relu]/relu__0
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer3]/BasicBlock[0]/ReLU[relu]/relu__0
INFO:nncf:Set sign: True and scale: [3.5078, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer3]/BasicBlock[0]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Performing signed activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer3]/BasicBlock[0]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Set sign: True and scale: [1.8105, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer3]/BasicBlock[0]/Sequential[downsample]/BatchNorm2d[1]/batch_norm_0
INFO:nncf:Performing signed activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer3]/BasicBlock[0]/Sequential[downsample]/BatchNorm2d[1]/batch_norm_0
INFO:nncf:Set sign: False and scale: [2.0073, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer3]/BasicBlock[1]/ReLU[relu]/relu__0
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer3]/BasicBlock[1]/ReLU[relu]/relu__0
INFO:nncf:Set sign: True and scale: [4.0719, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer3]/BasicBlock[1]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Performing signed activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer3]/BasicBlock[1]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Set sign: False and scale: [1.7835, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer4]/BasicBlock[0]/ReLU[relu]/relu__0
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer4]/BasicBlock[0]/ReLU[relu]/relu__0
INFO:nncf:Set sign: True and scale: [3.2653, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer4]/BasicBlock[0]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Performing signed activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer4]/BasicBlock[0]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Set sign: True and scale: [2.0781, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer4]/BasicBlock[0]/Sequential[downsample]/BatchNorm2d[1]/batch_norm_0
INFO:nncf:Performing signed activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer4]/BasicBlock[0]/Sequential[downsample]/BatchNorm2d[1]/batch_norm_0
INFO:nncf:Set sign: False and scale: [1.4773, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer4]/BasicBlock[1]/ReLU[relu]/relu__0
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer4]/BasicBlock[1]/ReLU[relu]/relu__0
INFO:nncf:Set sign: True and scale: [11.1966, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer4]/BasicBlock[1]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Performing signed activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer4]/BasicBlock[1]/BatchNorm2d[bn2]/batch_norm_0
INFO:nncf:Set sign: False and scale: [12.2984, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer4]/BasicBlock[1]/ReLU[relu]/relu__1
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer4]/BasicBlock[1]/ReLU[relu]/relu__1
INFO:nncf:Set sign: False and scale: [11.4582, ] for TargetType.OPERATOR_POST_HOOK ResNet/AdaptiveAvgPool2d[avgpool]/adaptive_avg_pool2d_0
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/AdaptiveAvgPool2d[avgpool]/adaptive_avg_pool2d_0
INFO:nncf:Set sign: False and scale: [3.3400, ] for TargetType.OPERATOR_POST_HOOK ResNet/ReLU[relu]/relu__0
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/ReLU[relu]/relu__0
INFO:nncf:Set sign: False and scale: [4.3247, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer1]/BasicBlock[0]/ReLU[relu]/relu__1
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer1]/BasicBlock[0]/ReLU[relu]/relu__1
INFO:nncf:Set sign: False and scale: [5.4474, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer1]/BasicBlock[1]/ReLU[relu]/relu__1
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer1]/BasicBlock[1]/ReLU[relu]/relu__1
INFO:nncf:Set sign: False and scale: [3.6306, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer2]/BasicBlock[0]/ReLU[relu]/relu__1
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer2]/BasicBlock[0]/ReLU[relu]/relu__1
INFO:nncf:Set sign: False and scale: [4.1803, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer2]/BasicBlock[1]/ReLU[relu]/relu__1
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer2]/BasicBlock[1]/ReLU[relu]/relu__1
INFO:nncf:Set sign: False and scale: [3.5086, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer3]/BasicBlock[0]/ReLU[relu]/relu__1
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer3]/BasicBlock[0]/ReLU[relu]/relu__1
INFO:nncf:Set sign: False and scale: [3.5363, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer3]/BasicBlock[1]/ReLU[relu]/relu__1
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer3]/BasicBlock[1]/ReLU[relu]/relu__1
INFO:nncf:Set sign: False and scale: [3.8594, ] for TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer4]/BasicBlock[0]/ReLU[relu]/relu__1
INFO:nncf:Performing unsigned activation quantization for: TargetType.OPERATOR_POST_HOOK ResNet/Sequential[layer4]/BasicBlock[0]/ReLU[relu]/relu__1
WARNING:nncf:The saturation issue fix will be applied. Now all weight quantizers will effectively use only 7 bits out of 8 bits. This resolves the saturation issue problem on AVX2 and AVX-512 machines. Please take a look at the documentation for a detailed information.
INFO:nncf:Set sign: True and scale: [0.7697, 0.3192, 0.1000, 0.1759, 0.1000, 0.4172, 0.5834, 0.1000, 0.7551, 0.1000, ... (first 10/64 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.8057, 0.2424, 0.5567, 0.3693, 0.3543, 0.3245, 0.3739, 0.1844, 0.4633, 0.4254, ... (first 10/64 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer1]/BasicBlock[0]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer1]/BasicBlock[0]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.3296, 0.1498, 0.1000, 0.2334, 0.1321, 0.2323, 0.2458, 0.1831, 0.2766, 0.2608, ... (first 10/64 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer1]/BasicBlock[0]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer1]/BasicBlock[0]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.2264, 0.4283, 0.2280, 0.1708, 0.3042, 0.3131, 0.1995, 0.6516, 0.3480, 0.3617, ... (first 10/64 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer1]/BasicBlock[1]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer1]/BasicBlock[1]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.1974, 0.3122, 0.1805, 0.1872, 0.1684, 0.3455, 0.2941, 0.2890, 0.2790, 0.2182, ... (first 10/64 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer1]/BasicBlock[1]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer1]/BasicBlock[1]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.1751, 0.1536, 0.2079, 0.2128, 0.1923, 0.2390, 0.1268, 0.2419, 0.1381, 0.1761, ... (first 10/128 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer2]/BasicBlock[0]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer2]/BasicBlock[0]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.2277, 0.1510, 0.2501, 0.1147, 0.2240, 0.1992, 0.1876, 0.1672, 0.1803, 0.1529, ... (first 10/128 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer2]/BasicBlock[0]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer2]/BasicBlock[0]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.4994, 0.1158, 0.1705, 0.3253, 0.1735, 0.1507, 0.1711, 0.3948, 0.1650, 0.3842, ... (first 10/128 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer2]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer2]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.2603, 0.1477, 0.1656, 0.1719, 0.2057, 0.2082, 0.1157, 0.2736, 0.2245, 0.2921, ... (first 10/128 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer2]/BasicBlock[1]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer2]/BasicBlock[1]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.1350, 0.1000, 0.1554, 0.1801, 0.1467, 0.1267, 0.1297, 0.1296, 0.1075, 0.1730, ... (first 10/128 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer2]/BasicBlock[1]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer2]/BasicBlock[1]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.1748, 0.2837, 0.1333, 0.1774, 0.1172, 0.1427, 0.1237, 0.1293, 0.1278, 0.1099, ... (first 10/256 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer3]/BasicBlock[0]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer3]/BasicBlock[0]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.1263, 0.1105, 0.1000, 0.1835, 0.1286, 0.1917, 0.1617, 0.1074, 0.1594, 0.1212, ... (first 10/256 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer3]/BasicBlock[0]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer3]/BasicBlock[0]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.1231, 0.1069, 0.1000, 0.1602, 0.1055, 0.1290, 0.1011, 0.1006, 0.1000, 0.1000, ... (first 10/256 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer3]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer3]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.1070, 0.2032, 0.1125, 0.2060, 0.1021, 0.1053, 0.2286, 0.1581, 0.1000, 0.1262, ... (first 10/256 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer3]/BasicBlock[1]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer3]/BasicBlock[1]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.1000, 0.1317, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, ... (first 10/256 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer3]/BasicBlock[1]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer3]/BasicBlock[1]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.1046, 0.1000, 0.1166, 0.1000, 0.1000, 0.1000, 0.1000, 0.1315, 0.1000, 0.1356, ... (first 10/512 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer4]/BasicBlock[0]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer4]/BasicBlock[0]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.1130, 0.1029, 0.1000, 0.1000, 0.1035, 0.1000, 0.1252, 0.1065, 0.1000, 0.1227, ... (first 10/512 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer4]/BasicBlock[0]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer4]/BasicBlock[0]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.1000, 0.1292, 0.1000, 0.2748, 0.1000, 0.1032, 0.1426, 0.2061, 0.1000, 0.1048, ... (first 10/512 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer4]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer4]/BasicBlock[0]/Sequential[downsample]/NNCFConv2d[0]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1019, 0.1000, 0.1000, ... (first 10/512 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer4]/BasicBlock[1]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer4]/BasicBlock[1]/NNCFConv2d[conv1]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1317, 0.1000, 0.1000, ... (first 10/512 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer4]/BasicBlock[1]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/Sequential[layer4]/BasicBlock[1]/NNCFConv2d[conv2]/conv2d_0
INFO:nncf:Set sign: True and scale: [0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, ... (first 10/200 elements shown only) ] for TargetType.OPERATION_WITH_WEIGHTS ResNet/NNCFLinear[fc]/linear_0
INFO:nncf:Performing signed weight quantization for: TargetType.OPERATION_WITH_WEIGHTS ResNet/NNCFLinear[fc]/linear_0
INFO:nncf:BatchNorm statistics adaptation █                 | 1 / 16
INFO:nncf:BatchNorm statistics adaptation ██                | 2 / 16
INFO:nncf:BatchNorm statistics adaptation ███               | 3 / 16
INFO:nncf:BatchNorm statistics adaptation ████              | 4 / 16
INFO:nncf:BatchNorm statistics adaptation █████             | 5 / 16
INFO:nncf:BatchNorm statistics adaptation ██████            | 6 / 16
INFO:nncf:BatchNorm statistics adaptation ███████           | 7 / 16
INFO:nncf:BatchNorm statistics adaptation ████████          | 8 / 16
INFO:nncf:BatchNorm statistics adaptation █████████         | 9 / 16
INFO:nncf:BatchNorm statistics adaptation ██████████        | 10 / 16
INFO:nncf:BatchNorm statistics adaptation ███████████       | 11 / 16
INFO:nncf:BatchNorm statistics adaptation ████████████      | 12 / 16
INFO:nncf:BatchNorm statistics adaptation █████████████     | 13 / 16
INFO:nncf:BatchNorm statistics adaptation ██████████████    | 14 / 16
INFO:nncf:BatchNorm statistics adaptation ███████████████   | 15 / 16
INFO:nncf:BatchNorm statistics adaptation ████████████████  | 16 / 16
WARNING:nncf:Graphviz is not installed - only the .dot model visualization format will be used. Install pygraphviz into your Python environment and graphviz system-wide to enable PNG rendering.

Evaluate the new model on the validation set after the initialization of quantization. The accuracy should be close to that of the floating-point FP32 model for a simple case like the one demonstrated here.

acc1 = validate(val_loader, model, criterion)
print(f"Accuracy of initialized INT8 model: {acc1:.3f}")
Test: [  0/157] Time 1.498 (1.498)  Loss 1.965 (1.965)  Acc@1 53.91 (53.91) Acc@5 82.03 (82.03)
Test: [ 10/157] Time 0.894 (0.949)  Loss 2.052 (1.945)  Acc@1 54.69 (51.99) Acc@5 75.78 (79.33)
Test: [ 20/157] Time 0.799 (0.880)  Loss 1.530 (1.917)  Acc@1 60.16 (53.12) Acc@5 83.59 (79.06)
Test: [ 30/157] Time 0.793 (0.854)  Loss 1.524 (1.895)  Acc@1 60.94 (53.83) Acc@5 86.72 (79.54)
Test: [ 40/157] Time 0.803 (0.839)  Loss 1.843 (1.890)  Acc@1 52.34 (53.83) Acc@5 83.59 (79.65)
Test: [ 50/157] Time 0.789 (0.830)  Loss 1.771 (1.888)  Acc@1 54.69 (53.98) Acc@5 79.69 (79.52)
Test: [ 60/157] Time 0.806 (0.826)  Loss 2.019 (1.886)  Acc@1 53.91 (54.07) Acc@5 72.66 (79.48)
Test: [ 70/157] Time 0.796 (0.823)  Loss 1.780 (1.882)  Acc@1 52.34 (54.07) Acc@5 82.03 (79.60)
Test: [ 80/157] Time 0.808 (0.821)  Loss 2.017 (1.879)  Acc@1 51.56 (54.12) Acc@5 77.34 (79.60)
Test: [ 90/157] Time 0.818 (0.820)  Loss 1.929 (1.871)  Acc@1 57.81 (54.28) Acc@5 77.34 (79.64)
Test: [100/157] Time 0.784 (0.818)  Loss 1.927 (1.878)  Acc@1 51.56 (54.09) Acc@5 80.47 (79.56)
Test: [110/157] Time 0.790 (0.816)  Loss 2.216 (1.890)  Acc@1 47.66 (53.79) Acc@5 71.88 (79.38)
Test: [120/157] Time 0.795 (0.814)  Loss 1.673 (1.892)  Acc@1 57.03 (53.74) Acc@5 80.47 (79.32)
Test: [130/157] Time 0.785 (0.813)  Loss 1.906 (1.894)  Acc@1 56.25 (53.66) Acc@5 79.69 (79.21)
Test: [140/157] Time 0.779 (0.811)  Loss 1.901 (1.890)  Acc@1 55.47 (53.72) Acc@5 76.56 (79.26)
Test: [150/157] Time 0.727 (0.808)  Loss 1.720 (1.890)  Acc@1 51.56 (53.70) Acc@5 84.38 (79.27)
 * Acc@1 53.615 Acc@5 79.325
Accuracy of initialized INT8 model: 53.615

Fine-tune the Compressed Model

At this step, a regular fine-tuning process is applied to recover the accuracy drop. Normally, a few epochs of tuning with a small learning rate are required, the same learning rate that is usually used at the end of training the original model. No other changes to the training pipeline are required. Here is a simple example.

compression_lr = init_lr / 10
optimizer = torch.optim.Adam(model.parameters(), lr=compression_lr)

# train for one epoch with NNCF
train(train_loader, model, criterion, optimizer, epoch=0)

# evaluate on validation set after Quantization-Aware Training (QAT case)
acc1 = validate(val_loader, model, criterion)

print(f"Accuracy of tuned INT8 model: {acc1:.3f}")
Epoch:[0][  0/625]  Time 3.385 (3.385)  Loss 0.993 (0.993)  Acc@1 78.12 (78.12) Acc@5 93.75 (93.75)
Epoch:[0][ 50/625]  Time 2.622 (2.443)  Loss 0.914 (0.956)  Acc@1 76.56 (76.88) Acc@5 92.97 (92.60)
Epoch:[0][100/625]  Time 2.508 (2.491)  Loss 0.786 (0.965)  Acc@1 80.47 (76.90) Acc@5 96.88 (92.58)
Epoch:[0][150/625]  Time 2.523 (2.506)  Loss 1.075 (0.967)  Acc@1 71.88 (76.88) Acc@5 89.06 (92.53)
Epoch:[0][200/625]  Time 2.504 (2.515)  Loss 0.923 (0.952)  Acc@1 80.47 (77.36) Acc@5 93.75 (92.74)
Epoch:[0][250/625]  Time 2.503 (2.518)  Loss 1.033 (0.942)  Acc@1 80.47 (77.79) Acc@5 91.41 (92.81)
Epoch:[0][300/625]  Time 2.471 (2.519)  Loss 0.826 (0.942)  Acc@1 81.25 (77.82) Acc@5 95.31 (92.75)
Epoch:[0][350/625]  Time 2.492 (2.519)  Loss 1.009 (0.937)  Acc@1 77.34 (77.94) Acc@5 92.19 (92.81)
Epoch:[0][400/625]  Time 2.510 (2.521)  Loss 0.923 (0.933)  Acc@1 79.69 (78.09) Acc@5 91.41 (92.89)
Epoch:[0][450/625]  Time 2.519 (2.524)  Loss 0.797 (0.930)  Acc@1 79.69 (78.26) Acc@5 95.31 (92.91)
Epoch:[0][500/625]  Time 2.584 (2.523)  Loss 1.050 (0.925)  Acc@1 78.91 (78.43) Acc@5 87.50 (92.97)
Epoch:[0][550/625]  Time 2.566 (2.526)  Loss 0.893 (0.923)  Acc@1 82.81 (78.43) Acc@5 94.53 (92.97)
Epoch:[0][600/625]  Time 2.404 (2.523)  Loss 1.027 (0.921)  Acc@1 74.22 (78.44) Acc@5 93.75 (92.97)
Test: [  0/157] Time 1.574 (1.574)  Loss 2.022 (2.022)  Acc@1 50.00 (50.00) Acc@5 78.91 (78.91)
Test: [ 10/157] Time 0.785 (0.886)  Loss 2.062 (1.896)  Acc@1 53.12 (53.05) Acc@5 75.00 (79.33)
Test: [ 20/157] Time 0.809 (0.850)  Loss 1.384 (1.854)  Acc@1 64.06 (54.46) Acc@5 86.72 (79.95)
Test: [ 30/157] Time 0.830 (0.842)  Loss 1.525 (1.839)  Acc@1 61.72 (54.76) Acc@5 85.94 (80.34)
Test: [ 40/157] Time 0.798 (0.836)  Loss 1.861 (1.840)  Acc@1 53.12 (54.55) Acc@5 83.59 (80.60)
Test: [ 50/157] Time 0.815 (0.832)  Loss 1.670 (1.840)  Acc@1 55.47 (54.56) Acc@5 82.03 (80.50)
Test: [ 60/157] Time 0.814 (0.831)  Loss 2.041 (1.843)  Acc@1 53.12 (54.61) Acc@5 75.78 (80.64)
Test: [ 70/157] Time 0.830 (0.829)  Loss 1.848 (1.842)  Acc@1 52.34 (54.63) Acc@5 82.03 (80.61)
Test: [ 80/157] Time 0.825 (0.827)  Loss 1.911 (1.840)  Acc@1 59.38 (54.88) Acc@5 78.12 (80.51)
Test: [ 90/157] Time 0.806 (0.825)  Loss 1.937 (1.835)  Acc@1 55.47 (54.94) Acc@5 78.12 (80.50)
Test: [100/157] Time 0.789 (0.823)  Loss 1.870 (1.842)  Acc@1 52.34 (54.76) Acc@5 82.03 (80.48)
Test: [110/157] Time 0.814 (0.822)  Loss 2.067 (1.850)  Acc@1 50.78 (54.46) Acc@5 71.88 (80.30)
Test: [120/157] Time 0.789 (0.821)  Loss 1.697 (1.851)  Acc@1 56.25 (54.46) Acc@5 82.03 (80.28)
Test: [130/157] Time 0.825 (0.820)  Loss 1.918 (1.853)  Acc@1 52.34 (54.40) Acc@5 79.69 (80.15)
Test: [140/157] Time 0.789 (0.820)  Loss 1.880 (1.849)  Acc@1 57.81 (54.53) Acc@5 78.12 (80.18)
Test: [150/157] Time 0.743 (0.817)  Loss 1.679 (1.851)  Acc@1 57.03 (54.50) Acc@5 85.16 (80.20)
 * Acc@1 54.455 Acc@5 80.245
Accuracy of tuned INT8 model: 54.455

Export INT8 Model to ONNX

if not int8_onnx_path.exists():
    warnings.filterwarnings("ignore", category=TracerWarning)
    warnings.filterwarnings("ignore", category=UserWarning)
    # Export the INT8 model to ONNX, which is supported by the OpenVINO™ toolkit
    compression_ctrl.export_model(int8_onnx_path)
    print(f"INT8 ONNX model exported to {int8_onnx_path}.")
INT8 ONNX model exported to output/resnet18_int8.onnx.

Convert ONNX models to OpenVINO™ Intermediate Representation (IR)

Call the OpenVINO Model Optimizer tool to convert the ONNX models to OpenVINO IR with FP16 precision. The models are saved to the output directory. We embed the mean values in the model and scale the input with the standard deviation by using the --mean_values and --scale_values arguments. With these options, it is not necessary to normalize input data before propagating it through the network.

See the Model Optimizer Developer Guide for more information about Model Optimizer.

Executing this command may take a while. There may be some errors or warnings in the output. Model Optimizer successfully converted the model to IR if the last lines of the output include: [ SUCCESS ] Generated IR version 10 model

if not fp32_ir_path.exists():
    !mo --input_model $fp32_onnx_path --input_shape "[1,3, $image_size, $image_size]" --mean_values "[123.675, 116.28 , 103.53]" --scale_values "[58.395, 57.12 , 57.375]" --data_type FP16 --output_dir $OUTPUT_DIR
Model Optimizer arguments:
Common parameters:
    - Path to the Input Model:  /home/runner/work/openvino_notebooks/openvino_notebooks/notebooks/302-pytorch-quantization-aware-training/output/resnet18_fp32.onnx
    - Path for generated IR:    /home/runner/work/openvino_notebooks/openvino_notebooks/notebooks/302-pytorch-quantization-aware-training/output
    - IR output name:   resnet18_fp32
    - Log level:    ERROR
    - Batch:    Not specified, inherited from the model
    - Input layers:     Not specified, inherited from the model
    - Output layers:    Not specified, inherited from the model
    - Input shapes:     [1,3, 64, 64]
    - Mean values:  [123.675, 116.28 , 103.53]
    - Scale values:     [58.395, 57.12 , 57.375]
    - Scale factor:     Not specified
    - Precision of IR:  FP16
    - Enable fusing:    True
    - Enable grouped convolutions fusing:   True
    - Move mean values to preprocess section:   None
    - Reverse input channels:   False
ONNX specific parameters:
    - Inference Engine found in:    /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/openvino
Inference Engine version:   2021.4.2-3976-0943ed67223-refs/pull/539/head
Model Optimizer version:    2021.4.2-3976-0943ed67223-refs/pull/539/head
[ SUCCESS ] Generated IR version 10 model.
[ SUCCESS ] XML file: /home/runner/work/openvino_notebooks/openvino_notebooks/notebooks/302-pytorch-quantization-aware-training/output/resnet18_fp32.xml
[ SUCCESS ] BIN file: /home/runner/work/openvino_notebooks/openvino_notebooks/notebooks/302-pytorch-quantization-aware-training/output/resnet18_fp32.bin
[ SUCCESS ] Total execution time: 6.44 seconds.
[ SUCCESS ] Memory consumed: 212 MB.
if not int8_ir_path.exists():
    !mo --input_model $int8_onnx_path --input_shape "[1,3, $image_size, $image_size]" --mean_values "[123.675, 116.28 , 103.53]" --scale_values "[58.395, 57.12 , 57.375]" --data_type FP16 --output_dir $OUTPUT_DIR
Model Optimizer arguments:
Common parameters:
    - Path to the Input Model:  /home/runner/work/openvino_notebooks/openvino_notebooks/notebooks/302-pytorch-quantization-aware-training/output/resnet18_int8.onnx
    - Path for generated IR:    /home/runner/work/openvino_notebooks/openvino_notebooks/notebooks/302-pytorch-quantization-aware-training/output
    - IR output name:   resnet18_int8
    - Log level:    ERROR
    - Batch:    Not specified, inherited from the model
    - Input layers:     Not specified, inherited from the model
    - Output layers:    Not specified, inherited from the model
    - Input shapes:     [1,3, 64, 64]
    - Mean values:  [123.675, 116.28 , 103.53]
    - Scale values:     [58.395, 57.12 , 57.375]
    - Scale factor:     Not specified
    - Precision of IR:  FP16
    - Enable fusing:    True
    - Enable grouped convolutions fusing:   True
    - Move mean values to preprocess section:   None
    - Reverse input channels:   False
ONNX specific parameters:
    - Inference Engine found in:    /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/openvino
Inference Engine version:   2021.4.2-3976-0943ed67223-refs/pull/539/head
Model Optimizer version:    2021.4.2-3976-0943ed67223-refs/pull/539/head
[ SUCCESS ] Generated IR version 10 model.
[ SUCCESS ] XML file: /home/runner/work/openvino_notebooks/openvino_notebooks/notebooks/302-pytorch-quantization-aware-training/output/resnet18_int8.xml
[ SUCCESS ] BIN file: /home/runner/work/openvino_notebooks/openvino_notebooks/notebooks/302-pytorch-quantization-aware-training/output/resnet18_int8.bin
[ SUCCESS ] Total execution time: 22.61 seconds.
[ SUCCESS ] Memory consumed: 324 MB.
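Before benchmarking, it can be useful to confirm that the converted INT8 IR actually runs. The following minimal inference sketch is not part of the original pipeline; it uses the Inference Engine API imported above, with a random tensor standing in for a real image:

import numpy as np

ie = IECore()
net = ie.read_network(model=str(int8_ir_path), weights=str(int8_ir_path.with_suffix(".bin")))
exec_net = ie.load_network(network=net, device_name="CPU")
input_blob = next(iter(net.input_info))
output_blob = next(iter(net.outputs))

# Because --mean_values and --scale_values were baked into the IR,
# the expected input is raw, unnormalized pixel data in NCHW layout.
image = np.random.randint(0, 256, size=(1, 3, image_size, image_size)).astype(np.float32)
result = exec_net.infer(inputs={input_blob: image})
print("Predicted class index:", int(np.argmax(result[output_blob])))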

Benchmark Model Performance by Computing Inference Time

Finally, we will measure the inference performance of the FP32 and INT8 models. To do this, we use Benchmark Tool, OpenVINO's inference performance measurement tool. By default, Benchmark Tool runs inference for 60 seconds in asynchronous mode on CPU. It returns inference speed as latency (milliseconds per image) and throughput (frames per second) values.

NOTE: In this notebook we run benchmark_app for 15 seconds to give a quick indication of performance. For more accurate performance, we recommend running benchmark_app in a terminal/command prompt after closing other applications. Run benchmark_app -m model.xml -d CPU to benchmark async inference on CPU for one minute. Change CPU to GPU to benchmark on GPU. Run benchmark_app --help to see an overview of all command-line options.

def parse_benchmark_output(benchmark_output):
    parsed_output = [line for line in benchmark_output if not (line.startswith(r"[") or line.startswith("  ") or line == "")]
    print(*parsed_output, sep='\n')


print('Benchmark FP32 model (IR)')
benchmark_output = ! benchmark_app -m $fp32_ir_path -d CPU -api async -t 15
parse_benchmark_output(benchmark_output)

print('Benchmark INT8 model (IR)')
benchmark_output = ! benchmark_app -m $int8_ir_path -d CPU -api async -t 15
parse_benchmark_output(benchmark_output)
Benchmark FP32 model (IR)
Count:      2615 iterations
Duration:   15006.80 ms
Latency:    5.43 ms
Throughput: 174.25 FPS
Benchmark INT8 model (IR)
Count:      7029 iterations
Duration:   15003.81 ms
Latency:    1.98 ms
Throughput: 468.48 FPS

Show CPU Information for reference

ie = IECore()
ie.get_metric(device_name="CPU", metric_name="FULL_DEVICE_NAME")
'Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz'
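To check which other devices are available for benchmarking on the current machine, the same IECore object exposes a device list:

# List all devices the Inference Engine can use on this machine
print(ie.available_devices)  # for example: ['CPU']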