Quantization Scheme

Key steps in the quantization scheme:

1. Low Precision Transformations: decompose FakeQuantize into a Quantize operation with a low precision output and a Dequantize operation. For more details, refer to the Quantize decomposition section.
2. Low Precision Transformations: move Dequantize through operations. For more details, refer to the Main transformations section.
3. Plugin: fuse operations with Quantize and run inference in low precision.
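Step 1 can be illustrated with a minimal scalar sketch. It assumes the FakeQuantize-1 formula (clamp to the input interval, round to one of `levels` steps, rescale to the output interval) and uses Python's `round()` (round half to even) as the rounding mode, which is an assumption. The helpers `fake_quantize`, `quantize` and `dequantize` are hypothetical names. The sketch checks that a Quantize with an integer (uint8-range) output followed by a Dequantize (Subtract of a zero point, Multiply by a scale) reproduces the FakeQuantize result.

```python
# Hypothetical sketch: FakeQuantize decomposed into Quantize (integer output)
# and Dequantize (Subtract zero point, Multiply by scale).

def fake_quantize(x, in_low, in_high, out_low, out_high, levels):
    """Scalar version of the FakeQuantize-1 formula (rounding mode assumed)."""
    if x <= min(in_low, in_high):
        return out_low
    if x > max(in_low, in_high):
        return out_high
    q = round((x - in_low) / (in_high - in_low) * (levels - 1))
    return q / (levels - 1) * (out_high - out_low) + out_low

def quantize(x, in_low, in_high, levels):
    """Low precision part: clamp to the input interval (in_low < in_high assumed) and round to a level index."""
    x = min(max(x, in_low), in_high)
    return round((x - in_low) / (in_high - in_low) * (levels - 1))

def dequantize(q, out_low, out_high, levels):
    """Convert -> Subtract -> Multiply, with scale and zero point derived from the output interval."""
    scale = (out_high - out_low) / (levels - 1)
    zero_point = -out_low / scale
    return (float(q) - zero_point) * scale

# Same input and output interval, 256 levels (uint8 range).
LOW, HIGH, LEVELS = -1.28, 1.27, 256
samples = [-2.0, -1.28, -0.5, 0.0, 0.004, 0.3333, 1.27, 3.0]
codes = [quantize(x, LOW, HIGH, LEVELS) for x in samples]  # integers in [0, 255]
decomposed = [dequantize(q, LOW, HIGH, LEVELS) for q in codes]
reference = [fake_quantize(x, LOW, HIGH, LOW, HIGH, LEVELS) for x in samples]
```

Both paths compute the rounded level index with the same expression, so they agree up to floating-point error; only the integer `codes` need to be stored and processed in low precision.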

Quantization scheme features:

The quantization operation is expressed through the FakeQuantize operation, which involves more than scale and shift. For more details, see FakeQuantize-1. If the FakeQuantize input and output intervals are the same, FakeQuantize degenerates to Multiply, Subtract and Convert (scale & shift).
The dequantization operation is expressed through element-wise Convert, Subtract and Multiply operations, of which Convert and Subtract are optional. These operations can be handled as typical element-wise operations, for example, fused or transformed into other operations.
OpenVINO plugins fuse Dequantize and Quantize operations after a low precision operation and do not fuse Quantize before it.
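The dequantization chain can be sketched as plain element-wise code. This is an illustration under stated assumptions (per-tensor scale and zero point, Python floats), not the plugin implementation; `dequantize` and `fused_dequantize` are hypothetical names. It shows Subtract being skipped when there is no zero point, and Subtract and Multiply being folded into a single multiply-add, which is one way element-wise operations can be fused.

```python
from typing import List, Optional, Sequence

def dequantize(q: Sequence[int], scale: float, zero_point: Optional[float] = None) -> List[float]:
    """Dequantize as a chain of element-wise operations."""
    x = [float(v) for v in q]                 # Convert: integer -> float
    if zero_point is not None:                # Subtract is optional
        x = [v - zero_point for v in x]
    return [v * scale for v in x]             # Multiply by the scale

def fused_dequantize(q: Sequence[int], scale: float, zero_point: Optional[float] = None) -> List[float]:
    """Subtract and Multiply folded into one multiply-add: q * scale + shift."""
    shift = 0.0 if zero_point is None else -zero_point * scale
    return [float(v) * scale + shift for v in q]

q = [0, 17, 128, 255]
with_zp = dequantize(q, scale=0.02, zero_point=128)
without_zp = dequantize(q, scale=0.02)        # Subtract omitted
```

Precomputing `shift` once per tensor is what makes the fused form cheaper: each element needs one multiply and one add instead of a separate Subtract pass.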

Here is a quantization scheme example for int8 quantization applied to a part of a model with two Convolution operations in the CPU plugin.
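As a rough numeric counterpart, the following hypothetical sketch chains two convolutions on integer data. A 1-D cross-correlation stands in for Convolution, per-tensor scales and zero points are assumed, and all names and values are made up for illustration. The Dequantize Multiply after the first convolution is fused with the Quantize of the intermediate tensor into a single requantization multiplier, and the result is compared with a floating-point path that dequantizes everything and applies the equivalent quantize-dequantize to the intermediate tensor. This is not the CPU plugin's actual kernel.

```python
# Hypothetical sketch: two low precision convolutions with a fused requantization
# between them. Scales are powers of two, so the float arithmetic below is exact.

def conv1d(x, w):
    """Valid 1-D cross-correlation, a stand-in for Convolution."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def clamp(v, lo=0, hi=255):
    return min(max(v, lo), hi)

def deq(q, zp, scale):
    """Dequantize: Subtract zero point, Multiply by scale."""
    return [(v - zp) * scale for v in q]

x_q, x_scale, x_zp = [130, 120, 200, 90, 140, 129, 60], 0.5, 128   # uint8 input
w1_q, w1_scale = [2, -1, 3], 0.25                                  # int8 weights
w2_q, w2_scale = [1, 1, -2], 0.25
y_scale, y_zp = 2.0, 128                                           # intermediate tensor

# Floating-point reference: dequantize inputs, quantize-dequantize the intermediate.
y = conv1d(deq(x_q, x_zp, x_scale), deq(w1_q, 0, w1_scale))
y_fq = [(clamp(round(v / y_scale) + y_zp) - y_zp) * y_scale for v in y]
z_ref = conv1d(y_fq, deq(w2_q, 0, w2_scale))

# Low precision path: integer convolutions, one fused multiplier in between.
acc1 = conv1d([v - x_zp for v in x_q], w1_q)                  # int accumulator
multiplier = x_scale * w1_scale / y_scale                     # Dequantize + Quantize fused
y_q = [clamp(round(a * multiplier) + y_zp) for a in acc1]     # uint8 intermediate
acc2 = conv1d([v - y_zp for v in y_q], w2_q)
z = [a * (y_scale * w2_scale) for a in acc2]                  # final Dequantize
```

With arbitrary (non power-of-two) scales the two paths can differ by one quantization level where the value lands near a rounding boundary, because the multiplications are then performed in a different order.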