Quantization Scheme
Key steps in the quantization scheme:
1. Low Precision Transformations: decompose FakeQuantize into Quantize with a low precision output and Dequantize. For more details, refer to the Quantize decomposition section.
2. Low Precision Transformations: move Dequantize through operations. For more details, refer to the Main transformations section.
3. Plugin: fuse operations with Quantize and run inference in low precision.
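The first step can be illustrated numerically. The sketch below is not the transformation code itself; it is a scalar reference written from the FakeQuantize-1 formula, with hypothetical helper names (`fake_quantize`, `quantize`, `dequantize`) and the assumption that Python's round-half-to-even matches the rounding mode in use. It shows that a FakeQuantize with 256 levels gives the same result as a Quantize to the integer range [0, 255] followed by a Dequantize.

```python
import math


def fake_quantize(x, in_low, in_high, out_low, out_high, levels):
    """Scalar FakeQuantize, written from the FakeQuantize-1 formula."""
    if x <= min(in_low, in_high):
        return out_low
    if x > max(in_low, in_high):
        return out_high
    # Index of the quantization level that x falls on.
    level = round((x - in_low) / (in_high - in_low) * (levels - 1))
    return level / (levels - 1) * (out_high - out_low) + out_low


def quantize(x, in_low, in_high, levels=256):
    """Quantize part: FakeQuantize whose output interval is [0, levels - 1]."""
    return int(round(fake_quantize(x, in_low, in_high, 0.0, float(levels - 1), levels)))


def dequantize(q, out_low, out_high, levels=256):
    """Dequantize part: Convert, Subtract (zero point), Multiply (scale)."""
    scale = (out_high - out_low) / (levels - 1)
    zero_point = -out_low / scale
    return (float(q) - zero_point) * scale


# Hypothetical intervals chosen only for illustration.
x = 0.5
original = fake_quantize(x, -1.0, 1.0, -2.0, 2.0, 256)
decomposed = dequantize(quantize(x, -1.0, 1.0), -2.0, 2.0)
print(math.isclose(original, decomposed, abs_tol=1e-9))
```

The comparison uses a tolerance because the two forms perform the floating-point operations in a different order.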
Quantization scheme features:
- Quantization operation is expressed through the FakeQuantize operation, which involves more than scale and shift. For more details, see: FakeQuantize-1. If the FakeQuantize input and output intervals are the same, FakeQuantize degenerates to Multiply, Subtract and Convert (scale & shift).
- Dequantization operation is expressed through element-wise Convert, Subtract and Multiply operations. Convert and Subtract are optional. These operations can be handled as typical element-wise operations, for example, fused or transformed into other operations.
- OpenVINO plugins fuse Dequantize and Quantize operations after a low precision operation and do not fuse Quantize before it.
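Because dequantization is a chain of ordinary element-wise operations, it can be rewritten like any other arithmetic. The following minimal sketch uses hypothetical helper names and scalar values; it models Subtract as optional and shows one possible transformation, folding Subtract and Multiply into a single scale and shift. It does not reproduce any specific OpenVINO transformation.

```python
def dequantize(q, scale, zero_point=None):
    """Convert -> optional Subtract -> Multiply for one quantized value."""
    x = float(q)                  # Convert: integer to floating point
    if zero_point is not None:    # Subtract is optional (e.g. symmetric quantization)
        x = x - zero_point
    return x * scale              # Multiply


def fold_to_scale_shift(scale, zero_point):
    """Rewrite (x - zero_point) * scale as x * scale + shift."""
    return scale, -zero_point * scale


# Hypothetical parameters: an unsigned value with zero point 128 and scale 0.5.
scale, shift = fold_to_scale_shift(0.5, 128)
print(dequantize(130, 0.5, 128), 130 * scale + shift)
```

Both expressions give the same value, which is why a plugin is free to fuse these operations into neighbouring ones.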
Here is a quantization scheme example for int8 quantization applied to a part of a model with two Convolution operations in the CPU plugin.
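The data flow of such a two-convolution fragment can be approximated with a small numeric sketch. Everything here is an assumption made for illustration: the 1D convolution, the scales, the weights, and symmetric quantization without a zero point (so Dequantize is a single Multiply). It is not the CPU plugin implementation. It shows that moving Dequantize after an integer convolution matches dequantizing first, and that the Dequantize and the next Quantize both sit after the first convolution.

```python
def conv1d(x, w):
    """Valid 1D cross-correlation of sequence x with kernel w."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def quantize(x, scale, low=0, high=255):
    """Quantize floats to integers in [low, high]; symmetric, no zero point."""
    return [max(low, min(high, round(v / scale))) for v in x]


def dequantize(q, scale):
    """Dequantize reduced to a single Multiply (no Subtract)."""
    return [v * scale for v in q]


# Hypothetical input, scales, and already-quantized int8 weights.
x = [5.0, 10.0, 15.0, 20.0, 25.0]
s_x, s_w1, s_mid, s_w2 = 0.5, 0.25, 1.0, 0.5
q_w1 = [1, -2, 3]
q_w2 = [2, 1]

q_x = quantize(x, s_x)

# Reference: Dequantize before the first convolution, compute in floating point.
ref1 = conv1d(dequantize(q_x, s_x), dequantize(q_w1, s_w1))

# Low precision: integer convolution, Dequantize moved after the operation.
low1 = dequantize(conv1d(q_x, q_w1), s_x * s_w1)

# The Quantize feeding the second convolution follows the first one directly,
# so Dequantize and Quantize can be fused after the low precision operation.
q_mid = quantize(low1, s_mid)
out = dequantize(conv1d(q_mid, q_w2), s_mid * s_w2)

print(ref1 == low1, q_mid, out)
```

The scales are powers of two so that the floating-point and integer paths agree exactly; with arbitrary scales the two paths would agree only up to rounding error.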