Step 3. Main Transformations

Main transformations make up the majority of low precision transformations. They operate on dequantization operations.

Let’s explore some main transformations using an example model. Original model:

Original model

Result model after main transformations:

Result model

Changes in the example model after main transformations:

  • All FakeQuantize operations (fakeQuantize1, fakeQuantize2 and fakeQuantize3) were decomposed into:

    • new FakeQuantize operations, which replace the original ones and have different output intervals and output port precision,

    • dequantization operations.

  • Dequantization operations were moved through the precision-preserved (concat1 and concat2) and quantized (convolution2) operations.
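
The FakeQuantize decomposition described above can be illustrated with a minimal scalar sketch. It is not the library implementation: the helper names (`quantize_level`, `fake_quantize`, `decomposed_fake_quantize`) are hypothetical, and it assumes the dequantization takes the common subtract-then-multiply form with a single scale and zero point.

```python
def quantize_level(x, in_low, in_high, levels=256):
    """Integer level index in [0, levels - 1] for a scalar input."""
    if x <= in_low:
        return 0
    if x > in_high:
        return levels - 1
    return round((x - in_low) / (in_high - in_low) * (levels - 1))


def fake_quantize(x, in_low, in_high, out_low, out_high, levels=256):
    """Original FakeQuantize: quantize, then map back to the float output interval."""
    q = quantize_level(x, in_low, in_high, levels)
    return q / (levels - 1) * (out_high - out_low) + out_low


def decomposed_fake_quantize(x, in_low, in_high, out_low, out_high, levels=256):
    """Decomposed form (illustrative): a quantizing step with output interval
    [0, levels - 1], followed by explicit dequantization (subtract, multiply)."""
    # New FakeQuantize: same input interval, integer-valued output interval.
    q = quantize_level(x, in_low, in_high, levels)
    # Dequantization constants derived from the original output interval.
    scale = (out_high - out_low) / (levels - 1)
    zero_point = -out_low / scale
    restored = (q - zero_point) * scale
    return q, restored


if __name__ == "__main__":
    for x in (-2.0, -0.3, 0.5, 3.0):
        q, restored = decomposed_fake_quantize(x, -1.28, 1.27, -1.28, 1.27)
        print(x, q, restored, fake_quantize(x, -1.28, 1.27, -1.28, 1.27))
```

Both forms produce the same float value up to rounding error. Once the dequantization is explicit, it can be moved past the following operations, which is what the second bullet describes.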

Note

The left branch (branch #1) does not require per-tensor quantization, so the fakeQuantize1 output interval is [0, 255]. However, the quantized convolution2 operation requires per-tensor quantization on the right branch (branch #2). For this reason, the intervals of all connected FakeQuantize operations (fakeQuantize1 and fakeQuantize2) are aligned so that quantization is per-tensor after the concatenation (concat2) operation.
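
The arithmetic behind interval alignment can be sketched as follows. This is an illustration under stated assumptions, not the library algorithm: the function names and the example intervals are hypothetical, and the shared scale is assumed to come from the combined (widest) interval of the connected FakeQuantize operations.

```python
def per_branch_scales(intervals, levels=256):
    """Scale of each branch if every branch independently used the full
    [0, levels - 1] range. Different values mean per-channel dequantization
    after the concatenation."""
    return [(high - low) / (levels - 1) for low, high in intervals]


def align_intervals(intervals, levels=256):
    """Express every branch in units of one shared scale (assumption: the
    scale is derived from the combined interval of all branches)."""
    combined_low = min(low for low, _ in intervals)
    combined_high = max(high for _, high in intervals)
    shared_scale = (combined_high - combined_low) / (levels - 1)
    # Quantized output interval of each branch under the shared scale.
    quantized = [
        ((low - combined_low) / shared_scale, (high - combined_low) / shared_scale)
        for low, high in intervals
    ]
    return shared_scale, combined_low, quantized


if __name__ == "__main__":
    # Hypothetical float output intervals of two FakeQuantize operations.
    branches = [(0.0, 2.55), (0.0, 1.275)]
    print(per_branch_scales(branches))
    print(align_intervals(branches))
```

With a shared scale, the narrower branch occupies only part of the quantized range, but a single dequantization after the concatenation is valid for all channels. That is the property a per-tensor quantized operation such as convolution2 needs.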