Step 2. Markup Transformations
This step defines the optimal FakeQuantize decomposition precisions for the best inference performance by marking up operations with runtime attribute instances. Attributes are created for input ports, output ports, and operations. Transformations do not change operation output port precisions. The model markup low precision logic is decomposed into the following common markup transformations. The order of transformations is important:
Transformation name | Create attributes | Use attributes
---|---|---
MarkupBias | Bias |
MarkupCanBeQuantized | Precisions |
MarkupPrecisions | Precisions, PrecisionPreserved |
MarkupPerTensorQuantization | PerTensorQuantization |
MarkupAvgPoolPrecisionPreserved | AvgPoolPrecisionPreserved | Precisions, PrecisionPreserved
PropagatePrecisions | Precisions | Precisions, PrecisionPreserved
AlignQuantizationIntervals | IntervalsAlignment | PrecisionPreserved
AlignQuantizationParameters | QuantizationAlignment | PrecisionPreserved, PerTensorQuantization
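One way to see why the order matters is to check that every attribute type a transformation uses has already been created by an earlier one. The sketch below encodes only the "Create attributes" and "Use attributes" columns of the table; MarkupStep, kMarkupPipeline, and firstUnsatisfiedStep are hypothetical names for illustration, not part of the OpenVINO API, and the real ordering constraints may involve more than these columns.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical descriptor of one markup transformation: the attribute
// types it creates and the attribute types it reads (taken from the table).
struct MarkupStep {
    std::string name;
    std::vector<std::string> creates;
    std::vector<std::string> uses;
};

// The markup pipeline in the order listed in the table.
const std::vector<MarkupStep> kMarkupPipeline = {
    {"MarkupBias", {"Bias"}, {}},
    {"MarkupCanBeQuantized", {"Precisions"}, {}},
    {"MarkupPrecisions", {"Precisions", "PrecisionPreserved"}, {}},
    {"MarkupPerTensorQuantization", {"PerTensorQuantization"}, {}},
    {"MarkupAvgPoolPrecisionPreserved", {"AvgPoolPrecisionPreserved"},
     {"Precisions", "PrecisionPreserved"}},
    {"PropagatePrecisions", {"Precisions"}, {"Precisions", "PrecisionPreserved"}},
    {"AlignQuantizationIntervals", {"IntervalsAlignment"}, {"PrecisionPreserved"}},
    {"AlignQuantizationParameters", {"QuantizationAlignment"},
     {"PrecisionPreserved", "PerTensorQuantization"}},
};

// Returns the name of the first step that reads an attribute type which no
// earlier step has created, or an empty string if the order is consistent.
std::string firstUnsatisfiedStep(const std::vector<MarkupStep>& pipeline) {
    std::set<std::string> created;
    for (const MarkupStep& step : pipeline) {
        for (const std::string& attr : step.uses) {
            if (created.count(attr) == 0) {
                return step.name;
            }
        }
        created.insert(step.creates.begin(), step.creates.end());
    }
    return "";
}
```

With the table order, firstUnsatisfiedStep returns an empty string. Moving AlignQuantizationParameters before MarkupPerTensorQuantization, for example, makes it return "AlignQuantizationParameters", because PerTensorQuantization would not exist yet.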
Note
The same type of attribute instance can be created in different transformations. This approach follows the single-responsibility principle for transformations. For example, Precisions attribute instances are created in both the MarkupCanBeQuantized and MarkupPrecisions transformations, but for different reasons.
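Common markup transformations can be decomposed into simpler utility markup transformations. The order of the utility markup transformations is not important:
CreateAttribute
CreatePrecisionsDependentAttribute
PropagateThroughPrecisionPreserved
PropagateToInput
UpdateSharedPrecisionPreserved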
Let’s explore all transformations and their relations in detail, using the same model throughout:
Key features of the original model:
The first concatenation operation, concat1, has a consumer that is not quantized: convolution1.
The second concatenation operation, concat2, has a quantized consumer, convolution2, with the following requirements: support for unsigned int8 on activations and per-tensor quantization.
Between the concat2 concatenation operation and Convolution there is an AvgPool operation, which mathematically should return an f32 tensor. However, the MarkupAvgPoolPrecisionPreserved transformation is active, which allows the low precision transformation that runs after AvgPool to propagate the low precision tensor to the next consumer.
Transformations are run with the following parameters:
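```cpp
#include "openvino/opsets/opset1.hpp"
#include "openvino/pass/manager.hpp"
#include "low_precision/low_precision.hpp"
using namespace ov::pass::low_precision;

// Convolution: u8 on activations (input port 0), i8 on weights (input port 1)
auto supportedPrecisions = std::vector<PrecisionsRestriction>({
    PrecisionsRestriction::create<ov::opset1::Convolution>({
        {{0}, {ov::element::u8}},
        {{1}, {ov::element::i8}},
    }),
});
// Convolution requires per-tensor quantization on input port 0
auto perTensorQuantization = std::vector<QuantizationGranularityRestriction>({
    QuantizationGranularityRestriction::create<ov::opset1::Convolution>({0})
});
// nGraphFunc is the std::shared_ptr<ov::Model> to transform
ov::pass::Manager lptManager;
lptManager.register_pass<ov::pass::low_precision::LowPrecision>(supportedPrecisions, perTensorQuantization);
lptManager.run_passes(nGraphFunc);
```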
1. MarkupCanBeQuantized
The transformation marks operations that cannot be quantized. No attributes are required before the transformation.
Changes in the example model after the MarkupCanBeQuantized transformation:
The convolution1 operation, which is not quantized, is marked by the Precisions attribute with empty values. This attribute allows subsequent transformations to ignore the non-quantized operation.
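The effect of MarkupCanBeQuantized can be sketched on a toy node type that stores one optional Precisions set per input port. This is a minimal illustration, not the OpenVINO implementation: ToyNode, markupCanBeQuantized, and isIgnoredAsNotQuantized are hypothetical, and how the real transformation decides that an operation cannot be quantized is not modeled here, so the decision is passed in as a predicate.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an operation's runtime info: one optional "Precisions"
// attribute per input port (a missing key means no attribute on that port).
struct ToyNode {
    std::string name;
    std::size_t inputCount = 0;
    std::map<std::size_t, std::set<std::string>> precisionsByPort;
};

// Marks every input port of each node that cannot be quantized with an empty
// Precisions attribute. The quantizability check is supplied by the caller
// (an assumption of this sketch).
void markupCanBeQuantized(std::vector<ToyNode>& nodes,
                          const std::function<bool(const ToyNode&)>& canBeQuantized) {
    for (ToyNode& node : nodes) {
        if (canBeQuantized(node)) continue;
        for (std::size_t port = 0; port < node.inputCount; ++port) {
            // Empty set: no low precision is allowed on this port.
            node.precisionsByPort[port] = std::set<std::string>{};
        }
    }
}

// A node is ignored by later steps when it has Precisions attributes and all
// of them are empty.
bool isIgnoredAsNotQuantized(const ToyNode& node) {
    if (node.precisionsByPort.empty()) return false;
    for (const auto& entry : node.precisionsByPort) {
        if (!entry.second.empty()) return false;
    }
    return true;
}
```

In the example model, calling markupCanBeQuantized with a predicate that rejects only convolution1 leaves convolution2 untouched, while convolution1 gets an empty Precisions set on each input port.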
Result model:
Model display features (here and below):
The attributes added by the current transformation are marked in bold.
If attributes do not fit on one line, each attribute is shown on its own line.
2. MarkupPrecisions
The transformation is required and includes two tasks:
Mark operation input ports (create Precisions attribute instances) according to the provided restrictions: input port index and required precisions. Restrictions are provided as an input argument of the ov::pass::low_precision::LowPrecision constructor.
Mark precision preserved operations.
No attributes are required before the transformation.
Changes in the example model after the MarkupPrecisions transformation:
Both concatenation operations are marked as precision preserved operations. This allows precision to be propagated through these operations.
The quantized convolution2 operation is marked by the Precisions attribute with u8 precision on activations and i8 precision on weights, according to the provided restrictions. This attribute instance specifies which precisions are required for the quantized Convolution operation.
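The two MarkupPrecisions tasks can be sketched for a single node with a toy restriction table shaped like the PrecisionsRestriction parameters above (operation type, then input port, then precisions). All names here (Precision, Restrictions, ToyNode, markupPrecisions) are hypothetical; which operation types count as precision preserved is supplied by the caller as an assumption, and the interaction with attributes set by MarkupCanBeQuantized is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

enum class Precision { u8, i8 };

// Hypothetical restriction table: operation type -> input port -> precisions.
using PortPrecisions = std::map<std::size_t, std::set<Precision>>;
using Restrictions = std::map<std::string, PortPrecisions>;

struct ToyNode {
    std::string type;
    PortPrecisions precisions;        // toy Precisions attribute per input port
    bool precisionPreserved = false;  // toy PrecisionPreserved attribute
};

// Sketch of the two MarkupPrecisions tasks for one node.
void markupPrecisions(ToyNode& node, const Restrictions& restrictions,
                      const std::set<std::string>& precisionPreservedTypes) {
    // Task 1: copy the port restrictions registered for this operation type.
    auto it = restrictions.find(node.type);
    if (it != restrictions.end()) {
        for (const auto& portAndPrecisions : it->second) {
            node.precisions[portAndPrecisions.first] = portAndPrecisions.second;
        }
    }
    // Task 2: mark the operation as precision preserved if its type is listed.
    if (precisionPreservedTypes.count(node.type) != 0) {
        node.precisionPreserved = true;
    }
}
```

With the Convolution restriction from the example (u8 on port 0, i8 on port 1) and Concat listed as precision preserved, a Convolution node receives both port precisions and a Concat node only receives the precision preserved flag.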
Result model:
3. MarkupPerTensorQuantization
The transformation is required. It marks operations that require per-tensor quantization, according to the provided restrictions, by creating PerTensorQuantization attribute instances. No attributes are required before the transformation.
Changes in the example model after the MarkupPerTensorQuantization transformation:
Both Convolution operations are marked by the PerTensorQuantization attribute.
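The marking itself amounts to a lookup by operation type, similar in shape to the QuantizationGranularityRestriction created for ov::opset1::Convolution with port 0 in the parameters above. In this minimal sketch, GranularityRestrictions, ToyNode, markupPerTensorQuantization, and acceptsPerChannel are hypothetical names; acceptsPerChannel only illustrates what the attribute expresses (a restricted port cannot take one quantization interval per channel).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical granularity restrictions: operation type -> input ports that
// require per-tensor quantization.
using GranularityRestrictions = std::map<std::string, std::set<std::size_t>>;

struct ToyNode {
    std::string name;
    std::string type;
    std::set<std::size_t> perTensorPorts;  // toy PerTensorQuantization attribute
};

// Marks every node whose type has a granularity restriction.
void markupPerTensorQuantization(std::vector<ToyNode>& nodes,
                                 const GranularityRestrictions& restrictions) {
    for (ToyNode& node : nodes) {
        auto it = restrictions.find(node.type);
        if (it != restrictions.end()) node.perTensorPorts = it->second;
    }
}

// Per-channel quantization (one interval per channel) is only acceptable on
// ports that are not restricted to per-tensor quantization.
bool acceptsPerChannel(const ToyNode& node, std::size_t port) {
    return node.perTensorPorts.count(port) == 0;
}
```

Because the restriction is keyed by operation type, both convolution1 and convolution2 are marked, which matches the change described for the example model, while the Concat operations are not.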
Result model:
4. MarkupAvgPoolPrecisionPreserved
The transformation is optional. MarkupAvgPoolPrecisionPreserved marks AvgPool operations as precision preserved or not precision preserved. An AvgPool operation is precision preserved if the next operation that is not precision preserved can be inferred in low precision. In other words, AvgPool operations become precision preserved operations to speed up model inference. The transformation uses PrecisionPreserved attributes created earlier. It is a combined transformation that uses the following utility transformations:
CreatePrecisionsDependentAttribute
PropagateThroughPrecisionPreserved
UpdateSharedPrecisionPreserved
Changes in the example model after the MarkupAvgPoolPrecisionPreserved transformation:
AvgPool operations are marked by PrecisionPreserved and AvgPoolPrecisionPreserved (the latter is not used below).
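The rule "an AvgPool is precision preserved if the next operation that is not precision preserved can be inferred in low precision" can be sketched as a walk over a toy graph that passes through precision preserved consumers. ToyNode, nextOpsSupportLowPrecision, and avgPoolPrecisionPreserved are hypothetical; lowPrecisionSupported stands in for a non-empty Precisions attribute, and requiring every reachable consumer (rather than any one) to support low precision is an assumption of this sketch, not a documented detail of the transformation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy graph node; indices in `consumers` refer to positions in the graph vector.
struct ToyNode {
    std::string name;
    bool precisionPreserved;     // toy PrecisionPreserved attribute
    bool lowPrecisionSupported;  // toy stand-in for a non-empty Precisions attribute
    std::vector<std::size_t> consumers;
};

// Walks through precision preserved consumers starting at `id` and checks that
// every reachable operation that is NOT precision preserved can be inferred in
// low precision. Assumes an acyclic graph.
bool nextOpsSupportLowPrecision(const std::vector<ToyNode>& graph, std::size_t id) {
    for (std::size_t c : graph[id].consumers) {
        const ToyNode& consumer = graph[c];
        if (consumer.precisionPreserved) {
            if (!nextOpsSupportLowPrecision(graph, c)) return false;
        } else if (!consumer.lowPrecisionSupported) {
            return false;
        }
    }
    return true;
}

// An AvgPool is treated as precision preserved when the next non precision
// preserved operations after it can be inferred in low precision.
bool avgPoolPrecisionPreserved(const std::vector<ToyNode>& graph, std::size_t avgPoolId) {
    return nextOpsSupportLowPrecision(graph, avgPoolId);
}
```

For an AvgPool feeding a quantized convolution such as convolution2, the function returns true; if the walk reaches a convolution that cannot be quantized, it returns false.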
Result model:
5. PropagatePrecisions
The transformation is required. PropagatePrecisions is a key transformation in the markup pipeline, which marks FakeQuantize output port precisions. The transformation uses PrecisionPreserved attribute instances created earlier. It is a combined transformation that uses the following utility transformations:
CreateAttribute
PropagateThroughPrecisionPreserved
PropagateToInput
Changes in the example model after the PropagatePrecisions transformation:
All precision preserved operations are marked by the Precisions attribute instance, which defines the required precision for the operation.
FakeQuantize operation output ports are marked by Precisions attribute instances, which define the target precision for decomposition. In the sample model, FakeQuantize operations have signed intervals, but the Precisions attributes are initialized with u8 (unsigned int8) values as a result of the restrictions applied for Convolution operations.
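The propagation can be sketched as a backward walk from a FakeQuantize output through precision preserved consumers to the first consumer that is not precision preserved, whose restriction decides the result. This explains why a FakeQuantize with a signed interval can still end up with u8: the value comes from the Convolution restriction, not from the interval. ToyNode, kDefaultPrecisions, and outputPrecisions are hypothetical; combining several consumers by set intersection and defaulting to {u8, i8} when an operation has no restriction are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <optional>
#include <set>
#include <string>
#include <vector>

enum class Precision { u8, i8 };
using PrecisionSet = std::set<Precision>;

struct ToyNode {
    std::string name;
    bool precisionPreserved;                  // e.g. Concat, AvgPool after markup
    std::optional<PrecisionSet> restriction;  // Precisions on the consumer input port
    std::vector<std::size_t> consumers;       // indices into the graph vector
};

// Assumed default when an operation has no restriction at all.
const PrecisionSet kDefaultPrecisions = {Precision::u8, Precision::i8};

// Precisions acceptable on the output of node `id`: a consumer that is not
// precision preserved contributes its own restriction, a precision preserved
// consumer forwards what its own consumers require, and all contributions are
// intersected. Assumes an acyclic graph.
PrecisionSet outputPrecisions(const std::vector<ToyNode>& graph, std::size_t id) {
    PrecisionSet result = kDefaultPrecisions;
    for (std::size_t c : graph[id].consumers) {
        const ToyNode& consumer = graph[c];
        const PrecisionSet required = consumer.precisionPreserved
            ? outputPrecisions(graph, c)
            : consumer.restriction.value_or(kDefaultPrecisions);
        PrecisionSet merged;
        std::set_intersection(result.begin(), result.end(),
                              required.begin(), required.end(),
                              std::inserter(merged, merged.begin()));
        result = merged;
    }
    return result;
}
```

For a chain FakeQuantize, concat2, AvgPool, convolution2 (u8 on activations), outputPrecisions on the FakeQuantize returns {u8}. Replacing the convolution restriction with an empty set, as for a non-quantized convolution, empties the result.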
Result model:
Note
AlignQuantizationIntervals
and AlignQuantizationParameters
transformations are required if the model has quantized concatenation operations.
6. AlignQuantizationIntervals
The transformation is required for models with quantized concatenation operations. The transformation marks FakeQuantize operations and their precision preserved consumers to combine quantization information from different FakeQuantize operations for future quantization interval alignment. It is a combined transformation that uses the following utility transformations:
CreateAttribute
PropagateThroughPrecisionPreserved
Changes in the example model after the AlignQuantizationIntervals transformation:
All FakeQuantize operations and their precision preserved consumers are marked by the IntervalsAlignment attribute instance.
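The fields of the IntervalsAlignment attribute are not described in this section. As a minimal sketch, assume that combining quantization information means at least computing one interval that covers the output intervals of all FakeQuantize operations joined through the same precision preserved consumers, such as a concatenation. Interval and combineIntervals are hypothetical names, and the real attribute may track more than this.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Output interval of one FakeQuantize operation.
struct Interval {
    float low;
    float high;
};

// Hypothetical simplification of IntervalsAlignment: FakeQuantize operations
// reachable through the same precision preserved consumers share one combined
// interval that covers all of their intervals.
Interval combineIntervals(const std::vector<Interval>& intervals) {
    assert(!intervals.empty());
    Interval combined = intervals.front();
    for (const Interval& i : intervals) {
        combined.low = std::min(combined.low, i.low);
        combined.high = std::max(combined.high, i.high);
    }
    return combined;
}
```

For two FakeQuantize inputs of one concatenation with intervals [-1.28, 1.27] and [-0.64, 0.635], the combined interval is [-1.28, 1.27], so both inputs can later be aligned to the same quantization interval.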
Result model:
7. AlignQuantizationParameters
The transformation is required for models with quantized concatenation operations. The transformation marks FakeQuantize precision preserved consumers to align quantization intervals. It is a combined transformation that uses the following utility transformations:
CreateAttribute
PropagateThroughPrecisionPreserved
UpdateSharedPrecisionPreserved
Changes in the example model after the AlignQuantizationParameters transformation:
All FakeQuantize precision preserved consumers are marked by the QuantizationAlignment attribute instance.
The convolution1 input ports are marked by Precisions attribute instances with an empty precisions collection. As a result, the convolution1 operation is detected as not quantized, and the QuantizationAlignment attribute keeps its default value, false.
The convolution2 input ports are marked by Precisions attribute instances with a non-empty precisions collection. The convolution2 operation is detected as quantized with the PerTensorQuantization attribute, and the QuantizationAlignment attribute value changes from the default to true.
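The decision described for convolution1 and convolution2 can be written as a small predicate over the attributes of the consumer that is not precision preserved. ToyConsumer and quantizationAlignment are hypothetical names; the sketch covers a single consumer only, and how the real transformation combines several consumers is not modeled.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy view of the first non precision preserved consumer reached from a
// FakeQuantize through precision preserved operations.
struct ToyConsumer {
    std::string name;
    std::set<std::string> precisions;  // toy Precisions attribute on the input port
    bool perTensorQuantization;        // toy PerTensorQuantization attribute present
};

// Value for the QuantizationAlignment attribute: it starts as false and becomes
// true only when the consumer is quantized (non-empty Precisions) and requires
// per-tensor quantization.
bool quantizationAlignment(const ToyConsumer& consumer) {
    bool value = false;  // default value
    if (!consumer.precisions.empty() && consumer.perTensorQuantization) {
        value = true;
    }
    return value;
}
```

Applied to the example, convolution1 (empty Precisions, PerTensorQuantization present) yields false and convolution2 (u8, PerTensorQuantization present) yields true.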
Final model: