Step 2. Markup Transformations
This step determines the optimal FakeQuantize decomposition precisions for the best inference performance by marking up operations with runtime attribute instances. Attributes are created for input ports, output ports, and operations. The transformations do not change the operation output port precisions. The low precision markup logic for a model is decomposed into the following common markup transformations. The order of the transformations is important:
| Transformation name | Create attributes | Use attributes |
|---|---|---|
| MarkupBias | Bias | |
| MarkupCanBeQuantized | Precisions | |
| MarkupPrecisions | Precisions, PrecisionPreserved | |
| MarkupPerTensorQuantization | PerTensorQuantization | |
| MarkupAvgPoolPrecisionPreserved | AvgPoolPrecisionPreserved | Precisions, PrecisionPreserved |
| PropagatePrecisions | Precisions | Precisions, PrecisionPreserved |
| AlignQuantizationIntervals | IntervalsAlignment | PrecisionPreserved |
| AlignQuantizationParameters | QuantizationAlignment | PrecisionPreserved, PerTensorQuantization |
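One way to read the ordering constraint from the table: a transformation that uses an attribute should run after at least one transformation that creates it. The following is a minimal sketch of that check in plain C++. The `Pass` struct and the `markup_pipeline` and `first_unsatisfied` functions are hypothetical names introduced for illustration; they are not part of the OpenVINO API, and the real pipeline may have ordering reasons beyond this simple rule.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one markup pass: the attributes it creates
// and the attributes it expects to find on the model already.
struct Pass {
    std::string name;
    std::vector<std::string> creates;
    std::vector<std::string> uses;
};

// The pipeline from the table above, in the documented order.
inline std::vector<Pass> markup_pipeline() {
    return {
        {"MarkupBias", {"Bias"}, {}},
        {"MarkupCanBeQuantized", {"Precisions"}, {}},
        {"MarkupPrecisions", {"Precisions", "PrecisionPreserved"}, {}},
        {"MarkupPerTensorQuantization", {"PerTensorQuantization"}, {}},
        {"MarkupAvgPoolPrecisionPreserved", {"AvgPoolPrecisionPreserved"},
         {"Precisions", "PrecisionPreserved"}},
        {"PropagatePrecisions", {"Precisions"}, {"Precisions", "PrecisionPreserved"}},
        {"AlignQuantizationIntervals", {"IntervalsAlignment"}, {"PrecisionPreserved"}},
        {"AlignQuantizationParameters", {"QuantizationAlignment"},
         {"PrecisionPreserved", "PerTensorQuantization"}},
    };
}

// Returns the name of the first pass that uses an attribute which no earlier
// pass has created, or an empty string if the order is consistent.
inline std::string first_unsatisfied(const std::vector<Pass>& pipeline) {
    std::set<std::string> available;
    for (const auto& pass : pipeline) {
        for (const auto& attr : pass.uses) {
            if (available.count(attr) == 0) return pass.name;
        }
        available.insert(pass.creates.begin(), pass.creates.end());
    }
    return "";
}
```

For the documented order, `first_unsatisfied(markup_pipeline())` returns an empty string. Moving AlignQuantizationParameters ahead of MarkupPerTensorQuantization makes the check report AlignQuantizationParameters, because PerTensorQuantization has not been created yet at that point.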
Note
The same type of attribute instance can be created in different transformations. This follows from applying the single-responsibility principle to transformations. For example, Precisions attribute instances are created in both the MarkupCanBeQuantized and MarkupPrecisions transformations, but for different reasons.
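Common markup transformations can be decomposed into simpler utility markup transformations. The order of utility markup transformations is not important. The utility transformations referenced in this section are:
- CreateAttribute
- CreatePrecisionsDependentAttribute
- PropagateThroughPrecisionPreserved
- PropagateToInput
- UpdateSharedPrecisionPreserved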
Let’s explore all transformations and their relations in detail, using the same model throughout:
The original model key features:
- The first `concat1` concatenation operation has a not quantized `convolution1` consumer.
- The second `concat2` concatenation operation has a quantized `convolution2` consumer with requirements:
  - support `unsigned int8` on activations,
  - per-tensor quantization.
- Between the `concat2` concatenation operation and `Convolution` there is an `AvgPool` operation, which mathematically should return an `f32` tensor. But the `MarkupAvgPoolPrecisionPreserved` transformation is active. This allows the low precision transformation that goes after the `AvgPool` to propagate a low precision tensor to the next consumer.
Transformations are run with the following parameters:
```cpp
// Convolution must receive u8 activations (input port 0) and i8 weights (input port 1).
auto supportedPrecisions = std::vector<PrecisionsRestriction>({
    PrecisionsRestriction::create<ov::opset1::Convolution>({
        {{0}, {ov::element::u8}},
        {{1}, {ov::element::i8}},
    }),
});

// Convolution requires per-tensor quantization on input port 0.
auto perTensorQuantization = std::vector<QuantizationGranularityRestriction>({
    QuantizationGranularityRestriction::create<ov::opset1::Convolution>({0})
});

// Run the low precision pipeline with both restrictions.
ov::pass::Manager lptManager;
lptManager.register_pass<ov::pass::low_precision::LowPrecision>(supportedPrecisions, perTensorQuantization);
lptManager.run_passes(model);
```
1. MarkupCanBeQuantized
The transformation marks operations that cannot be quantized. No attributes are required before the transformation.
Changes in the example model after MarkupCanBeQuantized transformation:
- The not quantized `convolution1` operation is marked by the `Precisions` attribute with empty values. This attribute allows the next transformation to ignore the not quantized operation.
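The "empty Precisions means skip" convention can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the OpenVINO implementation: the `Op` struct, its `can_be_quantized` flag (assumed to be known in advance), and both functions are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an operation; not an OpenVINO type.
struct Op {
    std::string name;
    bool can_be_quantized = true;         // assumed to be known from the model
    bool has_precisions = false;          // is a Precisions attribute attached?
    std::vector<std::string> precisions;  // attribute value; empty means "none allowed"
};

// Sketch of the markup step: attach an empty Precisions attribute to every
// operation that cannot be quantized.
inline void markup_can_be_quantized(std::vector<Op>& ops) {
    for (auto& op : ops) {
        if (!op.can_be_quantized) {
            op.has_precisions = true;
            op.precisions.clear();
        }
    }
}

// A later pass can use the marker to ignore such operations.
inline bool should_skip(const Op& op) {
    return op.has_precisions && op.precisions.empty();
}
```

Keeping "no attribute" and "attribute with empty values" as two distinct states is the point of the sketch: only the second one tells later passes to leave the operation alone.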
Result model:
Model display features (here and below):
- The attributes added by the current transformation are marked in bold.
- If attributes do not fit into one line, then each line contains only one attribute.
2. MarkupPrecisions
The transformation is required and includes two tasks:
1. Mark operation input ports (create `Precisions` attribute instances) according to the provided restrictions: input port index and required precisions. Restrictions are provided as an input argument to the `ov::pass::low_precision::LowPrecision` constructor.
2. Mark precision preserved operations.
No attributes are required before the transformation. Changes in the example model after MarkupPrecisions transformation:
- Both concatenation operations are marked as precision preserved operations. This allows precision to propagate through these operations.
- The quantized `convolution2` operation is marked by the `Precisions` attribute with `u8` precision on activations and `i8` precision on weights, according to the provided restrictions. This attribute instance specifies which precisions are required for the quantized `Convolution` operation.
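The two tasks can be sketched in plain C++ as follows. All names here (`Op`, `PortPrecisions`, `markup_precisions`, the `not_quantized` flag, and the set of precision preserved types) are hypothetical simplifications for illustration; the real transformation works on OpenVINO nodes and runtime attributes.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical: input port index -> allowed precisions for that port.
using PortPrecisions = std::map<int, std::vector<std::string>>;

// Hypothetical, simplified stand-in for an operation; not an OpenVINO type.
struct Op {
    std::string name;
    std::string type;
    bool not_quantized = false;        // assumed to be set by an earlier markup pass
    PortPrecisions input_precisions;   // Precisions attributes per input port
    bool precision_preserved = false;  // PrecisionPreserved attribute
};

// Task 1: copy the per-type restrictions onto matching operations.
// Task 2: flag operation types that are assumed to keep their input precision.
inline void markup_precisions(std::vector<Op>& ops,
                              const std::map<std::string, PortPrecisions>& restrictions,
                              const std::set<std::string>& preserved_types) {
    for (auto& op : ops) {
        if (op.not_quantized) continue;  // ignore operations marked as not quantized
        auto it = restrictions.find(op.type);
        if (it != restrictions.end()) op.input_precisions = it->second;
        op.precision_preserved = preserved_types.count(op.type) > 0;
    }
}
```

With a restriction of `u8` on port 0 and `i8` on port 1 for the `Convolution` type, and `Concat` treated as precision preserved, the sketch marks both concatenations as precision preserved, leaves a not quantized convolution untouched, and attaches `u8`/`i8` to the quantized one.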
Result model:
3. MarkupPerTensorQuantization
The transformation is required. It marks operations that require per-tensor quantization (creates PerTensorQuantization attribute instances) according to the provided restrictions. No attributes are required before the transformation.
Changes in the example model after MarkupPerTensorQuantization transformation:
- Both `Convolution` operations are marked by the `PerTensorQuantization` attribute.
Result model:
4. MarkupAvgPoolPrecisionPreserved
The transformation is optional. MarkupAvgPoolPrecisionPreserved marks AvgPool operations as precision preserved or not precision preserved. An AvgPool operation is precision preserved if the next operation that is not precision preserved can be inferred in low precision. In other words, AvgPool operations become precision preserved operations to speed up model inference. The transformation uses the PrecisionPreserved attributes created earlier. It is a combined transformation that uses:
- CreatePrecisionsDependentAttribute
- PropagateThroughPrecisionPreserved
- UpdateSharedPrecisionPreserved
Changes in the example model after MarkupAvgPoolPrecisionPreserved transformation:
- `AvgPool` operations are marked by `PrecisionPreserved` and `AvgPoolPrecisionPreserved` (not used below).
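The decision rule for AvgPool can be sketched for a simple linear chain of operations. This is a minimal illustration, not the OpenVINO implementation: the `Op` struct, the `low_precision_capable` flag (assumed to stand for "has non-empty Precisions on its inputs"), and the function name are hypothetical, and real models are graphs rather than chains.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified node in a linear chain of operations.
struct Op {
    std::string type;
    bool precision_preserved;    // PrecisionPreserved attribute
    bool low_precision_capable;  // assumed: can be inferred in low precision
};

// Sketch of the rule: starting after the AvgPool at `index`, skip precision
// preserved operations. The AvgPool is precision preserved if the first
// operation that is not precision preserved can be inferred in low precision.
inline bool avg_pool_precision_preserved(const std::vector<Op>& chain, std::size_t index) {
    for (std::size_t i = index + 1; i < chain.size(); ++i) {
        if (chain[i].precision_preserved) continue;
        return chain[i].low_precision_capable;
    }
    return false;  // no consumer that needs low precision was found
}
```

For a chain Concat, AvgPool, Convolution where the convolution is quantized, the function returns `true` for the AvgPool; if the convolution is not quantized, it returns `false` and the AvgPool keeps its mathematically expected `f32` output.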
Result model:
5. PropagatePrecisions
The transformation is required. PropagatePrecisions is a key transformation in the markup pipeline: it marks FakeQuantize output port precisions. The transformation uses the PrecisionPreserved attribute instances created earlier. It is a combined transformation that uses:
- CreateAttribute
- PropagateThroughPrecisionPreserved
- PropagateToInput
Changes in the example model after PropagatePrecisions transformation:
- All precision preserved operations are marked by the `Precisions` attribute instance, which defines the required precision for the operation.
- `FakeQuantize` operation output ports are marked by `Precisions` attribute instances, which define the target precision for decomposition. In the sample model, `FakeQuantize` operations have signed intervals, but the `Precisions` attributes are initialized with `u8` (`unsigned int8`) values as a result of the restrictions specified for `Convolution` operations.
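The backward flow of precisions, from a restricted consumer through precision preserved operations up to the FakeQuantize output, can be sketched for a linear chain. This is an illustration under simplifying assumptions: the `Op` struct and `propagate_precisions` are hypothetical, the chain has a single consumer per operation, and merging the requirements of several consumers is not covered.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified node in a linear chain of operations.
struct Op {
    std::string type;
    bool precision_preserved;             // PrecisionPreserved attribute
    std::vector<std::string> precisions;  // Precisions attribute
};

// Sketch: walk the chain from the last operation (the restricted consumer)
// towards the first one, copying the required precisions onto each producer
// as long as the producer is precision preserved or is a FakeQuantize.
inline void propagate_precisions(std::vector<Op>& chain) {
    if (chain.empty()) return;
    for (std::size_t i = chain.size() - 1; i > 0; --i) {
        const Op& consumer = chain[i];
        Op& producer = chain[i - 1];
        const bool reachable =
            producer.precision_preserved || producer.type == "FakeQuantize";
        if (!reachable) break;  // precision cannot pass through this operation
        producer.precisions = consumer.precisions;
    }
}
```

For the chain FakeQuantize, Concat, AvgPool, Convolution with `u8` required by the convolution and the two middle operations precision preserved, every operation ends up with `u8`, including the FakeQuantize. This mirrors the sample model, where the target precision comes from the consumer restriction rather than from the sign of the FakeQuantize intervals.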
Result model:
Note
AlignQuantizationIntervals and AlignQuantizationParameters transformations are required if the model has quantized concatenation operations.
6. AlignQuantizationIntervals
The transformation is required for models with quantized concatenation operations. The transformation marks FakeQuantize operations and their precision preserved consumers in order to combine quantization information from different FakeQuantize operations for later alignment of quantization intervals. It is a combined transformation that uses:
- CreateAttribute
- PropagateThroughPrecisionPreserved
Changes in the example model after AlignQuantizationIntervals transformation:
- All `FakeQuantize` operations and their precision preserved consumers are marked by the `IntervalsAlignment` attribute instance.
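As an illustration of what "combining quantization information" could mean, the sketch below merges the output intervals of several FakeQuantize operations into one enclosing interval. This is an assumption made for the example: the text above does not state the exact merge rule, and the `Interval` struct and `combine_intervals` function are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical output interval of one FakeQuantize operation.
struct Interval {
    float low;
    float high;
};

// Assumed merge rule: the combined interval is the smallest interval that
// contains every input interval (minimum of lows, maximum of highs).
// Returns {0, 0} for an empty input.
inline Interval combine_intervals(const std::vector<Interval>& fq_intervals) {
    if (fq_intervals.empty()) return Interval{0.0f, 0.0f};
    Interval combined = fq_intervals.front();
    for (const auto& interval : fq_intervals) {
        combined.low = std::min(combined.low, interval.low);
        combined.high = std::max(combined.high, interval.high);
    }
    return combined;
}
```

Under this assumed rule, two FakeQuantize operations with intervals [-1.28, 1.27] and [-0.64, 2.55] feeding the same concatenation share the combined interval [-1.28, 2.55].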
Result model:
7. AlignQuantizationParameters
The transformation is required for models with quantized concatenation operations. The transformation marks FakeQuantize precision preserved consumers to align quantization intervals. It is a combined transformation that uses:
- CreateAttribute
- PropagateThroughPrecisionPreserved
- UpdateSharedPrecisionPreserved
Changes in the example model after AlignQuantizationParameters transformation:
- All `FakeQuantize` precision preserved consumers are marked by the `QuantizationAlignment` attribute instance. `convolution1` input ports are marked by `Precisions` attribute instances with an empty precisions collection. As a result, the `convolution1` operation was detected as not quantized, and the `QuantizationAlignment` attribute default value `false` does not change. `convolution2` input ports are marked by `Precisions` attribute instances with a non-empty precisions collection. The `convolution2` operation was detected as quantized with the `PerTensorQuantization` attribute, and the `QuantizationAlignment` attribute default value changed to `true`.
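The rule for the `QuantizationAlignment` value described above can be written as a small predicate. This is a sketch in plain C++ with hypothetical names (`Consumer`, `quantization_alignment`); it only restates the logic from the text and is not the OpenVINO implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical view of the consumer that follows the precision preserved
// operations (for example, a Convolution).
struct Consumer {
    std::vector<std::string> precisions;  // Precisions on the input port; empty = not quantized
    bool per_tensor_quantization;         // PerTensorQuantization attribute present
};

// The attribute starts at its default value `false` and becomes `true` only
// when the consumer is quantized and requires per-tensor quantization.
inline bool quantization_alignment(const Consumer& consumer) {
    bool value = false;  // default value
    const bool quantized = !consumer.precisions.empty();
    if (quantized && consumer.per_tensor_quantization) {
        value = true;
    }
    return value;
}
```

Applied to the example model, a consumer like `convolution1` (empty precisions, per-tensor attribute present) yields `false`, and a consumer like `convolution2` (`u8` precision, per-tensor attribute present) yields `true`.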
Final model: