This chapter provides information on the Inference Engine plugins that enable inferencing of deep learning models on the supported VPU devices:
- `ScaleShift` layer is supported only for zero value of the `broadcast` attribute
- `Bias` works for inputs with equal dimensions
- `CTCGreedyDecoder` works with the `ctc_merge_repeated` attribute equal to 1
- `DetectionOutput` works only with zero values of the `interpolate_orientation` and `num_orient_classes` parameters
- `MVN` uses a fixed value (1e-9) for the `eps` parameter
- `LRN` is supported only for the `region` parameter equal to `across`
- `Normalize` uses a fixed value (1e-9) for the `eps` parameter and is supported only for zero value of the `across_spatial` attribute

The VPU plugins support the configuration parameters listed below. The parameters are passed as `std::map<std::string, std::string>` on `InferenceEngine::InferencePlugin::LoadNetwork` or `InferenceEngine::InferencePlugin::SetConfig`.
Parameter Name | Parameter Values | Default | Description |
---|---|---|---|
`KEY_VPU_HW_STAGES_OPTIMIZATION` | `YES` / `NO` | `YES` | Turn on HW stages usage (applicable for Intel Movidius Myriad X devices only) |
`KEY_VPU_NETWORK_CONFIG` | VPU network configuration | empty string | Extra configuration for network compilation and optimization |
`KEY_VPU_COMPUTE_LAYOUT` | `VPU_AUTO`, `VPU_NCHW`, `VPU_NHWC` | `VPU_AUTO` | Specify internal input and output layouts for network layers |
`KEY_VPU_LOG_LEVEL` | `LOG_WARNING`, `LOG_INFO`, `LOG_DEBUG` | `LOG_NONE` | Set log level for devices |
`KEY_VPU_PRINT_RECEIVE_TENSOR_TIME` | `YES` / `NO` | `NO` | Add device-side time spent waiting for input to PerformanceCounts |
`KEY_VPU_INPUT_NORM` | real number | `1.0` | Deprecated*. Normalization coefficient for the network input |
`KEY_VPU_INPUT_BIAS` | real number | `0.0` | Deprecated*. Bias value that is added to each element of the network input |

\*Instead, use Model Optimizer options.
The VPU network configuration mechanism allows overriding the VPU network compiler behavior and tuning its optimizations. This mechanism is optional; by default, the VPU network compiler uses automatic heuristics for network optimization. The `KEY_VPU_NETWORK_CONFIG` configuration parameter allows the user to specify exact compiler behavior.
Terminology used for VPU network configuration:
The `KEY_VPU_NETWORK_CONFIG` parameter is a list of key/value pairs separated by `,`.
Supported `<key>` options:

- `file` - `<value>` is a path to an XML file with the configuration; the file format is described below.
- `data` - `<value>` is the name of a Data object; the following option is applied to this Data:
  - `scale` - `<value>` is a SCALE factor. See the Data section.

The VPU network compiler treats the configuration as a hard requirement and fails if it cannot satisfy it.
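The original example string is not reproduced here. Assuming `=` separates each key from its value (only the `,` separator between pairs is stated above), the parameter value might look like:

```
file=vpu_network_config.xml
```

or, for an inline Data option (the data name `conv1` and the factor are hypothetical):

```
data=conv1,scale=64
```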
The `KEY_VPU_NETWORK_CONFIG` parameter allows using a separate file with the network configuration. The file is an XML file and must have the following format:
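The original format listing is not reproduced here. A minimal skeleton, assuming a `<vpu_net_config>` root element (the root element name is a guess; only the `version` attribute and the four section names are taken from the text below), could look like:

```xml
<vpu_net_config version="1">
    <passes>  <!-- compiler pass switches         --> </passes>
    <data>    <!-- per-data SCALE factors         --> </data>
    <layers>  <!-- per-layer HW configuration     --> </layers>
    <stages>  <!-- per-stage HW configuration     --> </stages>
</vpu_net_config>
```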
The `version` attribute specifies the file format version (currently only `1` is supported). The configuration is divided into sections for passes, data, layers, and stages. Each section is optional.
The passes section allows configuring compiler passes. Example of such a section:
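The original example is not reproduced here. A hypothetical fragment, assuming each pass is selected by a `name` attribute and toggled via the `enable` property described below, might look like:

```xml
<passes>
    <!-- Hypothetical element names; only the pass names and the
         enable property are taken from the text. -->
    <pass name="tryHCWLayoutForHW" enable="true"/>
    <pass name="splitLargeConvolution" enable="false"/>
</passes>
```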
The `enable` property allows turning the specified pass on or off.
Available passes:

- `packPostOps` - merges ReLU with Bias and makes them in-place; turned on by default.
- `eliminateReshapeStages` - tries to eliminate reshape operations and make them in-place; turned on by default.
- `swapConcatAndPool` - tries to replace the `Convolution->Concat->Pooling` pattern with `Convolution->Pooling->Concat`; turned on by default.
- `splitLargeConvolution` - tries to split a large Convolution into tiles along output channels; turned on by default.
- `splitDepthConvolution` - tries to split a Depth Convolution into tiles and replace them with an HW analog; turned off by default.
- `eliminateCopyStages` - tries to eliminate extra Copy stages; turned on by default.
- `tryHCWLayoutForHW` - tries to apply HCW layout between HW stages; turned off by default.
- `injectSwOps` - tries to merge HW stages with independent SW stages to execute them in parallel; turned on by default.

The data section allows configuring properties of data objects. Example of such a section:
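The original example is not reproduced here. A hypothetical fragment, assuming a per-data element with a `name` attribute (the naming rules are described next) and the `scale` property, might look like:

```xml
<data>
    <!-- Hypothetical element form; the name matches the producer
         layer from the IR, and scale is the SCALE factor. -->
    <data name="conv1" scale="64"/>
</data>
```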
The data name corresponds to its producer layer from the original IR (the layer that declares this data as output). If the original layer has only one output, the output data name is equal to the layer name. If the original layer has more than one output, each output data object has the name `<layer name>.<port id>`, where `<port id>` corresponds to the `<port id="3">` XML node in the IR.
The `scale` property applies a SCALE factor to the specified data object. The SCALE factor is used to increase the data range to avoid floating-point math errors on HW. The SCALE factor is propagated across the network until its end, or until a layer that cannot propagate it.
If the data section is missing from the network configuration file, the network compiler tries to estimate the SCALE factor automatically based on the range of each layer's weights. Manual configuration can be used when the automatic estimation does not work or does not give the desired accuracy.
Hint: it is better to use power-of-two values for SCALE factors.
The layers section allows configuring compiler behavior for layer optimization. Per-layer configuration is applied to all stages implementing the selected layer. Example of such a section:
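The original example is not reproduced here. A hypothetical fragment, assuming per-layer elements with a `name` attribute and a nested HW section (the layer name `conv4_3` is invented for illustration), might look like:

```xml
<layers>
    <layer name="conv4_3">
        <!-- HW section: the only section supported per layer for now -->
        <hw enable="true"/>
    </layer>
</layers>
```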
The layer name corresponds to the original IR.
For now, the layer configuration supports only the HW section, which controls HW optimizations. The HW section makes sense only for Convolution, Pooling, and FullyConnected layers.
The HW optimization configuration section consists of the following options:
- `enable` - turns HW optimization of the selected layer on or off.
- `depth_conv` - controls HW optimization of Depth Convolution.
- `tiling` - controls HW tiling behavior.
- `inputs` and `outputs` - control behavior for layer input and output data.
- `sw_injections` - controls the behavior of the HW+SW stage merge optimization.

The `enable` option has the following syntax:
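The original syntax listing is not reproduced here. Assuming the HW section is an `<hw>` element and `enable` is a boolean attribute (both assumptions), it might look like:

```xml
<hw enable="false"/>
```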
By default HW optimization is turned on for all supported layers.
The `depth_conv` configuration is effective only for Depth Convolution layers (the input and output have the same number of channels, and the `group` parameter is equal to that number) and only when the `splitDepthConvolution` pass is enabled.
The `depth_conv` option has the following syntax:
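The original syntax listing is not reproduced here. A hypothetical fragment using the parameters described below (the attribute form is an assumption):

```xml
<hw>
    <!-- split mode plus an optional manual tile size -->
    <depth_conv split="SINGLE" tile_size="16"/>
</hw>
```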
The `split` parameter controls the split-over-channels optimization. It accepts the following modes:

- `NONE` - no split over channels; the depth convolution is executed as a single HW convolution.
- `SINGLE` - the depth convolution is split over channels, and each tile is executed as a single HW convolution.
- `COMBINED` - the compiler splits the current depth convolution along with its predecessor convolution over channels.

The `tile_size` and `num_tiles` parameters are optional and allow you to manually specify the desired tile size for the split. `tile_size` specifies the exact tile size, while `num_tiles` specifies the desired number of tiles. Only one of these parameters can be used at a time.
NOTE: for now, `SINGLE` split requires manual tile size/number configuration, while `COMBINED` mode can select the tile size automatically.
The `tiling` option allows choosing the tile size for HW Convolution and Pooling. The compiler can split an HW layer into tiles along all 3 axes (width, height, and channels).
The `tiling` option has the following syntax:
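The original syntax listing is not reproduced here. A hypothetical fragment, assuming a `dims`/`nums` attribute listing the three axes in width, height, channels order (the element and attribute forms, and the axis order, are assumptions):

```xml
<hw>
    <tiling>
        <!-- Output tile: full width (no tiling), automatic height, 64 channels -->
        <output dims="FULL, AUTO, 64"/>
    </tiling>
</hw>
```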
The user can specify either the input tile or the output tile; the compiler updates the other tile accordingly. To choose the tile size, specify either its exact size (`dims`) or the desired number of tiles (`nums`). Both `dims` and `nums` accept the following special values:

- `FULL` - the tile size equals the input/output size on the selected axis (i.e. no tiling for this axis).
- `AUTO` - lets the compiler choose the tile size for the selected axis automatically.

If some dimension is missing, `AUTO` mode is assumed.
The `inputs` and `outputs` options control the layout and location of HW layer inputs and outputs. They have the following syntax:
The user needs to specify which input/output is configured:

- `<input ind="0">` - layer input.
- `<input ind="1">` - layer weights (not applicable for Pooling).
- `<input ind="2">` - layer biases.
- `<output ind="0">` - layer output.

Available configuration options:
- `copy_child` - forces the compiler to insert a Copy stage for the selected input/output before/after the current layer.
- `location` - sets the desired location of the input/output.
- `layout` - sets the desired layout of the input/output (makes sense only for the `ind="0"` input and output).

The `sw_injections` option allows disabling the merge of SW stages into the current layer. The syntax of the `sw_injections` option:
The stages section allows configuring compiler behavior for a specific stage. Example of such a section:
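The original example is not reproduced here. A hypothetical fragment, assuming per-stage elements with a `name` attribute; the stage name shown is invented (real names include compiler-generated meta information and should be taken from `GetPerformanceCounts` output, as noted below):

```xml
<stages>
    <stage name="conv4_3@tile=1/2">
        <hw>
            <sw_injections enable="false"/>
        </hw>
    </stage>
</stages>
```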
The stage name is created from its base layer name in the original IR plus some meta information added by the compiler.
NOTE: the meta information embedded into the stage name is subject to change, so it is better to use the per-layer configuration instead. The user can get the exact stage names from the `GetPerformanceCounts` output.
For now, the stage configuration supports only the HW section, which controls HW optimizations. The HW section makes sense only for Convolution, Pooling, and FullyConnected stages.
The HW optimization configuration section consists of the following options:

- `inputs` and `outputs` - control behavior for stage input and output data.
- `sw_injections` - controls the behavior of the HW+SW stage merge optimization.

The `inputs` and `outputs` options have the same meaning as in the per-layer configuration (see the Layer HW section above).
The `sw_injections` option for a stage has the following syntax:
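The original syntax listing is not reproduced here. A hypothetical fragment, assuming nested `<stage>` elements name the SW stages to merge (the element forms and the stage name are assumptions):

```xml
<hw>
    <sw_injections>
        <!-- merge only the named SW stage into this HW stage -->
        <stage name="relu4_3"/>
    </sw_injections>
</hw>
```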
It allows disabling the merge of SW stages into the current stage, or manually specifying which SW stages should be merged into the current HW stage.
This is an example of a network configuration file:
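The original example file is not reproduced here. A hypothetical sketch combining the sections described above (the root element name, all per-section element forms, and the layer/data names are assumptions; only the section names, option names, and special values come from the text):

```xml
<vpu_net_config version="1">
    <passes>
        <pass name="splitDepthConvolution" enable="true"/>
    </passes>
    <data>
        <data name="conv1" scale="64"/>
    </data>
    <layers>
        <layer name="conv4_3">
            <hw>
                <depth_conv split="COMBINED"/>
                <tiling>
                    <output dims="FULL, AUTO, 64"/>
                </tiling>
            </hw>
        </layer>
    </layers>
</vpu_net_config>
```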