VPU Plugins

This chapter describes the Inference Engine plugins that run inference of deep learning models on the supported VPU devices.

Known Layers Limitations

VPU Common Configuration Parameters

The VPU plugins support the configuration parameters listed below. The parameters are passed as a std::map<std::string, std::string> to InferenceEngine::InferencePlugin::LoadNetwork or InferenceEngine::InferencePlugin::SetConfig.

| Parameter Name | Parameter Values | Default | Description |
| :--- | :--- | :--- | :--- |
| KEY_VPU_HW_STAGES_OPTIMIZATION | YES/NO | YES | Turn on HW stages usage (applicable for Intel Movidius Myriad X devices only) |
| KEY_VPU_NETWORK_CONFIG | VPU Network Configuration | empty string | Extra configuration for network compilation and optimization |
| KEY_VPU_COMPUTE_LAYOUT | VPU_AUTO, VPU_NCHW, VPU_NHWC | VPU_AUTO | Specify internal input and output layouts for network layers |
| KEY_VPU_LOG_LEVEL | LOG_WARNING, LOG_INFO, LOG_DEBUG | LOG_NONE | Set log level for devices |
| KEY_VPU_PRINT_RECEIVE_TENSOR_TIME | YES/NO | NO | Add device-side time spent waiting for input to PerformanceCounts |
| KEY_VPU_INPUT_NORM | real number | 1.0 | Deprecated*. Normalization coefficient for the network input |
| KEY_VPU_INPUT_BIAS | real number | 0.0 | Deprecated*. Bias value that is added to each element of the network input |

*Instead, use Model Optimizer options.

VPU Network Configuration  

The VPU network configuration mechanism allows overriding the behavior of the VPU network compiler and tuning its optimizations. This mechanism is optional: by default, the VPU network compiler uses automatic heuristics for network optimizations. The KEY_VPU_NETWORK_CONFIG configuration parameter allows the user to specify the exact behavior of the compiler.

Terminology used for VPU network configuration:

The KEY_VPU_NETWORK_CONFIG parameter is a comma-separated list of key/value pairs:

<key>=<value>,<key>=<value>,<key>=<value>,...

Supported <key> options:

The VPU network compiler treats the configuration as a hard requirement and fails if it cannot satisfy it.

Network Configuration File

The KEY_VPU_NETWORK_CONFIG parameter also allows using a separate file with the network configuration. The file is an XML file and must have the following format:

<?xml version="1.0" ?>
<vpu_net_config version="1">
    [passes section]
    [data section]
    [layers section]
    [stages section]
</vpu_net_config>

The version attribute specifies the file format version (currently only 1 is supported). The configuration is divided into sections for passes, data, layers, and stages. Each section is optional.
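
Because every section is optional, a valid file can be quite small. As a minimal sketch, the following file contains only a data section. The data name input and the scale value are illustrative and reuse the sample values from the Data Section description; the XML comment is there only for explanation and can be removed.

```xml
<?xml version="1.0" ?>
<vpu_net_config version="1">
    <!-- Only the data section is present; passes, layers and stages are omitted -->
    <data>
        <data name="input">
            <scale>64</scale>
        </data>
    </data>
</vpu_net_config>
```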

Passes Section

The passes section configures compiler passes. An example of such a section:

<passes>
    <pass name="passName1">
        <enable>true</enable>
    </pass>
    <pass name="passName2">
        <enable>false</enable>
    </pass>
</passes>

The enable property turns the specified pass on or off.
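
For instance, the fragment below enables one pass and disables another. The pass names splitDepthConvolution and tryHCWLayoutForHW are taken from the complete configuration example at the end of this chapter; which of them is enabled or disabled here is an arbitrary choice made for illustration.

```xml
<passes>
    <!-- Turn on the splitDepthConvolution pass (needed for depth_conv options to take effect) -->
    <pass name="splitDepthConvolution">
        <enable>true</enable>
    </pass>
    <!-- Turn off the tryHCWLayoutForHW pass (arbitrary choice for this sketch) -->
    <pass name="tryHCWLayoutForHW">
        <enable>false</enable>
    </pass>
</passes>
```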

Available passes:

Data Section  

The data section configures properties of data objects. An example of such a section:

<data>
    <data name="input">
        <scale>64</scale>
    </data>
</data>

The data name corresponds to its producer layer in the original IR (the layer that declares this data as its output). If the original layer has only one output, the output data name is equal to the layer name. If the original layer has more than one output, each output data object is named <layer name>.<port id>, where <port id> corresponds to the <port id="3"> XML node in the IR.

The scale property applies a SCALE factor to the specified data object. The SCALE factor is used to increase the data range to avoid floating-point math errors on HW. The SCALE factor propagates through the network until the end of the network or until it reaches a layer that cannot propagate it.

If the data section is missing from the network configuration file, the network compiler tries to estimate the SCALE factor automatically based on the range of the layer weights. Manual configuration can be used if the automatic estimate does not work or does not give the desired accuracy.

Hint: it is better to use power-of-two values for SCALE factors.
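
As an illustration of the naming rule and of the power-of-two hint, the sketch below configures the single output of one layer and one output of a multi-output layer. The layer names conv1 and split1, the port id 3, and the scale values are hypothetical.

```xml
<data>
    <!-- Layer "conv1" has a single output, so the data name equals the layer name -->
    <data name="conv1">
        <scale>64</scale>
    </data>
    <!-- Layer "split1" has several outputs; this is the one declared as <port id="3"> in the IR -->
    <data name="split1.3">
        <scale>128</scale>
    </data>
</data>
```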

Layers Section

The layers section configures compiler behavior for layer optimization. A per-layer configuration is applied to all stages that implement the selected layer. An example of such a section:

<layers>
    <layer name="conv1">
        <hw>
            [HW options]
        </hw>
    </layer>
</layers>

The layer name corresponds to the original IR.

For now, layer configuration supports only the HW section, which controls HW optimizations. The HW section makes sense only for Convolution, Pooling, and FullyConnected layers.

Layer HW Section

The HW optimization configuration section consists of the following options:

The enable option has the following syntax:

<enable>true</enable>
<enable>false</enable>

By default HW optimization is turned on for all supported layers.
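
As a sketch, HW optimization could be switched off for a single layer as shown below. This assumes that the enable option is placed directly inside the hw element, like the other HW options; the layer name conv1 is illustrative.

```xml
<layers>
    <layer name="conv1">
        <hw>
            <!-- Assumption: enable is a direct child of hw -->
            <enable>false</enable>
        </hw>
    </layer>
</layers>
```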

The depth_conv configuration is effective only for Depth Convolution layers (the input and output have the same number of channels and the group parameter is equal to that number) and only when the splitDepthConvolution pass is enabled.

The depth_conv option has the following syntax:

<depth_conv>
    <split>NONE | SINGLE | COMBINED</split>
    <tile_size>`integer value > 0`</tile_size>
    <num_tiles>`integer value > 0`</num_tiles>
</depth_conv>

The split parameter controls the split over channels optimization. It can accept the following modes:

The tile_size and num_tiles parameters are optional and allow the user to manually specify the desired tiling for the split. The tile_size parameter specifies the exact tile size, while num_tiles specifies the desired number of tiles. Only one of these parameters can be used at a time.

NOTE: for now SINGLE split requires manual tile size/number configuration. COMBINED mode can select tile size automatically.
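
The two variants below sketch how these rules combine. The layer names conv3/dw and conv4/dw and the tile size 16 are hypothetical; the placement of depth_conv inside the hw element follows the complete configuration example at the end of this chapter.

```xml
<layers>
    <!-- SINGLE split: the tile size or the number of tiles is set manually (only one of them) -->
    <layer name="conv3/dw">
        <hw>
            <depth_conv>
                <split>SINGLE</split>
                <tile_size>16</tile_size>
            </depth_conv>
        </hw>
    </layer>
    <!-- COMBINED split: the tile size is left for the compiler to select -->
    <layer name="conv4/dw">
        <hw>
            <depth_conv>
                <split>COMBINED</split>
            </depth_conv>
        </hw>
    </layer>
</layers>
```

Both entries take effect only if the splitDepthConvolution pass is enabled.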

The tiling option allows choosing the tile size for HW Convolution and Pooling. The compiler can split the HW layer into tiles along all three axes (width, height, and channels).

The tiling option has the following syntax:

<tiling>
    <input_tile>
        <dims>
            <dim_w>FULL | AUTO | `integer value > 0`</dim_w>
            <dim_h>FULL | AUTO | `integer value > 0`</dim_h>
            <dim_c>FULL | AUTO | `integer value > 0`</dim_c>
        </dims>
        <nums>
            <num_w>FULL | AUTO | `integer value > 0`</num_w>
            <num_h>FULL | AUTO | `integer value > 0`</num_h>
            <num_c>FULL | AUTO | `integer value > 0`</num_c>
        </nums>
    </input_tile>
    <output_tile>
        <dims>
            <dim_w>FULL | AUTO | `integer value > 0`</dim_w>
            <dim_h>FULL | AUTO | `integer value > 0`</dim_h>
            <dim_c>FULL | AUTO | `integer value > 0`</dim_c>
        </dims>
        <nums>
            <num_w>FULL | AUTO | `integer value > 0`</num_w>
            <num_h>FULL | AUTO | `integer value > 0`</num_h>
            <num_c>FULL | AUTO | `integer value > 0`</num_c>
        </nums>
    </output_tile>
</tiling>

The user can specify either the input tile or the output tile; the compiler updates the other tile accordingly. To choose the tiling, the user needs to specify either the exact tile size (dims) or the desired number of tiles (nums). Both dims and nums also accept the special values FULL and AUTO.

If a dimension is omitted, AUTO mode is assumed for it.
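
For example, the sketch below requests tiling by the number of output tiles rather than by exact size. The value 4 is illustrative, and num_w is left out on purpose, so AUTO is assumed for it.

```xml
<tiling>
    <output_tile>
        <nums>
            <!-- num_w is omitted, so AUTO mode is assumed for the width axis -->
            <num_h>4</num_h>
            <num_c>AUTO</num_c>
        </nums>
    </output_tile>
</tiling>
```

Only output_tile is given here; the compiler derives the input tile from it.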

The inputs and outputs options control the layout and location of HW layer inputs and outputs. They have the following syntax:

<inputs>
    <input ind="0">
        <copy_child>true | false</copy_child>
        <location>AUTO | CMX | DDR</location>
        <layout>AUTO | HCW | CHW</layout>
    </input>
</inputs>
<outputs>
    <output ind="0">
        <copy_child>true | false</copy_child>
        <location>AUTO | CMX | DDR</location>
        <layout>AUTO | HCW | CHW</layout>
    </output>
</outputs>

The user needs to specify which input or output is configured:

Available configuration options:

The sw_injections option allows disabling the merge of SW stages into the current layer. It has the following syntax:

<sw_injections>
    <enable>false</enable>
</sw_injections>

Stages Section

The stages section configures compiler behavior for a specific stage. An example of such a section:

<stages>
    <stage name="conv0@HW@soh=0/6+ReLU+Bias">
        <hw>
            [HW options]
        </hw>
    </stage>
</stages>

The stage name is built from the name of its base layer in the original IR plus some meta information added by the compiler.

NOTE: the meta information embedded into the stage name is subject to change, so it is better to use per-layer configuration instead. The exact stage names can be obtained from the GetPerformanceCounts output.

For now, stage configuration supports only the HW section, which controls HW optimizations. The HW section makes sense only for Convolution, Pooling, and FullyConnected stages.

The HW optimization configuration section consists of the following options:

The inputs and outputs options have the same meaning as for per-layer configuration (see the Layer HW Section above).

The sw_injections option for a stage has the following syntax:

<sw_injections>
    <enable>true | false</enable>
    <injected_stages>
        <stage>`stage name`</stage>
        <stage>`stage name`</stage>
    </injected_stages>
</sw_injections>

It allows disabling the merge of SW stages into the current stage, or manually specifying which SW stages should be merged into the current HW stage.
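
A sketch of both uses follows. The first stage name is the sample name used earlier in this section; the second stage name and the injected stage name pool0 are hypothetical, and real names should be taken from the GetPerformanceCounts output. Placing sw_injections inside the hw element is an assumption based on it being described as one of the HW options.

```xml
<stages>
    <!-- Disable merging of SW stages into this HW stage -->
    <stage name="conv0@HW@soh=0/6+ReLU+Bias">
        <hw>
            <sw_injections>
                <enable>false</enable>
            </sw_injections>
        </hw>
    </stage>
    <!-- Hypothetical names: merge only the listed SW stage into this HW stage -->
    <stage name="conv1@HW@soh=0/6">
        <hw>
            <sw_injections>
                <enable>true</enable>
                <injected_stages>
                    <stage>pool0</stage>
                </injected_stages>
            </sw_injections>
        </hw>
    </stage>
</stages>
```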

Network Configuration File Example

This is an example of a network configuration file:

<?xml version="1.0" ?>
<vpu_net_config version="1">
    <passes>
        <pass name="splitDepthConvolution">
            <enable>true</enable>
        </pass>
        <pass name="tryHCWLayoutForHW">
            <enable>true</enable>
        </pass>
    </passes>
    <data>
        <data name="input">
            <scale>128</scale>
        </data>
    </data>
    <layers>
        <layer name="conv5/dw">
            <hw>
                <depth_conv>
                    <split>SINGLE</split>
                    <num_tiles>8</num_tiles>
                </depth_conv>
                <tiling>
                    <input_tile>
                        <dims>
                            <dim_w>FULL</dim_w>
                            <dim_h>38</dim_h>
                            <dim_c>FULL</dim_c>
                        </dims>
                    </input_tile>
                </tiling>
                <inputs>
                    <input ind="0">
                        <layout>HCW</layout>
                        <location>DDR</location>
                    </input>
                </inputs>
                <outputs>
                    <output ind="0">
                        <layout>CHW</layout>
                        <location>CMX</location>
                    </output>
                </outputs>
            </hw>
        </layer>
        <layer name="conv5">
            <hw>
                <tiling>
                    <input_tile>
                        <dims>
                            <dim_w>FULL</dim_w>
                            <dim_h>12</dim_h>
                            <dim_c>FULL</dim_c>
                        </dims>
                    </input_tile>
                </tiling>
                <inputs>
                    <input ind="0">
                        <layout>CHW</layout>
                        <location>CMX</location>
                    </input>
                </inputs>
                <outputs>
                    <output ind="0">
                        <force_copy>true</force_copy>
                        <layout>HCW</layout>
                        <location>CMX</location>
                    </output>
                </outputs>
            </hw>
        </layer>
    </layers>
</vpu_net_config>

See Also