The Inference Engine workflow involves the creation of custom kernels and either custom or existing layers.

A Layer is a convolutional neural network (CNN) building block implemented in the training framework, for example, Convolution in Caffe*. A Kernel is defined as the corresponding implementation in Inference Engine.

Please refer to the Custom Layers in the Model Optimizer section for the details of how a mapping between framework layers and Inference Engine kernels is registered.

In short, you can plug your own kernel implementations into the Inference Engine and map them to the layers in the original framework.

The rest of the section covers custom kernels and how do you integrate them into the Inference Engine.

Example of Custom Kernels Support in the Samples

Every sample uses the Inference Engine API to load custom kernels depending on the device type. Specifically, for the CPU, it is a shared library that exports certain interface that registers the kernels. For GPU or MYRIAD, it is an .xml file that lists the kernels along with parameters that the kernels accept and how these map to the specific Intermediate Representation (IR) values.

Example Custom Kernels

You can find the examples of CPU-targeted kernels in the <INSTALL_DIR>/deployment_tools/inference_engine/src/extension directory. You can also use as an example global GPU kernels delivered with the OpenVINO toolkit.

Several GPU-targeted kernels are also added to the binaries upon samples compilation so that the samples application can easy load them. Refer to the cldnn_global_custom_kernels folder in GPU plugin installation directory.

How to Implement Custom GPU Layers

The GPU codepath abstracts many details about OpenCL™. You need to provide the kernel code in the OpenCL C and the configuration file that connects the kernel and its parameters to the parameters of the layer.

There are two options of using custom layer configuration file:

Include a section with your kernels into global automatically-loaded cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml file (hosted in the <INSTALL_DIR> /deployment_tools/inference_engine/bin/intel64/{Debug/Release} folder)
Call the IInferencePlugin::SetConfig() method from your application with the PluginConfigParams::KEY_CONFIG_FILE key and the configuration file name as the value before loading the network that uses custom layers to the plugin:

// Load GPU plugin

InferenceEngine::InferenceEnginePluginPtr plugin_ptr(selectPlugin({…, “GPU”));

InferencePlugin plugin(plugin_ptr);

// Load GPU Extensions

plugin.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, ”<path_to_the_xml_file>”}});

All Inference Engine samples (except trivial hello_classification) feature a dedicated command-line option -c to load custom kernels. For example, to load custom layers for the classification sample:

$ ./classification_sample -m <path_to_model>/bvlc_alexnet_fp16.xml -i ./validation_set/daily/227x227/apron.bmp -d GPU

-c <absolute_path_to_config>/custom_layer_example.xml

Configuration File Format

The configuration file is expected to follow the .xml file structure with a node of type CustomLayer for every custom layer you provide.

The following definitions will use the notations:

(0/1) Can have 0 or 1 instances of this node/attribute
(1) Must have 1 instance of this node/attribute
(0+) Can have any number of instances of this node/attribute
(1+) Can have 1 or more instances of this node/attribute

CustomLayer Node and Sub-node Structure

CustomLayer node contains the entire configuration for a single custom layer.

Attribute Name	#	Description
`name`	(1)	The name of the layer type to be used. This name should be identical to the type used in the IR.
`type`	(1)	Must be `SimpleGPU`
`version`	(1)	Must be `1`

Sub-nodes: Kernel (1), Buffers (1), CompilerOptions (0+), WorkSizes (0/1)

Kernel Node and Sub-node Structure

Kernel node contains all kernel source code configuration. No kernel node structure exists.

Sub-nodes: Source (1+), Define (0+)

Source Node and Sub-node Structure

Source node points to a single OpenCL source file.

Attribute Name	#	Description
`filename`	(1)	Name of the file containing OpenCL source code. Notice that path is relative to your executable. Multiple source nodes will have their sources concatenated in order.

Sub-nodes: None

Define Node and Sub-node Structure

Define node configures a single #define instruction to be added to the sources during compilation (JIT).

Attribute Name	#	Description
`name`	(1)	The name of the defined JIT. For static constants, this can include the value as well (taken as a string).
`param`	(0/1)	This parameter value will be used as the value of this JIT definition.
`type`	(0/1)	The parameter type. Accepted values: `int`, `float`, and `int[]`, `float[]` for arrays.
`default`	(0/1)	The default value to be used if the specified parameters is missing from the layer in the IR.

Sub-nodes: None

The resulting JIT will be of the form: #define [name] [type] [value/default].

Buffers Node and Sub-node Structure

Buffers node configures all input/output buffers for the OpenCL entry function. No buffers node structure exists.

Sub-nodes: Data (0+), Tensor (1+)

Data Node and Sub-node Structure

Data node configures a single input with static data (for example, weight or biases).

Attribute Name	#	Description
`name`	(1)	Name of a blob attached to a layer in the IR
`arg-index`	(1)	0-based index in the entry function arguments to be bound to

Sub-nodes: None

Tensor Node and Sub-node Structure

Tensor node configures a single input or output tensor.

Attribute Name	#	Description
`arg-index`	(1)	0-based index in the entry function arguments to be bound to.
`type`	(1)	`input` or `output`
`port-index`	(1)	0-based index in the layer’s input/output ports in the IR
`format`	(0/1)	Data layout declaration for the tensor. Accepted values: `BFYX`, `BYXF`, `YXFB`, `FYXB` (also in all lowercase). Default value: `BFYX`

CompilerOptions Node and Sub-node Structure

CompilerOptions node configures the compilation flags for the OpenCL sources.

Attribute Name	#	Description
`options`	(1)	Options string to be passed to the OpenCL compiler

Sub-nodes: None

WorkSizes Node and Sub-node Structure

WorkSizes node configures the global/local work sizes to be used when queuing the OpenCL program for execution.

Attribute Name	#	Description
`global` `local`	(0/1) (0/1)	An array of up to 3 integers (or formulas) for defining the OpenCL work-sizes to be used during execution. The formulas can use the values of the B,F,Y,X dimensions and contain the operators: +,-,/,,% (all evaluated in integer arithmetic). Default value: `global=”BFYX” local=””`
`dim`	(0/1)	A tensor to take the work size from. Accepted values: `input N`, `output`, where `N` is an index of input tensor starting with 0. Default value: `output`

Sub-nodes: None

Example Configuration file

The following code sample provides an example configuration file (in .xml format). For information on configuration file structure, see Configuration File Format.

<CustomLayer name="ReLU" type="SimpleGPU" version="1">
  <Kernel entry="example_relu_kernel">
    <Source filename="custom_layer_kernel.cl"/>
    <Define name="neg_slope" type="float" param="negative_slope" default="0.0"/>
  </Kernel>
  <Buffers>
    <Tensor arg-index="0" type="input" port-index="0" format="BFYX"/>
    <Tensor arg-index="1" type="output" port-index="0" format="BFYX"/>
  </Buffers>
  <CompilerOptions options="-cl-mad-enable"/>
  <WorkSizes global="X,Y,B*F"/>
</CustomLayer>

Built-In Defines for Custom Layers

The following table includes definitions that will be attached before the user sources, where <TENSOR> is the actual input and output, (for example, INPUT0 or OUTPUT0).

For an example, see Example Kernel.

Name	Value
`NUM_INPUTS`	Number of the input tensors bound to this kernel
`GLOBAL_WORKSIZE`	An array of global work sizes used to execute this kernel
`GLOBAL_WORKSIZE_SIZE`	The size of the `GLOBAL_WORKSIZE` array
`LOCAL_WORKSIZE`	An array of local work sizes used to execute this kernel
`LOCAL_WORKSIZE_SIZE`	The size of the `LOCAL_WORKSIZE` array
`<TENSOR>_DIMS`	An array of the tensor dimension sizes. Always ordered as `BFYX`
`<TENSOR>_DIMS_SIZE`	The size of the `<TENSOR>_DIMS` array.
`<TENSOR>_TYPE`	The data-type of the tensor: `float`, `half` or `char`
`<TENSOR>_FORMAT_`	The format of the tensor, BFYX, BYXF, YXFB , FYXB or ANY. The format will be concatenated to the defined name. You can use the tensor format to define codepaths in your code with `#ifdef/#endif`.
`<TENSOR>_LOWER_PADDING`	An array of padding elements used for the tensor dimensions before they start. Always ordered as BFYX.
`<TENSOR>_ LOWER_PADDING_SIZE`	The size of the `<TENSOR>_LOWER_PADDING` array
`<TENSOR>_UPPER_PADDING`	An array of padding elements used for the tensor dimensions after they end. Always ordered as BFYX.
`<TENSOR>_UPPER_PADDING_SIZE`	The size of the `<TENSOR>_UPPER_PADDING` array
`<TENSOR>_PITCHES`	The number of elements between adjacent elements in each dimension. Always ordered as BFYX.
`<TENSOR>_PITCHES_SIZE`	The size of the `<TENSOR>_PITCHES` array
`<TENSOR>_OFFSET`	The number of elements from the start of the tensor to the first valid element (bypassing the lower padding)

All <TENSOR> values will be automatically defined for every tensor bound to this layer (INPUT0, INPUT1, OUTPUT0, and so on), as shown in the following for example:

#define INPUT0_DIMS_SIZE 4

#define INPUT0_DIMS (int []){ 1,96,55,55, }

Example Kernel

#pragma OPENCL EXTENSION cl_khr_fp16 : enable
__kernel void example_relu_kernel(
    const __global INPUT0_TYPE*  input0,
          __global OUTPUT0_TYPE* output)
{
    const uint idx  = get_global_id(0);
    const uint idy  = get_global_id(1);
    const uint idbf = get_global_id(2);//batches*features, as OpenCL supports 3D nd-ranges only
    const uint feature = idbf%OUTPUT0_DIMS[1];
    const uint batch   = idbf/OUTPUT0_DIMS[1];
    //notice that pitches are in elements, not in bytes!
    const uint in_id  = batch*INPUT0_PITCHES[0] + feature*INPUT0_PITCHES[1]   + idy*INPUT0_PITCHES[2]  + idx*INPUT0_PITCHES[3]  + INPUT0_OFFSET;
    const uint out_id = batch*OUTPUT0_PITCHES[0] + feature*OUTPUT0_PITCHES[1]  + idy*OUTPUT0_PITCHES[2]  + idx*OUTPUT0_PITCHES[3]  + OUTPUT0_OFFSET;
    INPUT0_TYPE value = input0[in_id];
    //neg_slope (which is non-zero for leaky ReLU) is put automatically as #define, refer to the config xml
    output[out_id] = value < 0 ? value * neg_slope : value;
}

NOTE: As described in the previous section, all the things like INPUT0_TYPE are actually defined as OpenCL (pre-)compiler inputs by the Inference Engine for efficiency reasons. See Debugging Tips for information on debugging the results.

Debugging Tips

Dumping the Resulting Kernels. It is recommended to get a dump of the kernel with all of the values set by the Inference Engine (all of the tensors sizes, floating-point, and integer kernel parameters). To get the dump, add a following line to your code that configures the GPU plugin to output the custom kernels:
plugin.SetConfig({{ PluginConfigParams::KEY_DUMP_KERNELS, PluginConfigParams::YES }});
When the Inference Engine compiles the kernels for the specific network, it also outputs the resulting code for the custom kernels. In the directory of your executable, you will find files like clDNN_program0.cl, clDNN_program1.cl. There are as many files as distinct sets of parameters for your custom kernel (different input tensor sizes, and kernel parameters).
Using printf in the OpenCL™ Kernels. To debug the specific values, you can use printf in your kernels. However, you should be careful: for instance, do not output excessively as it would generate too much data. Since the printf output is typical, your output can be truncated to fit the buffer. Also, because of buffering, you actually get an entire buffer of output when the execution ends.
For more information, refer to printf Function.

How to Implement Custom CPU Layers

Since the primary vehicle for the performance of the CPU codepath in the Inference Engine is the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), new CPU kernels extend the Inference Engine plugin for the Intel MKL-DNN. Implementing the InferenceEngine::ILayerImplFactory defines a general CPU-side extension. There are no Intel MKL-DNN specifics in the way you need to implement a kernel.

All Inference Engine samples (except trivial hello_classification) feature a dedicated command-line option -l to CPU load custom kernels. Use the following command-line code to execute the Classification Sample with custom CPU kernels:

$ ./classification_sample -m <path_to_model>/CustomAlexNet.xml -i <path_to_image>/inputImage.bmp -d CPU

-l <absolute_path_to_library>/libmy_sample_extension.so

Consider simple CustomLayerFactory class that registers example kernels which make multiplication by two of its input data, but and does not change the dimensions:

Create your custom layer factory CustomLayerFactory class:

// custom_layer.h

// A CustomLayerFactory class is an example layer which make exponentiation by 2 for the input and doesn't change dimensions

class CustomLayerFactory {

};
Inherit it from the abstract class: InferenceEngine::ILayerImplFactory

// custom_layer.h

class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {

};
Create a constructor, a virtual destructor, and a data member to keep the layer info:

// custom_layer.h

class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {

public:

explicit CustomLayerFactory(const CNNLayer *layer): cnnLayer(*layer) {}

private:

CNNLayer cnnLayer;

};
Overload and implement the abstract methods getShapes and getImplementations of the InferenceEngine::ILayerImplFactory class:

// custom_layer.h

class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {

public:

// ... constructor and destructor

StatusCode getShapes(const std::vector<TensorDesc>& inShapes, std::vector<TensorDesc>& outShapes, ResponseDesc *resp) noexcept override {

if (cnnLayer == nullptr) {

std::string errorMsg = "Cannot get cnn layer!";

errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);

return GENERAL_ERROR;

}

if (inShapes.size() != 1) {

std::string errorMsg = "Incorrect input shapes!";

errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);

return GENERAL_ERROR;

}

outShapes.clear();

outShapes.emplace_back(inShapes[0]);

return OK;

}

StatusCode getImplementations(std::vector<ILayerImpl::Ptr>& impls, ResponseDesc *resp) noexcept override {

// You can add cnnLayer to implementation if it is necessary

impls.push_back(ILayerImpl::Ptr(new CustomLayerImpl()));

return OK;

}

};
Create your custom layer implementation CustomLayerImpl class:

// custom_layer.h

// A CustomLayerImpl class is an example implementation

class CustomLayerImpl {

};
Because the layer uses execute method to change data, inherit it from the abstract class InferenceEngine::ILayerExecImpl, overload and implement the abstract methods of this class:

// custom_layer.h

// A CustomLayerImpl class is an example implementation

class CustomLayerImpl: public ILayerExecImpl {

public:

explicit CustomLayerImpl(const CNNLayer *layer): cnnLayer(*layer) {}

StatusCode getSupportedConfigurations(std::vector<LayerConfig>& conf, ResponseDesc *resp) noexcept override;

StatusCode init(LayerConfig& config, ResponseDesc *resp) noexcept override;

StatusCode execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs, ResponseDesc *resp) noexcept override;

private:

CNNLayer cnnLayer;

};
Implement the getSupportedConfigurations virtual method, which returns all supported configuration formats (input/output tensor layouts) for your implementation. To specify formats of data, use InferenceEngine::TensorDesc. Refer to the Memory Primitives section for instructions on how to do it.

// custom_layer.cpp

virtual StatusCode CustomLayerImpl::getSupportedConfigurations(std::vector<LayerConfig>& conf, ResponseDesc *resp) noexcept {

try {

// This layer can be in-place but not constant

if (cnnLayer == nullptr)

THROW_IE_EXCEPTION << "Cannot get CNN layer";

if (cnnLayer->insData.size() != 1 || cnnLayer->outData.empty())

THROW_IE_EXCEPTION << "Incorrect number of input/output edges";

LayerConfig config;

DataPtr dataPtr = cnnLayer->insData[0].lock();

if (!dataPtr)

THROW_IE_EXCEPTION << "Cannot get input data";

DataConfig dataConfig;

dataConfig.inPlace = -1;

dataConfig.constant = false;

SizeVector order;

for (size_t i = 0; i < dataPtr->getTensorDesc().getDims().size(); i++) {

order.push_back(i);

}

// Planar formats for N dimensions

dataConfig.desc = TensorDesc(dataPtr->getTensorDesc().getPrecision(),

dataPtr->getTensorDesc().getDims(),

{dataPtr->getTensorDesc().getDims(), order});

config.inConfs.push_back(dataConfig);

DataConfig outConfig;

outConfig.constant = false;

outConfig.inPlace = 0;

order.clear();

for (size_t i = 0; i < cnnLayer->outData[0]->getTensorDesc().getDims().size(); i++) {

order.push_back(i);

}

outConfig.desc = TensorDesc(cnnLayer->outData[0]->getTensorDesc().getPrecision(),

cnnLayer->outData[0]->getDims(),

{cnnLayer->outData[0]->getDims(), order});

config.outConfs.push_back(outConfig);

config.dynBatchSupport = 0;

conf.push_back(config);

return OK;

} catch (InferenceEngine::details::InferenceEngineException& ex) {

std::string errorMsg = ex.what();

errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);

return GENERAL_ERROR;

}

}
Implement the init method to get a runtime-selected configuration from a vector that populated in the previous step and check the parameters:

// custom_layer.cpp

virtual StatusCode CustomLayerImpl::init(LayerConfig& config, ResponseDesc *resp) noexcept {

StatusCode rc = OK;

if (config.dynBatchSupport) {

config.dynBatchSupport = 0;

rc = NOT_IMPLEMENTED;

}

for (auto& input : config.inConfs) {

if (input.inPlace >= 0) {

input.inPlace = -1;

rc = NOT_IMPLEMENTED;

}

for (auto& offset : input.desc.getBlockingDesc().getOffsetPaddingToData()) {

if (offset) {

return GENERAL_ERROR; // our simplified implementation does not support data offsets

}

}

if (input.desc.getBlockingDesc().getOffsetPadding()) {

return GENERAL_ERROR; // our simplified implementation does not support padding

}

for (size_t i = 0; i < input.desc.getBlockingDesc().getOrder().size(); i++) {

if (input.desc.getBlockingDesc().getOrder()[i] != i) {

// our simplified tensors support only 4D dimensions with regular order

if (i != 4 || input.desc.getBlockingDesc().getOrder()[i] != 1)

return GENERAL_ERROR;

}

}

}

for (auto& output : config.outConfs) {

if (output.inPlace < 0) {

// no in-place support for the output

return GENERAL_ERROR;

}

for (auto& offset : output.desc.getBlockingDesc().getOffsetPaddingToData()) {

if (offset) {

return GENERAL_ERROR;

}

}

if (output.desc.getBlockingDesc().getOffsetPadding()) {

return GENERAL_ERROR;

}

for (size_t i = 0; i < output.desc.getBlockingDesc().getOrder().size(); i++) {

if (output.desc.getBlockingDesc().getOrder()[i] != i) {

if (i != 4 || output.desc.getBlockingDesc().getOrder()[i] != 1)

return GENERAL_ERROR;

}

}

}

return rc;

}
Implement the execute method, which accepts and processes the actual tenors as input/output blobs:

// custom_layer.cpp

virtual StatusCode CustomLayerImpl::execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs, ResponseDesc *resp) noexcept {

if (inputs.size() != 1 || outputs.empty()) {

std::string errorMsg = "Incorrect number of input or output edges!";

errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);

return GENERAL_ERROR;

}

const float* src_data = inputs[0]->buffer();

float* dst_data = outputs[0]->buffer();

for (size_t o = 0; o < outputs->size(); o++) {

if (dst_data == src_data) {

dst_data[o] *= dst_data[o];

} else {

dst_data[o] = src_data[o]*src_data[o];

}

}

}
Pack the kernels into a shared library:
1. Create a factory for your own primitives inherited from the abstract class InferenceEngine::IExtension, which defines the functions that you need to implement:
  
  // custom_extension.h
  
  class CustomExtention : public InferenceEngine::IExtension {
  
  };
2. Implement the utility methods Unload, Release, SetLogCallback:
  
  // custom_extension.h
  
  class CustomExtention : public InferenceEngine::IExtension {
  
  public:
  
  // cleans up resources, in this case, does nothing
  
  void Unload() noexcept override {
  
  }
  
  // is used when destruction happens
  
  void Release() noexcept override {
  
  delete this;
  
  }
  
  // logging is used to track what is going on inside
  
  void SetLogCallback(InferenceEngine::IErrorListener &listener) noexcept override {}
  
  };
3. Implement the utility method GetVersion:
  
  // custom_extension.h
  
  class CustomExtention : public InferenceEngine::IExtension {
  
  private:
  
  static InferenceEngine::Version ExtensionDescription = {
  
  {1, 0}, // extension API version
  
  "1.0",
  
  "CustomExtention" // extension description message
  
  };
  
  public:
  
  // gets extension version information
  
  void GetVersion(const InferenceEngine::Version *& versionInfo) const noexcept override {
  
  versionInfo = &ExtensionDescription;
  
  }
  
  };
4. Implement main extension methods:
  
  // custom_extension.h
  
  class CustomExtention : public InferenceEngine::IExtension {
  
  public:
  
  // ... utility methods
  
  // retrunes the list of supported kernels/layers
  
  StatusCode getPrimitiveTypes(char**& types, unsigned int& size, ResponseDesc* resp) noexcept override {
  
  std::string type_name = "CustomLayer";
  
  types = new char *[1];
  
  size = 1;
  
  types[0] = new char[type_name.size() + 1];
  
  std::copy(type_name.begin(), type_name.end(), types[0]);
  
  types[0][type_name.size()] = '\0';
  
  return OK;
  
  }
  
  // main function
  
  StatusCode getFactoryFor(ILayerImplFactory *&factory, const CNNLayer *cnnLayer, ResponseDesc *resp) noexcept override {
  
  if (cnnLayer->type != "CustomLayer") {
  
  std::string errorMsg = std::string("Factory for ") + cnnLayer->type + " wasn't found!";
  
  errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
  
  return NOT_FOUND;
  
  }
  
  factory = new CustomLayerFactory(cnnLayer);
  
  return OK;
  
  }
  
  };
To use your custom layers, you need to compile the code as the shared library. After that use the AddExtension method of the general plugin interface to load your primitives:

// Load the CPU plugin

InferenceEngine::InferenceEnginePluginPtr plugin_ptr(selectPlugin(…, “CPU”));

InferencePlugin plugin(plugin_ptr);

// Load CPU extension as a shared library

auto extension_ptr = make_so_pointer<InferenceEngine::IExtension>(“<shared lib path>”);

// Add extension to the plugin list

plugin.AddExtension(extension_ptr);

How to Implement Custom MYRIAD Layers

Since OpenCL toolchain for MYRIAD supports offline compilation only, you should first compile OpenCL C code using standalone clc compiler with the following command:
./clc --strip-binary-header custom_layer.cl -o custom_layer.bin
Write a configuration file with a kernel parameter description and bindings. For example, given the following OpenCL kernel signature:

__kernel void reorg_nhwc(__global const half *src, __global half *out, int w, int h, int c, int stride);

Configuration file for this kernel might be the following:

<CustomLayer name="ReorgYolo" type="MVCL" version="1">

<Kernel entry="reorg_nhwc">

<Source filename="reorg.bin"/>

</Kernel>

<Parameters>

<Tensor arg-name="src" type="input" port-index="0" format="BYXF"/>

<Tensor arg-name="out" type="output" port-index="0" format="BYXF"/>

<Scalar arg-name="w" type="int" port-index="0" source="I.X" />

<Scalar arg-name="h" type="int" port-index="0" source="I.Y" />

<Scalar arg-name="c" type="int" port-index="0" source="I.F" />

<Scalar arg-name="stride" type="int" source="stride" />

</Parameters>

<WorkSizes dim="input,0" global="(Y+7)/8*8,1,1" local="8,1,1"/>

</CustomLayer>

Each custom layer is described with CustomLayer node. It has the following required nodes and attributes:
- Root node CustomLayer must contain the following attributes:
  - name, which is the name of the Inference Engine layer to bind kernel with
  - type and version. Set them to MVCL and 1.
- Sub-node Kernel must contain the following attributes:
  - entry, which is a name of your kernel function as you defined it in a source file (in the example above, it is reorg_nhwc)
  - Node Source must contain the following attributes:
    - filename, which is a path to a compiled binary relative to the .xml binding file
- Sub-node Parameters, which describes parameters bindings
- Sub-node WorkSizes, which describes local, global work group sizes and source for dimension deduction as a pair direction,port. In the example above, the work group is described relatively to the dimension of input tensor that comes thought port 0 in IR. global and local work group configurations support any simple math expressions with +,-,*,/, and () from B(batch), Y(height), X(width) and F(channels).
Parameter description supports Tensor and Scalar nodes and has the following format:
- Each Tensor node must contain the following attribute:
  - arg-name, which is a name of a kernel parameter in the kernel signature
  - type, which is input or output as in the IR
  - port-index, which is a number of input/output port as in the IR
  - format, which specifies channel order in tensor. Optional repacks are generated if custom layer format is not compatible with formats of neighboring layers.
- Each Scalar node must contain the following attributes:
  - arg-name, which is a name of a kernel parameter in the kernel signature
  - type, which is int or float. It is used for correct argument extraction from IR parameters.
  - source, which contains the name of the parameter in IR file or input/output (I/O, In/On, where n is a port number) followed by dimension B(batch), Y(height), X(width) or F(channels).
Before loading the network that features the custom layers, provide a separate configuration file and load it using IInferencePlugin::SetConfig() method with the PluginConfigParams::KEY_CONFIG_FILE key and the configuration file name as the value:

// Load MYRIAD plugin

InferenceEngine::InferenceEnginePluginPtr plugin_ptr("libmyriadPlugin.so");

InferencePlugin plugin(plugin_ptr);

// Load custom layers

plugin.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, ”<path to the xml file>”}});

Optionally, you can set path to custom layers description with a pair of VPU_CUSTOM_LAYERS and /path/to/your/customLayers.xml as a network configuration:

// Load MYRIAD plugin

InferenceEngine::InferenceEnginePluginPtr myriad("libmyriadPlugin.so");

std::map<std::string, std::string> networkConfig;

config["VPU_CUSTOM_LAYERS"] = "/path/to/your/customLayers.xml";

// Load custom layers in network config

IECALL(myriad->LoadNetwork(exeNetwork, cnnNetwork, networkConfig, &resp));

NOTE: If both native and custom layer implementations are present, custom kernel has a priority over native code.

Example of Custom Kernels Support in the Samples

Example Custom Kernels

How to Implement Custom GPU Layers

Configuration File Format

Example Configuration file

Built-In Defines for Custom Layers

Example Kernel

Debugging Tips

How to Implement Custom CPU Layers

How to Implement Custom MYRIAD Layers

See Also