Inference Engine Kernels Extensibility

Below, a Layer is a CNN building block implemented in the training framework, for example, Convolution in Caffe*. A Kernel is the corresponding implementation in the Inference Engine.

Refer to the Model Optimizer documentation for details on how the mapping between a framework's layers and Inference Engine kernels is registered. In short, you can plug your own kernel implementations into the Inference Engine and map them to the layers in the original framework.

The rest of this section covers custom kernels and how to integrate them into the Inference Engine.

Example of Custom Kernels Support in the Samples

Every sample uses the Inference Engine API to load custom kernels depending on the device type. Specifically, for the CPU it is a shared library that exports a certain interface that registers the kernels. For GPU or MYRIAD, it is an .xml file that lists the kernels along with the parameters that the kernels accept and how these map to the specific IR values.
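
For illustration, below is a minimal sketch of this device-dependent loading. It only reuses calls that appear later in this section (AddExtension, SetConfig with PluginConfigParams::KEY_CONFIG_FILE); the device variable, the plugin object, and the file paths are placeholders, not part of any sample.

// Sketch only: "device", "plugin", and the paths are placeholders
if (device == "CPU") {
    // CPU custom kernels live in a shared library implementing InferenceEngine::IExtension
    auto extension_ptr = make_so_pointer<InferenceEngine::IExtension>("<path to extension library>");
    plugin.AddExtension(extension_ptr);
} else if (device == "GPU" || device == "MYRIAD") {
    // GPU/MYRIAD custom kernels are described by an .xml configuration file
    plugin.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, "<path to the xml file>"}});
}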

Example Custom Kernels

The extension folder in the samples directory comes with a few real examples of CPU-targeted kernels, such as DetectionOutput (used in SSD*) and others.

In addition, a number of GPU-targeted kernels are copied to the binaries when the samples are compiled, so that the sample applications can easily load them. Refer to the cldnn_global_custom_kernels folder in the GPU plugin installation directory.

How to Implement Custom GPU Layers

You need to provide the kernel code in OpenCL C and a configuration file that connects the kernel and its parameters to the parameters of the layer.

There are two options for using the custom layer configuration file.

The first option is to include a section with your kernels into the global, automatically loaded cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml file. The second option is to provide a separate configuration file and load it using the IInferencePlugin::SetConfig() method with the PluginConfigParams::KEY_CONFIG_FILE key and the configuration file name as the value, before loading the network that features the custom layers:

// Load clDNN (GPU) plugin
InferenceEngine::InferenceEnginePluginPtr plugin_ptr(selectPlugin({…, "GPU"}));
InferencePlugin plugin(plugin_ptr);
// Load clDNN Extensions
plugin.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, "<path to the xml file>"}});

For details on the configuration parameters and the OpenCL kernel, refer to the tutorial available online.

How to Implement Custom MYRIAD Layers

  1. Since the OpenCL™ toolchain for MYRIAD supports offline compilation only, the OpenCL C code should first be compiled with the standalone clc compiler using the following command:
    ./clc --strip-binary-header custom_layer.cl -o custom_layer.bin
  2. Write a configuration file with a kernel parameter description and bindings. For example, given the following OpenCL kernel signature:

    __kernel void reorg_nhwc(__global const half *src, __global half *out, int w, int h, int c, int stride);

    The configuration file for this kernel might look like the following:

    <CustomLayer name="ReorgYolo" type="MVCL" version="1">
        <Kernel entry="reorg_nhwc">
            <Source filename="reorg.bin"/>
        </Kernel>
        <Parameters>
            <Tensor arg-name="src" type="input" port-index="0" format="BYXF"/>
            <Tensor arg-name="out" type="output" port-index="0" format="BYXF"/>
            <Scalar arg-name="w" type="int" port-index="0" source="I.X"/>
            <Scalar arg-name="h" type="int" port-index="0" source="I.Y"/>
            <Scalar arg-name="c" type="int" port-index="0" source="I.F"/>
            <Scalar arg-name="stride" type="int" source="stride"/>
        </Parameters>
        <WorkSizes dim="input,0" global="(Y+7)/8*8,1,1" local="8,1,1"/>
    </CustomLayer>

    Each custom layer is described with a CustomLayer node. Here is the list of mandatory nodes and attributes:

    • Root node CustomLayer:
      • Attribute name is the name of the IE layer to bind the kernel with.
      • Attributes type and version. Leave them MVCL and 1 for now.
    • Sub-node Kernel:
      • Attribute entry is the name of the kernel function as written in the source file (reorg_nhwc in the example above).
      • Node Source with attribute filename is a path to the compiled binary relative to this XML binding file.
    • Sub-node Parameters describes the parameter bindings.
    • Sub-node WorkSizes describes the local and global work group sizes and the source for dimension deduction as a direction,port pair. The example above defines the work group sizes relative to the dimensions of the input tensor that comes through port 0 in the IR. Any simple math expression with +, -, *, / and () over B (batch), Y (height), X (width) and F (channels) is supported for the global and local work group configuration.

    The parameter description format is the following:

    • Tensor and Scalar nodes are supported.
    • Each Tensor node must contain the following attributes:
      • arg-name, which is the name of the kernel parameter in the kernel signature
      • type, which is input or output, as in the IR
      • port-index, which is the number of the input/output port, as in the IR
      • format, which specifies the channel order in the tensor. Optional repacks are generated if the custom layer format is not compatible with the formats of the neighboring layers.
    • Each Scalar node must contain the following attributes:
      • arg-name, which is the name of the kernel parameter in the kernel signature.
      • type, which is int or float and is used for correct argument extraction from the IR parameters.
      • source, which might contain the name of a parameter from the IR file, or an input/output specifier (I/O, In/On, where n is a port number) followed by a dimension: B (batch), Y (height), X (width), or F (channels).
  3. Provide a separate configuration file and load it using the InferencePlugin::SetConfig() method with the PluginConfigParams::KEY_CONFIG_FILE key and the configuration file name as the value, before loading the network that features the custom layers:
    // Load MYRIAD plugin
    InferenceEngine::InferenceEnginePluginPtr plugin_ptr("libmyriadPlugin.so");
    InferencePlugin plugin(plugin_ptr);
    // Load custom layers
    plugin.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, "<path to the xml file>"}});
    Optionally, you can set the path to the custom layers description with a pair of VPU_CUSTOM_LAYERS and /path/to/your/customLayers.xml as a network configuration:
    // Load MYRIAD plugin
    InferenceEngine::InferenceEnginePluginPtr myriad("libmyriadPlugin.so");
    std::map<std::string, std::string> networkConfig;
    networkConfig["VPU_CUSTOM_LAYERS"] = "/path/to/your/customLayers.xml";
    // Load custom layers in network config
    IECALL(myriad->LoadNetwork(exeNetwork, cnnNetwork, networkConfig, &resp));

NOTE: If both native and custom layer implementations are present, the custom kernel has priority over the native one.

How to Implement Custom CPU Layers

This is a brief version of the full-blown Custom Layers tutorial available online.

  1. Create your custom layer factory CustomLayerFactory class.
    // custom_layer.h
    // A CustomLayerFactory class is an example layer, which raises each input value to the power of 2 and doesn't change the dimensions
    class CustomLayerFactory {
    };
  2. Inherit it from the abstract class InferenceEngine::ILayerImplFactory
    // custom_layer.h
    class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
    };
  3. Create a constructor, a virtual destructor, and a data member to keep the layer info
    // custom_layer.h
    class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
    public:
        explicit CustomLayerFactory(const CNNLayer *layer): cnnLayer(*layer) {}
        virtual ~CustomLayerFactory() = default;
    private:
        CNNLayer cnnLayer;
    };
  4. Overload and implement the abstract methods (getShapes, getImplementations) of the InferenceEngine::ILayerImplFactory class
    // custom_layer.h
    class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
    public:
        // ... constructor and destructor
        StatusCode getShapes(const std::vector<TensorDesc>& inShapes, std::vector<TensorDesc>& outShapes, ResponseDesc *resp) noexcept override {
            if (inShapes.size() != 1) {
                std::string errorMsg = "Incorrect input shapes!";
                errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
                return GENERAL_ERROR;
            }
            // The layer does not change dimensions, so the output shape equals the input shape
            outShapes.clear();
            outShapes.emplace_back(inShapes[0]);
            return OK;
        }
        StatusCode getImplementations(std::vector<ILayerImpl::Ptr>& impls, ResponseDesc *resp) noexcept override {
            // You can pass cnnLayer to the implementation if it is necessary
            impls.push_back(ILayerImpl::Ptr(new CustomLayerImpl(&cnnLayer)));
            return OK;
        }
    };
  5. Create your custom layer implementation CustomLayerImpl class:
    // custom_layer.h
    // A CustomLayerImpl class is an example implementation
    class CustomLayerImpl {
    };
  6. Because the layer uses the execute method to change data, inherit it from the abstract class InferenceEngine::ILayerExecImpl, and overload and implement the abstract methods of this class.
    // custom_layer.h
    // A CustomLayerImpl class is an example implementation
    class CustomLayerImpl: public ILayerExecImpl {
    public:
        explicit CustomLayerImpl(const CNNLayer *layer): cnnLayer(*layer) {}
        StatusCode getSupportedConfigurations(std::vector<LayerConfig>& conf, ResponseDesc *resp) noexcept override;
        StatusCode init(LayerConfig& config, ResponseDesc *resp) noexcept override;
        StatusCode execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs, ResponseDesc *resp) noexcept override;
    private:
        CNNLayer cnnLayer;
    };
  7. First of all, implement getSupportedConfigurations, which returns all supported configurations for this implementation. To specify data formats, you can use InferenceEngine::TensorDesc. Refer to Inference Engine Memory Primitives for instructions on how to do this.
    // custom_layer.cpp
    StatusCode CustomLayerImpl::getSupportedConfigurations(std::vector<LayerConfig>& conf, ResponseDesc *resp) noexcept {
        try {
            // This layer can be in-place but not constant!
            if (cnnLayer.insData.size() != 1 || cnnLayer.outData.empty())
                THROW_IE_EXCEPTION << "Incorrect number of input/output edges!";
            LayerConfig config;
            DataPtr dataPtr = cnnLayer.insData[0].lock();
            if (!dataPtr)
                THROW_IE_EXCEPTION << "Cannot get input data!";
            DataConfig dataConfig;
            dataConfig.inPlace = -1;
            dataConfig.constant = false;
            SizeVector order;
            for (size_t i = 0; i < dataPtr->getTensorDesc().getDims().size(); i++) {
                order.push_back(i);
            }
            // Planar formats for N dims
            dataConfig.desc = TensorDesc(dataPtr->getTensorDesc().getPrecision(),
                                         dataPtr->getTensorDesc().getDims(),
                                         {dataPtr->getTensorDesc().getDims(), order});
            config.inConfs.push_back(dataConfig);
            DataConfig outConfig;
            outConfig.constant = false;
            outConfig.inPlace = 0;
            order.clear();
            for (size_t i = 0; i < cnnLayer.outData[0]->getTensorDesc().getDims().size(); i++) {
                order.push_back(i);
            }
            outConfig.desc = TensorDesc(cnnLayer.outData[0]->getTensorDesc().getPrecision(),
                                        cnnLayer.outData[0]->getDims(),
                                        {cnnLayer.outData[0]->getDims(), order});
            config.outConfs.push_back(outConfig);
            config.dynBatchSupport = 0;
            conf.push_back(config);
            return OK;
        } catch (InferenceEngine::details::InferenceEngineException& ex) {
            std::string errorMsg = ex.what();
            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
            return GENERAL_ERROR;
        }
    }
  8. Implement the init and execute methods. init is required to get the selected configuration and to check its parameters.
    // custom_layer.cpp
    StatusCode CustomLayerImpl::init(LayerConfig& config, ResponseDesc *resp) noexcept {
        StatusCode rc = OK;
        if (config.dynBatchSupport) {
            config.dynBatchSupport = 0;
            rc = NOT_IMPLEMENTED;
        }
        for (auto& input : config.inConfs) {
            if (input.inPlace >= 0) {
                input.inPlace = -1;
                rc = NOT_IMPLEMENTED;
            }
            for (auto& offset : input.desc.getBlockingDesc().getOffsetPaddingToData()) {
                if (offset) {
                    return GENERAL_ERROR;
                }
            }
            if (input.desc.getBlockingDesc().getOffsetPadding()) {
                return GENERAL_ERROR;
            }
            for (size_t i = 0; i < input.desc.getBlockingDesc().getOrder().size(); i++) {
                if (input.desc.getBlockingDesc().getOrder()[i] != i) {
                    if (i != 4 || input.desc.getBlockingDesc().getOrder()[i] != 1)
                        return GENERAL_ERROR;
                }
            }
        }
        for (auto& output : config.outConfs) {
            if (output.inPlace < 0) {
                // NOT in-place
            }
            for (auto& offset : output.desc.getBlockingDesc().getOffsetPaddingToData()) {
                if (offset) {
                    return GENERAL_ERROR;
                }
            }
            if (output.desc.getBlockingDesc().getOffsetPadding()) {
                return GENERAL_ERROR;
            }
            for (size_t i = 0; i < output.desc.getBlockingDesc().getOrder().size(); i++) {
                if (output.desc.getBlockingDesc().getOrder()[i] != i) {
                    if (i != 4 || output.desc.getBlockingDesc().getOrder()[i] != 1)
                        return GENERAL_ERROR;
                }
            }
        }
        return rc;
    }
    StatusCode CustomLayerImpl::execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs, ResponseDesc *resp) noexcept {
        if (inputs.size() != 1 || outputs.empty()) {
            std::string errorMsg = "Incorrect number of input or output edges!";
            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
            return GENERAL_ERROR;
        }
        const float* src_data = inputs[0]->buffer();
        float* dst_data = outputs[0]->buffer();
        // Square each element; if the output buffer is the same as the input one, work in-place
        for (size_t o = 0; o < outputs[0]->size(); o++) {
            if (dst_data == src_data) {
                dst_data[o] *= dst_data[o];
            } else {
                dst_data[o] = src_data[o] * src_data[o];
            }
        }
        return OK;
    }
  9. Create a factory for your own primitives, inherited from the abstract class InferenceEngine::IExtension
    // custom_extension.h
    class CustomExtention : public InferenceEngine::IExtension {
    };
  10. Implement the utility methods Unload, Release, SetLogCallback:
    // custom_extension.h
    class CustomExtention : public InferenceEngine::IExtension {
    public:
        // Could be used to clean up resources
        void Unload() noexcept override {
        }
        // Is called when the extension object is destroyed
        void Release() noexcept override {
            delete this;
        }
        // Logging is used to track what is going on inside the extension
        void SetLogCallback(InferenceEngine::IErrorListener &listener) noexcept override {}
    };
  11. Implement the utility method GetVersion:
    // custom_extension.h
    class CustomExtention : public InferenceEngine::IExtension {
    public:
        // Gets the extension version information
        void GetVersion(const InferenceEngine::Version *& versionInfo) const noexcept override {
            // Static description of the extension: API version, build number, and description message
            static InferenceEngine::Version ExtensionDescription = {
                {1, 0},            // extension API version
                "1.0",             // build number
                "CustomExtention"  // extension description message
            };
            versionInfo = &ExtensionDescription;
        }
    };
  12. Implement main extension methods:
    // custom_extension.h
    class CustomExtention : public InferenceEngine::IExtension {
    public:
        // ... utility methods
        StatusCode getPrimitiveTypes(char**& types, unsigned int& size, ResponseDesc* resp) noexcept override {
            std::string type_name = "CustomLayer";
            types = new char *[1];
            size = 1;
            types[0] = new char[type_name.size() + 1];
            std::copy(type_name.begin(), type_name.end(), types[0]);
            types[0][type_name.size()] = '\0';
            return OK;
        }
        StatusCode getFactoryFor(ILayerImplFactory *&factory, const CNNLayer *cnnLayer, ResponseDesc *resp) noexcept override {
            if (cnnLayer->type != "CustomLayer") {
                std::string errorMsg = std::string("Factory for ") + cnnLayer->type + " wasn't found!";
                errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
                return NOT_FOUND;
            }
            factory = new CustomLayerFactory(cnnLayer);
            return OK;
        }
    };
  13. To use your custom layers, you need to compile the code as a shared library (see the sketch of the exported entry point after this list). After that, you can use the AddExtension method of the general plugin interface to load your primitives:
    auto extension_ptr = make_so_pointer<InferenceEngine::IExtension>("<shared lib path>");
    // Add extension to the plugin's list
    plugin.AddExtension(extension_ptr);
    For more details, refer to the sources of the samples.
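
The shared library mentioned in step 13 must also expose an entry point that make_so_pointer uses to instantiate the extension. Below is a minimal sketch of such an entry point; it assumes the CreateExtension function and the INFERENCE_EXTENSION_API macro declared in the Inference Engine extension header, so verify the exact signature against the header shipped with your version.

// custom_extension.cpp
// Sketch of the exported entry point, assuming the CreateExtension signature
// from the Inference Engine extension API; check your ie_iextension.h header.
INFERENCE_EXTENSION_API(InferenceEngine::StatusCode)
CreateExtension(InferenceEngine::IExtension*& ext, InferenceEngine::ResponseDesc* resp) noexcept {
    try {
        // Hand ownership of the extension object to the caller
        ext = new CustomExtention();
        return InferenceEngine::OK;
    } catch (std::exception& ex) {
        if (resp) {
            std::string err = std::string("Couldn't create extension: ") + ex.what();
            err.copy(resp->msg, sizeof(resp->msg) - 1);
        }
        return InferenceEngine::GENERAL_ERROR;
    }
}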

See Also