OpenVINO Extensibility Mechanism

The Intel® Distribution of OpenVINO™ toolkit supports neural-network models trained with various frameworks, including TensorFlow, PyTorch, ONNX, TensorFlow Lite, and PaddlePaddle (OpenVINO support for Apache MXNet, Caffe, and Kaldi is currently deprecated and will be removed entirely in the future). The list of supported operations differs for each framework. To see the operations supported by your framework, refer to Supported Framework Operations.

Custom operations, that is, operations not included in that list, are not recognized by OpenVINO out of the box. The need for a custom operation may arise in two cases:

  1. A new or rarely used regular framework operation is not supported in OpenVINO yet.

  2. A new user operation that the model author created for a specific model topology using the framework's extension capabilities.

Importing models with such operations requires additional steps. This guide illustrates the workflow for running inference on models featuring custom operations, allowing you to plug in your own implementation for them. The OpenVINO Extensibility API enables adding support for those custom operations and using a single implementation for both Model Optimizer and OpenVINO Runtime.

Defining a new custom operation consists of two parts:

  1. Definition of the operation semantics in OpenVINO, that is, the code that describes how the operation should be inferred by consuming input tensor(s) and producing output tensor(s). The implementation of execution kernels for GPU is described in separate guides.

  2. A mapping rule that facilitates the conversion of the framework operation representation to the OpenVINO-defined operation semantics.

The first part is required for inference. The second part is required for successful import of a model containing such operations from the original framework model format. There are several options to implement each part. The following sections will describe them in detail.

Definition of Operation Semantics

If the custom operation can be mathematically represented as a combination of existing OpenVINO operations and such a decomposition gives the desired performance, then a low-level operation implementation is not required. Refer to the latest OpenVINO operation set when deciding on the feasibility of such a decomposition. You can use any valid combination of existing operations. The next section of this document describes how to map a custom operation.

If such decomposition is not possible or appears too bulky with a large number of constituent operations that do not perform well, then a new class for the custom operation should be implemented, as described in the Custom Operation Guide.

You might prefer implementing a custom operation class if you already have a generic C++ implementation of the operation kernel. Otherwise, try to decompose the operation first, as described above. Then, after verifying the correctness of inference and the resulting performance, you may move on to an optional Bare Metal C++ implementation.
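For illustration, here is a minimal sketch of such a decomposition. It assumes a hypothetical custom operation that computes x * tanh(softplus(x)) (a Mish-like activation), which can be expressed entirely with existing opset operations, so no new operation class is needed:

#include <memory>
#include <openvino/opsets/opset8.hpp>

// Hypothetical decomposition: build the replacement subgraph from existing opset operations.
ov::OutputVector decompose_mish_like(const ov::Output<ov::Node>& x) {
    auto softplus = std::make_shared<ov::opset8::SoftPlus>(x);
    auto tanh = std::make_shared<ov::opset8::Tanh>(softplus);
    auto mul = std::make_shared<ov::opset8::Multiply>(x, tanh);
    return {mul};
}

A function like this can later be plugged into a conversion extension, as shown in the mapping examples below.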

Mapping from Framework Operation

Mapping of a custom operation is implemented differently, depending on the model format used for import. You may choose one of the following:

  1. If a model is represented in the ONNX (including models exported from PyTorch to ONNX), TensorFlow Lite, PaddlePaddle, or TensorFlow formats, then one of the classes from the Frontend Extension API should be used. It consists of several classes available in C++, which can be used with the --extensions option in Model Optimizer or when a model is imported directly into OpenVINO Runtime using the read_model method. A Python API is also available for runtime model import.

  2. If a model is represented in the Caffe, Kaldi, or MXNet formats (handled by legacy frontends), then [Legacy] Model Optimizer Extensions should be used. This approach is available for model conversion in Model Optimizer only.

The existence of two approaches is explained by the two different types of frontends used for model conversion in OpenVINO: new frontends (ONNX, PaddlePaddle, TensorFlow Lite, and TensorFlow) and legacy frontends (Caffe, Kaldi, and Apache MXNet). Model Optimizer can use both types of frontends, in contrast to the direct import of a model with the read_model method, which can use the new frontends only. Follow one of the appropriate guides referenced above to implement mappings, depending on the framework frontend.
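As a quick illustration of the difference (the file names below are placeholders):

ov::Core core;

// New frontends: the runtime can read a framework file directly
auto model_from_onnx = core.read_model("model.onnx");

// Legacy formats must first be converted by Model Optimizer; the runtime then reads the resulting IR
auto model_from_ir = core.read_model("converted_model.xml");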

If you are implementing extensions for the new ONNX, PaddlePaddle, TensorFlow Lite, or TensorFlow frontends and plan to use the --extensions option in Model Optimizer for model conversion, then the extensions should be:

  1. Implemented in C++ only.

  2. Compiled as a separate shared library (see details on how to do this further in this guide).

Model Optimizer does not support new frontend extensions written with the Python API.

The remaining part of this guide describes the application of the Frontend Extension API for new frontends.

Registering Extensions

A custom operation class and a new mapping frontend extension class object should be registered to be usable in OpenVINO runtime.

Note

This documentation is derived from the Template extension, which demonstrates the details of extension development. It is based on a minimalistic Identity operation that serves as a placeholder for your real custom operation. Review the complete, fully compilable code to see how it works.
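For orientation, a minimal sketch of what such a placeholder operation class might look like. It is a simplification of the Template extension code, and the method bodies here are illustrative only:

#include <cstring>
#include <memory>
#include <openvino/op/op.hpp>

namespace TemplateExtension {

// Minimal "Identity" operation: one input, one output, output equals input.
class Identity : public ov::op::Op {
public:
    OPENVINO_OP("Identity");

    Identity() = default;
    Identity(const ov::Output<ov::Node>& arg) : Op({arg}) {
        constructor_validate_and_infer_types();
    }

    void validate_and_infer_types() override {
        // The output has the same element type and shape as the input.
        set_output_type(0, get_input_element_type(0), get_input_partial_shape(0));
    }

    std::shared_ptr<ov::Node> clone_with_new_inputs(const ov::OutputVector& new_args) const override {
        return std::make_shared<Identity>(new_args.at(0));
    }

    bool visit_attributes(ov::AttributeVisitor& visitor) override {
        return true;  // Identity has no attributes to serialize
    }

    bool evaluate(ov::TensorVector& outputs, const ov::TensorVector& inputs) const override {
        // Reference implementation: copy the input data to the output unchanged.
        outputs[0].set_shape(inputs[0].get_shape());
        std::memcpy(outputs[0].data(), inputs[0].data(), inputs[0].get_byte_size());
        return true;
    }

    bool has_evaluate() const override { return true; }
};

}  // namespace TemplateExtension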

Use the ov::Core::add_extension method to load the extensions into the ov::Core object. This method allows loading a library with extensions or loading extensions from code.

Load Extensions to Core

Extensions can be loaded from code with the ov::Core::add_extension method:

ov::Core core;

// Use operation type to add operation extension
core.add_extension<TemplateExtension::Identity>();

// or you can add operation extension object which is equivalent form
core.add_extension(ov::OpExtension<TemplateExtension::Identity>());

Identity is the custom operation class defined in the Custom Operation Guide. This is sufficient to enable reading OpenVINO IR which uses the Identity extension operation emitted by Model Optimizer. In order to load the original model directly into the runtime, add a mapping extension:

// Register mapping for new frontends: the framework's "Identity" operation to TemplateExtension::Identity
core.add_extension(ov::frontend::OpExtension<TemplateExtension::Identity>("Identity"));

// Register more sophisticated mapping with decomposition
core.add_extension(ov::frontend::ConversionExtension(
    "Identity",
    [](const ov::frontend::NodeContext& context) {
        // Arbitrary decomposition code here
        // Return a vector of operation outputs
        return ov::OutputVector{ std::make_shared<TemplateExtension::Identity>(context.get_input(0)) };
    }));

When the Python API is used, there is no way to implement a custom OpenVINO operation. Even if the custom OpenVINO operation is implemented in C++ and loaded into the runtime through a shared library, there is still no way to add a frontend mapping extension that refers to this custom operation. In this case, use the C++ shared library approach to implement both the operation semantics and the framework mapping.

Python can still be used to map and decompose operations when only operations from the standard OpenVINO operation set are used.

Create a Library with Extensions

An extension library should be created in the following cases:

  • Conversion of a model with custom operations in Model Optimizer.

  • Loading a model with custom operations in a Python application. This applies to both framework models and OpenVINO IR.

  • Loading models with custom operations in tools that support loading extensions from a library, for example the benchmark_app.

To create an extension library, for example, to load the extensions into Model Optimizer, perform the following:

1. Create an entry point for the extension library. OpenVINO provides the OPENVINO_CREATE_EXTENSIONS() macro, which allows you to define an entry point to a library with OpenVINO Extensions. This macro takes a vector of all OpenVINO Extensions as an argument.

Based on that, the entry point of your extension library might look like the following:

OPENVINO_CREATE_EXTENSIONS(
    std::vector<ov::Extension::Ptr>({

        // Register operation itself, required to be read from IR
        std::make_shared<ov::OpExtension<TemplateExtension::Identity>>(),

        // Register operation mapping, required when converting a model from a framework format
        std::make_shared<ov::frontend::OpExtension<TemplateExtension::Identity>>()
    }));

2. Configure the build of your extension library, using the following CMake script:

set(CMAKE_CXX_STANDARD 11)

set(TARGET_NAME "openvino_template_extension")

find_package(OpenVINO REQUIRED)

set(SRC identity.cpp ov_extension.cpp)

add_library(${TARGET_NAME} MODULE ${SRC})

target_compile_definitions(${TARGET_NAME} PRIVATE IMPLEMENT_OPENVINO_EXTENSION_API)
target_link_libraries(${TARGET_NAME} PRIVATE openvino::runtime)

This CMake script finds OpenVINO, using the find_package CMake command.

3. Build the extension library by running the commands below:

$ cd src/core/template_extension/new
$ mkdir build
$ cd build
$ cmake -DOpenVINO_DIR=<OpenVINO_DIR> ../
$ cmake --build .

4. After the build, you may use the path to your extension library to load your extensions into OpenVINO Runtime:

Python:

import openvino.runtime as ov

core = ov.Core()
# Load extensions library to ov.Core
core.add_extension("libopenvino_template_extension.so")

C++:

ov::Core core;
// Load extensions library to ov::Core
core.add_extension("openvino_template_extension.so");