Plugin¶
An Inference Engine plugin usually represents a wrapper around a backend. Backends can be:
- an OpenCL-like backend (e.g. the clDNN library) for GPU devices
- a oneDNN backend for Intel CPU devices
- NVIDIA cuDNN for NVIDIA GPUs
An Inference Engine plugin is responsible for:
- Initializing a backend and throwing an exception in the Engine constructor if the backend cannot be initialized.
- Providing information about the devices enabled by a particular backend, e.g. how many devices there are, their properties, and so on.
- Loading or importing executable network objects.
In addition to the Inference Engine Public API, the Inference Engine provides the Plugin API, which is a set of functions and helper classes that simplify new plugin development:
- header files in the inference_engine/src/plugin_api directory
- implementations in the inference_engine/src/inference_engine directory
- symbols in the Inference Engine Core shared library
To build an Inference Engine plugin with the Plugin API, see the Inference Engine Plugin Building guide.
Plugin Class¶
The Inference Engine Plugin API provides the helper InferenceEngine::IInferencePlugin class, which is recommended as a base class for a plugin. Based on that, a declaration of a plugin class can look as follows:
namespace TemplatePlugin {
class Plugin : public InferenceEngine::IInferencePlugin {
public:
using Ptr = std::shared_ptr<Plugin>;
Plugin();
~Plugin();
void SetConfig(const std::map<std::string, std::string>& config) override;
InferenceEngine::QueryNetworkResult QueryNetwork(const InferenceEngine::CNNNetwork& network,
const std::map<std::string, std::string>& config) const override;
InferenceEngine::IExecutableNetworkInternal::Ptr LoadExeNetworkImpl(
const InferenceEngine::CNNNetwork& network,
const std::map<std::string, std::string>& config) override;
void AddExtension(const std::shared_ptr<InferenceEngine::IExtension>& extension) override;
InferenceEngine::Parameter GetConfig(
const std::string& name,
const std::map<std::string, InferenceEngine::Parameter>& options) const override;
InferenceEngine::Parameter GetMetric(
const std::string& name,
const std::map<std::string, InferenceEngine::Parameter>& options) const override;
InferenceEngine::IExecutableNetworkInternal::Ptr ImportNetwork(
std::istream& model,
const std::map<std::string, std::string>& config) override;
private:
friend class ExecutableNetwork;
friend class TemplateInferRequest;
std::shared_ptr<ngraph::runtime::Backend> _backend;
Configuration _cfg;
InferenceEngine::ITaskExecutor::Ptr _waitExecutor;
};
} // namespace TemplatePlugin
Class Fields¶
The provided plugin class also has several fields:
- _backend - a backend engine that is used to perform actual computations for network inference. For the Template plugin, ngraph::runtime::Backend is used, which performs computations using OpenVINO™ reference implementations.
- _waitExecutor - a task executor that waits for a response from a device about device task completion.
- _cfg of type Configuration:
using ConfigMap = std::map<std::string, std::string>;
struct Configuration {
Configuration();
Configuration(const Configuration&) = default;
Configuration(Configuration&&) = default;
Configuration& operator=(const Configuration&) = default;
Configuration& operator=(Configuration&&) = default;
explicit Configuration(const ConfigMap& config,
const Configuration& defaultCfg = {},
const bool throwOnUnsupported = true);
InferenceEngine::Parameter Get(const std::string& name) const;
// Plugin configuration parameters
int deviceId = 0;
bool perfCount = true;
InferenceEngine::IStreamsExecutor::Config _streamsExecutorConfig;
ov::hint::PerformanceMode performance_mode = ov::hint::PerformanceMode::UNDEFINED;
};
As an example, a plugin configuration includes the following value parameters:
- deviceId - a particular device ID to work with. Applicable if a plugin supports more than one Template device. In this case, some plugin methods, like SetConfig, QueryNetwork, and LoadNetwork, must support the CONFIG_KEY(DEVICE_ID) parameter.
- perfCount - a boolean value that identifies whether to collect performance counters during Inference Request execution.
- _streamsExecutorConfig - a configuration of InferenceEngine::IStreamsExecutor to handle the settings of a multi-threaded context.
Engine Constructor¶
A plugin constructor must contain code that checks the ability to work with a device of the Template type. For example, if some drivers are required, the code must check driver availability. If a driver is not available (for example, the OpenCL runtime is not installed in the case of a GPU device, or an improper driver version is on the host machine), an exception must be thrown from the plugin constructor.
A plugin must define the device name it enables via the _pluginName field of the base class:
Plugin::Plugin() {
// TODO: fill with actual device name, backend engine
_pluginName = "TEMPLATE";
// create ngraph backend which performs inference using ngraph reference implementations
_backend = ngraph::runtime::Backend::create();
// create default stream executor with a given name
_waitExecutor = executorManager()->getIdleCPUStreamsExecutor({"TemplateWaitExecutor"});
}
Implementation details: The base InferenceEngine::IInferencePlugin class provides a common implementation of the public InferenceEngine::IInferencePlugin::LoadNetwork method, which calls the plugin-specific LoadExeNetworkImpl defined in a derived class.
This is the most important function of the Plugin class. It creates an instance of a compiled ExecutableNetwork, which holds a backend-dependent compiled graph in an internal representation:
InferenceEngine::IExecutableNetworkInternal::Ptr Plugin::LoadExeNetworkImpl(const InferenceEngine::CNNNetwork& network,
const ConfigMap& config) {
OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin, "Plugin::LoadExeNetworkImpl");
InferenceEngine::InputsDataMap networkInputs = network.getInputsInfo();
InferenceEngine::OutputsDataMap networkOutputs = network.getOutputsInfo();
auto fullConfig = Configuration{config, _cfg};
return std::make_shared<ExecutableNetwork>(network.getFunction(),
networkInputs,
networkOutputs,
fullConfig,
std::static_pointer_cast<Plugin>(shared_from_this()));
}
Before creating an ExecutableNetwork instance via its constructor, a plugin may check whether the provided InferenceEngine::ICNNNetwork object is supported by the device. In the example above, the plugin checks precision information.
An important step before creating the ExecutableNetwork instance is to call the TransformNetwork method, which applies OpenVINO™ transformation passes.
The actual graph compilation is done in the ExecutableNetwork constructor. Refer to the ExecutableNetwork Implementation Guide for details.
Note
The actual configuration map used in ExecutableNetwork is constructed as the base plugin configuration set via Plugin::SetConfig, where some values are overwritten with the config passed to Plugin::LoadExeNetworkImpl. Therefore, the config of Plugin::LoadExeNetworkImpl has a higher priority.
The function accepts a const shared pointer to an ov::Model object and performs the following steps:
1. Deep copies the const object to a local object, which can later be modified.
2. Applies common and plugin-specific transformations on the copied graph to make it more friendly to hardware operations. For details on how to write custom plugin-specific transformations, refer to the Writing OpenVINO™ transformations guide.
void TransformNetwork(std::shared_ptr<ngraph::Function>& function,
const InferenceEngine::InputsDataMap& inputInfoMap,
const InferenceEngine::OutputsDataMap& outputsInfoMap) {
// Perform common optimizations and device-specific transformations
ngraph::pass::Manager passManager;
// Example: register transformation to convert preprocessing information to graph nodes
passManager.register_pass<ngraph::pass::AddPreprocessing>(inputInfoMap);
// TODO: add post-processing based on outputsInfoMap
// Example: register CommonOptimizations transformation from transformations library
passManager.register_pass<ngraph::pass::CommonOptimizations>();
// G-API supports only FP32 networks for pre-processing
bool needF16toF32 = false;
for (const auto& param : function->get_parameters()) {
if (param->get_element_type() == ngraph::element::f16 &&
inputInfoMap.at(param->get_friendly_name())->getTensorDesc().getPrecision() !=
InferenceEngine::Precision::FP16) {
needF16toF32 = true;
break;
}
}
if (needF16toF32) {
passManager.register_pass<ngraph::pass::ConvertPrecision>(
precisions_array{{ngraph::element::f16, ngraph::element::f32}});
}
// Example: register plugin specific transformation
passManager.register_pass<ov::pass::DecomposeDivideMatcher>();
passManager.register_pass<ov::pass::ReluReluFusionMatcher>();
// Register any other transformations
// ..
const auto& pass_config = passManager.get_pass_config();
// Allow FP16 Converts to be folded and FP16 constants to be upgraded to FP32 data type
pass_config->disable<ov::pass::DisableDecompressionConvertConstantFolding>();
pass_config->disable<ov::pass::ConvertCompressedOnlyToLegacy>();
// After `run_passes`, we have the transformed function, where operations match device operations,
// and we can create device backend-dependent graph
passManager.run_passes(function);
}
Note
After all these transformations, an ov::Model object contains operations that can be perfectly mapped to backend kernels. For example, if the backend has a kernel that computes the A + B operations at once, the TransformNetwork function should contain a pass that fuses operations A and B into a single custom A + B operation that fits the backend kernel set.
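The idea of such a fusion pass can be illustrated with a toy, framework-free sketch. A real pass would operate on an ngraph/ov::Model via pattern matchers; the linear sequence of operation names below is purely illustrative:

```cpp
#include <string>
#include <vector>

// Toy fusion pass: collapse every adjacent ["A", "B"] pair in a linear
// sequence of operation names into a single fused "A+B" operation,
// mimicking how a transformation pass fuses ops to match backend kernels.
std::vector<std::string> FuseAB(const std::vector<std::string>& ops) {
    std::vector<std::string> fused;
    for (size_t i = 0; i < ops.size(); ++i) {
        if (i + 1 < ops.size() && ops[i] == "A" && ops[i + 1] == "B") {
            fused.push_back("A+B");  // the backend has one kernel for the pair
            ++i;                     // skip the consumed "B"
        } else {
            fused.push_back(ops[i]);
        }
    }
    return fused;
}
```

After such a pass, every remaining operation in the sequence corresponds 1:1 to a backend kernel, which is exactly the property QueryNetwork relies on below.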
Use the method with the HETERO mode, which allows distributing network execution between different devices based on the ov::Node::get_rt_info() map, which can contain the "affinity" key. The QueryNetwork method analyzes the operations of the provided network and returns a list of supported operations via the InferenceEngine::QueryNetworkResult structure. QueryNetwork first applies the TransformNetwork passes to the input ov::Model argument. After this, the transformed network, in the ideal case, contains only operations that are 1:1 mapped to kernels in the computational backend. In this case, it is easy to analyze which operations are supported (_backend has a kernel for the operation, or an extension for the operation is provided) and which are not (the kernel is missing in _backend):
1. Store the original names of all operations in the input ov::Model.
2. Apply the TransformNetwork passes. Note that the names of operations in a transformed network can be different, so the mapping needs to be restored in the steps below.
3. Construct supported and unsupported maps that contain the names of the original operations. Note that since the inference is performed using the OpenVINO™ reference backend, whether an operation is supported depends on whether the latest OpenVINO opset contains it.
4. QueryNetworkResult.supportedLayersMap contains only operations that are fully supported by _backend.
InferenceEngine::QueryNetworkResult Plugin::QueryNetwork(const InferenceEngine::CNNNetwork& network,
const ConfigMap& config) const {
OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin, "Plugin::QueryNetwork");
Configuration fullConfig{config, _cfg, false};
auto model = network.getFunction();
if (model == nullptr) {
IE_THROW() << "Only ngraph-based models are supported!";
}
auto supported = InferenceEngine::GetSupportedNodes(
model,
[&](std::shared_ptr<ov::Model>& model) {
// 1. It is needed to apply all transformations as it is done in LoadExeNetworkImpl
TransformNetwork(model, network.getInputsInfo(), network.getOutputsInfo());
},
[&](std::shared_ptr<ngraph::Node> node) {
// 2. Check whether the node is supported
ngraph::OpSet op_super_set;
#define _OPENVINO_OP_REG(NAME, NAMESPACE) op_super_set.insert<NAMESPACE::NAME>();
#include "openvino/opsets/opset1_tbl.hpp"
#include "openvino/opsets/opset2_tbl.hpp"
#include "openvino/opsets/opset3_tbl.hpp"
#include "openvino/opsets/opset4_tbl.hpp"
#include "openvino/opsets/opset5_tbl.hpp"
#include "openvino/opsets/opset6_tbl.hpp"
#include "openvino/opsets/opset7_tbl.hpp"
#include "openvino/opsets/opset8_tbl.hpp"
#undef _OPENVINO_OP_REG
return op_super_set.contains_type(node->get_type_info());
});
// 3. Produce the result
InferenceEngine::QueryNetworkResult res;
for (auto&& layerName : supported) {
res.supportedLayersMap.emplace(layerName, GetName());
}
return res;
}
Adds an extension of the InferenceEngine::IExtensionPtr type to a plugin. If a plugin does not support extensions, the method must throw an exception:
void Plugin::AddExtension(const InferenceEngine::IExtensionPtr& /*extension*/) {
// TODO: add extensions if plugin supports extensions
IE_THROW(NotImplemented);
}
Sets new values for plugin configuration keys:
void Plugin::SetConfig(const ConfigMap& config) {
_cfg = Configuration{config, _cfg};
}
In the snippet above, the Configuration class overrides previous configuration values with the new ones. All of these values are used during backend-specific graph compilation and during the execution of inference requests.
Note
The function must throw an exception if it receives an unsupported configuration key.
Returns a current value for a specified configuration key:
InferenceEngine::Parameter Plugin::GetConfig(
const std::string& name,
const std::map<std::string, InferenceEngine::Parameter>& /*options*/) const {
return _cfg.Get(name);
}
The function is implemented using the Configuration::Get method, which wraps the actual configuration key value into an InferenceEngine::Parameter and returns it.
Note
The function must throw an exception if it receives an unsupported configuration key.
Returns the value of the metric with the given name. A device metric is a static piece of information a plugin reports about its devices or device capabilities.
Examples of metrics:
- METRIC_KEY(AVAILABLE_DEVICES) - a list of available devices, which is required to implement. In this case, you can use all devices of the same Template type with the automatic logic of the MULTI device plugin.
- METRIC_KEY(FULL_DEVICE_NAME) - a full device name. In this case, a particular device ID is specified in the options parameter as { CONFIG_KEY(KEY_DEVICE_ID), "deviceID" }.
- METRIC_KEY(SUPPORTED_METRICS) - a list of metrics supported by a plugin.
- METRIC_KEY(SUPPORTED_CONFIG_KEYS) - a list of configuration keys supported by a plugin that affect its behavior during backend-specific graph compilation or inference request execution.
- METRIC_KEY(OPTIMIZATION_CAPABILITIES) - a list of optimization capabilities of a device, for example, supported data types and special optimizations for them.
- Any other device-specific metrics. In this case, place the metric declaration and possible values into a plugin-specific public header file, for example, template/template_config.hpp. The example below demonstrates the definition of a new optimization capability value specific to a device:
/**
 * @brief Defines whether the current Template device instance supports hardware blocks for fast convolution computations.
 */
DECLARE_TEMPLATE_METRIC_VALUE(HARDWARE_CONVOLUTION);
The snippet below provides an example of the implementation for GetMetric
:
InferenceEngine::Parameter Plugin::GetMetric(const std::string& name,
const std::map<std::string, InferenceEngine::Parameter>& options) const {
if (METRIC_KEY(SUPPORTED_METRICS) == name) {
std::vector<std::string> supportedMetrics = {METRIC_KEY(AVAILABLE_DEVICES),
METRIC_KEY(SUPPORTED_METRICS),
METRIC_KEY(SUPPORTED_CONFIG_KEYS),
METRIC_KEY(FULL_DEVICE_NAME),
METRIC_KEY(IMPORT_EXPORT_SUPPORT),
METRIC_KEY(DEVICE_ARCHITECTURE),
METRIC_KEY(OPTIMIZATION_CAPABILITIES),
METRIC_KEY(RANGE_FOR_ASYNC_INFER_REQUESTS)};
IE_SET_METRIC_RETURN(SUPPORTED_METRICS, supportedMetrics);
} else if (METRIC_KEY(SUPPORTED_CONFIG_KEYS) == name) {
std::vector<std::string> configKeys = {CONFIG_KEY(DEVICE_ID),
CONFIG_KEY(PERF_COUNT),
ov::hint::performance_mode.name(),
TEMPLATE_CONFIG_KEY(THROUGHPUT_STREAMS)};
auto streamExecutorConfigKeys = InferenceEngine::IStreamsExecutor::Config{}.SupportedKeys();
for (auto&& configKey : streamExecutorConfigKeys) {
if (configKey != InferenceEngine::PluginConfigParams::KEY_CPU_THROUGHPUT_STREAMS) {
configKeys.emplace_back(configKey);
}
}
IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS, configKeys);
} else if (METRIC_KEY(AVAILABLE_DEVICES) == name) {
// TODO: fill list of available devices
std::vector<std::string> availableDevices = {""};
IE_SET_METRIC_RETURN(AVAILABLE_DEVICES, availableDevices);
} else if (METRIC_KEY(FULL_DEVICE_NAME) == name) {
std::string name = "Template Device Full Name";
IE_SET_METRIC_RETURN(FULL_DEVICE_NAME, name);
} else if (METRIC_KEY(IMPORT_EXPORT_SUPPORT) == name) {
IE_SET_METRIC_RETURN(IMPORT_EXPORT_SUPPORT, true);
} else if (METRIC_KEY(DEVICE_ARCHITECTURE) == name) {
// TODO: return device architecture for device specified by DEVICE_ID config
std::string arch = "TEMPLATE";
IE_SET_METRIC_RETURN(DEVICE_ARCHITECTURE, arch);
} else if (METRIC_KEY(OPTIMIZATION_CAPABILITIES) == name) {
// TODO: fill actual list of supported capabilities: e.g. Template device supports only FP32
std::vector<std::string> capabilities = {METRIC_VALUE(FP32) /*, TEMPLATE_METRIC_VALUE(HARDWARE_CONVOLUTION)*/};
IE_SET_METRIC_RETURN(OPTIMIZATION_CAPABILITIES, capabilities);
} else if (METRIC_KEY(RANGE_FOR_ASYNC_INFER_REQUESTS) == name) {
// TODO: fill with actual values
using uint = unsigned int;
IE_SET_METRIC_RETURN(RANGE_FOR_ASYNC_INFER_REQUESTS, std::make_tuple(uint{1}, uint{1}, uint{1}));
} else {
IE_THROW(NotFound) << "Unsupported device metric: " << name;
}
}
Note
If an unsupported metric key is passed to the function, it must throw an exception.
The network importing mechanism allows importing a previously exported backend-specific graph and wrapping it in an ExecutableNetwork object. This functionality is useful if backend-specific graph compilation takes significant time and/or cannot be done on a target host device for other reasons.
During the export of a backend-specific graph via ExecutableNetwork::Export, a plugin may export any type of information it needs to import the compiled graph properly and check its correctness. For example, the export information may include:
- Compilation options (the state of the Plugin::_cfg structure).
- Information about the plugin and the device type, checked later during the import; an exception is thrown if the model stream contains wrong data. For example, if devices have different capabilities and a graph compiled for a particular device cannot be used for another, this type of information must be stored and checked during the import.
- The compiled backend-specific graph itself.
- Information about precisions and shapes set by the user.
InferenceEngine::IExecutableNetworkInternal::Ptr Plugin::ImportNetwork(
std::istream& modelStream,
const std::map<std::string, std::string>& config) {
OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin, "Plugin::ImportNetwork");
auto fullConfig = Configuration{config, _cfg};
auto exec = std::make_shared<ExecutableNetwork>(modelStream,
fullConfig,
std::static_pointer_cast<Plugin>(shared_from_this()));
SetExeNetworkInfo(exec, exec->_function);
return exec;
}
Create Instance of Plugin Class¶
An Inference Engine plugin library must export only one function, which creates a plugin instance, using the IE_DEFINE_PLUGIN_CREATE_FUNCTION macro:
static const InferenceEngine::Version version = {{2, 1}, CI_BUILD_NUMBER, "openvino_template_plugin"};
IE_DEFINE_PLUGIN_CREATE_FUNCTION(Plugin, version)
The next step in a plugin library implementation is the ExecutableNetwork class.