Executable Network

ExecutableNetwork class functionality:

  • Compile an InferenceEngine::ICNNNetwork instance to a backend specific graph representation
  • Create an arbitrary number of InferRequest objects
  • Hold some common resources shared between different instances of InferRequest. For example:
    • InferenceEngine::ExecutableNetworkInternal::_taskExecutor task executor to implement asynchronous execution
    • InferenceEngine::ExecutableNetworkInternal::_callbackExecutor task executor to run an asynchronous inference request callback in a separate thread

ExecutableNetwork Class

Inference Engine Plugin API provides the helper InferenceEngine::ExecutableNetworkThreadSafeDefault class, which is recommended as a base class for an executable network. Based on it, a declaration of an executable network class can look as follows:

class ExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault {
public:
    ExecutableNetwork(const std::shared_ptr<const ngraph::Function>& function,
                      const Configuration& cfg,
                      const std::shared_ptr<Plugin>& plugin);

    ExecutableNetwork(std::istream& model,
                      const Configuration& cfg,
                      const std::shared_ptr<Plugin>& plugin);

    ~ExecutableNetwork() override = default;

    // Methods from a base class ExecutableNetworkThreadSafeDefault
    void ExportImpl(std::ostream& model) override;
    InferenceEngine::InferRequestInternal::Ptr CreateInferRequestImpl(InferenceEngine::InputsDataMap networkInputs,
                                                                      InferenceEngine::OutputsDataMap networkOutputs) override;
    InferenceEngine::Parameter GetMetric(const std::string& name) const override;
    InferenceEngine::Parameter GetConfig(const std::string& name) const override;

private:
    friend class TemplateInferRequest;

    void CompileNetwork(const std::shared_ptr<const ngraph::Function>& function);
    void InitExecutor();

    std::atomic<std::size_t> _requestId = {0};
    Configuration _cfg;
    std::shared_ptr<Plugin> _plugin;
    std::shared_ptr<ngraph::Function> _function;
    std::map<std::string, std::size_t> _inputIndex;
    std::map<std::string, std::size_t> _outputIndex;
};

Class Fields

The example class has several fields:

  • _requestId - Tracks the number of created inference requests, which is used to distinguish different inference requests during profiling via the Intel® Instrumentation and Tracing Technology (ITT) library.
  • _cfg - Stores the configuration the executable network was compiled with.
  • _plugin - Refers to a plugin instance.
  • _function - Keeps a reference to the transformed ngraph::Function, which is used for computations in the ngraph reference backend. Note that for other backends with a backend specific graph representation, _function has a different type and represents the backend specific graph or just a set of computational kernels used to perform inference.
  • _inputIndex - Maps the name of an input to its index among all network inputs.
  • _outputIndex - Maps the name of an output to its index among all network outputs.
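
The Configuration type referenced by _cfg is defined in the plugin part of this guide. For orientation only, a minimal sketch is shown below; it lists just the members used later in this section, and any other content of a real plugin configuration is an assumption:

struct Configuration {
    // Returns the value of a configuration key by name; used by ExecutableNetwork::GetConfig().
    InferenceEngine::Parameter Get(const std::string& name) const;

    // Stream executor settings; the _streams field is read by GetMetric()
    // when reporting OPTIMAL_NUMBER_OF_INFER_REQUESTS.
    InferenceEngine::IStreamsExecutor::Config _streamsExecutorConfig;

    // A device ID, a performance counters flag, and other plugin-specific options would go here.
};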

ExecutableNetwork Constructor with ICNNNetwork

This constructor accepts a generic representation of a neural network as an ngraph::Function shared pointer (obtained by the plugin from the InferenceEngine::ICNNNetwork instance) and compiles it into a backend specific device graph:

TemplatePlugin::ExecutableNetwork::ExecutableNetwork(const std::shared_ptr<const ngraph::Function>& function,
                                                     const Configuration& cfg,
                                                     const Plugin::Ptr& plugin) :
    InferenceEngine::ExecutableNetworkThreadSafeDefault(nullptr, nullptr),  // Disable default threads creation
    _cfg(cfg),
    _plugin(plugin) {
    // TODO: if your plugin supports device ID (more than a single instance of the device can be on the host machine),
    // you should select the proper device based on KEY_DEVICE_ID or automatic behavior.
    // In this case, _waitExecutor should also be created per device.
    try {
        CompileNetwork(function);
        InitExecutor();  // creates a thread-based executor used for asynchronous requests
    } catch (const InferenceEngineException&) {
        throw;
    } catch (const std::exception& e) {
        THROW_IE_EXCEPTION << "Standard exception from compilation library: " << e.what();
    } catch (...) {
        THROW_IE_EXCEPTION << "Generic exception is thrown";
    }
}
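
The constructor also calls InitExecutor, which is not shown in this section. A minimal sketch of a possible implementation is given below; it assumes the plugin reuses the Inference Engine stream executor infrastructure, and the executor names are purely illustrative:

void TemplatePlugin::ExecutableNetwork::InitExecutor() {
    // Build a multi-threaded streams executor configuration from the plugin configuration.
    auto streamsExecutorConfig = InferenceEngine::IStreamsExecutor::Config::MakeDefaultMultiThreaded(_cfg._streamsExecutorConfig);
    streamsExecutorConfig._name = "TemplateStreamsExecutor";  // illustrative executor name
    // Obtain shared executors from the ExecutorManager so that worker threads
    // can be reused across executable networks of the same plugin.
    _taskExecutor = InferenceEngine::ExecutorManager::getInstance()->getIdleCPUStreamsExecutor(streamsExecutorConfig);
    _callbackExecutor = InferenceEngine::ExecutorManager::getInstance()->getExecutor("TemplateCallbackExecutor");
}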

The implementation of CompileNetwork is fully device-specific.

CompileNetwork()

The function accepts a const shared pointer to an ngraph::Function object and performs the following steps:

  1. Applies ngraph passes using TransformNetwork function, which defines plugin-specific conversion pipeline.
  2. Maps the transformed graph to a backend specific graph representation (for example, to MKLDNN graph for Intel CPU).
  3. Allocates and fills memory for graph weights, backend specific memory handles and so on.
// forward declaration
std::shared_ptr<ngraph::Function> TransformNetwork(const std::shared_ptr<const ngraph::Function>& function);

void TemplatePlugin::ExecutableNetwork::CompileNetwork(const std::shared_ptr<const ngraph::Function>& function) {
    // TODO: perform actual graph compilation / mapping to backend graph representation / kernels
    _function = TransformNetwork(function);

    // Generate backend specific blob mappings. For example, Inference Engine uses the friendly name
    // of the layer producing a result, not the ngraph::Result node itself, as the inference request output name.
    for (auto&& result : _function->get_results()) {
        auto previousOutput = result->get_input_source_output(0);
        auto outputName = previousOutput.get_node()->get_friendly_name();
        if (previousOutput.get_node()->get_output_size() > 1) {
            outputName += '.' + std::to_string(previousOutput.get_index());
        }
        _outputIndex.emplace(outputName, _function->get_result_index(result));
    }
    for (auto&& parameter : _function->get_parameters()) {
        _inputIndex.emplace(parameter->get_friendly_name(), _function->get_parameter_index(parameter));
    }
    // Perform any other steps like allocation and filling backend specific memory handles and so on
}

NOTE: After all these steps, the backend specific graph is ready to create inference requests and perform inference.
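
The _inputIndex and _outputIndex maps built above are consulted later, when an inference request binds user blobs to the transformed function (note the friend class TemplateInferRequest declaration in the class above). A hypothetical lookup, where _executableNetwork and inputBlobName belong to the inference request and are illustrative names:

// Translate an input blob name into the index of the corresponding ngraph::Parameter.
const auto it = _executableNetwork->_inputIndex.find(inputBlobName);
if (it == _executableNetwork->_inputIndex.end()) {
    THROW_IE_EXCEPTION << "Unknown input blob name: " << inputBlobName;
}
const auto& parameter = _executableNetwork->_function->get_parameters()[it->second];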

ExecutableNetwork Constructor Importing from Stream

This constructor creates a backend specific graph by importing from a stream object:

NOTE: The export of backend specific graph is done in the ExportImpl method, and data formats must be the same for both import and export.

TemplatePlugin::ExecutableNetwork::ExecutableNetwork(std::istream& model,
                                                     const Configuration& cfg,
                                                     const Plugin::Ptr& plugin) :
    _cfg(cfg),
    _plugin(plugin) {
    // TODO: since Import network is not a mandatory functionality, this ctor can just be removed
}
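
If importing is supported, this constructor must read back exactly what ExportImpl writes. A possible sketch, assuming the simple length-prefixed format used in the illustrative ExportImpl() sketch below (the format itself is an assumption, not a requirement of the Plugin API):

// Reads one length-prefixed string written by the matching export sketch.
static std::string ReadLengthPrefixedString(std::istream& model) {
    std::uint64_t size = 0;
    model.read(reinterpret_cast<char*>(&size), sizeof(size));
    std::string value(static_cast<std::size_t>(size), '\0');
    if (size != 0) {
        model.read(&value[0], static_cast<std::streamsize>(size));
    }
    return value;
}

// Inside the importing constructor body, the payloads would be restored in the same order:
//     auto graphDescription = ReadLengthPrefixedString(model);
//     auto constantData = ReadLengthPrefixedString(model);
//     // rebuild the backend specific graph from the payloads, then call InitExecutor()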

ExportImpl()

Implementation details:
The base InferenceEngine::ExecutableNetworkThreadSafeDefault class implements the public InferenceEngine::ExecutableNetworkThreadSafeDefault::Export method as follows:

  • Writes _plugin->GetName() to the model stream.
  • Calls the ExportImpl method defined in a derived class to dump a backend specific graph.

The implementation of the method should write to the model stream all data required to later import the backend specific graph in the Plugin::Import method:

void TemplatePlugin::ExecutableNetwork::ExportImpl(std::ostream& modelStream) {
    // TODO: add code which exports the backend specific graph to modelStream
}
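
As a concrete illustration only, the sketch below streams two length-prefixed blobs; what the blobs actually contain (for example, a serialized graph description and constant data) is entirely backend specific and is an assumption here:

void TemplatePlugin::ExecutableNetwork::ExportImpl(std::ostream& modelStream) {
    // Backend specific payloads; how they are produced is out of scope of this sketch.
    const std::string graphDescription = /* serialized backend graph */ "";
    const std::string constantData = /* weights and other constants */ "";

    auto writeLengthPrefixedString = [&modelStream](const std::string& value) {
        const std::uint64_t size = value.size();
        modelStream.write(reinterpret_cast<const char*>(&size), sizeof(size));
        modelStream.write(value.data(), static_cast<std::streamsize>(size));
    };
    writeLengthPrefixedString(graphDescription);
    writeLengthPrefixedString(constantData);
}

The importing constructor sketch shown earlier reads these payloads back in the same order.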

CreateInferRequest()

The method creates an asynchronous inference request and returns it. While the public Inference Engine API has a single interface for inference request, which can be executed in synchronous and asynchronous modes, a plugin library implementation has two separate classes:

  • Synchronous inference request, which defines pipeline stages and runs them synchronously in the Infer method.
  • Asynchronous inference request, which is a wrapper for a synchronous inference request and can run a pipeline asynchronously. Depending on a device pipeline structure, it can have one or several stages:
    • For single-stage pipelines, there is no need to define this method and create a class derived from InferenceEngine::AsyncInferRequestThreadSafeDefault. In this case, a default implementation of the method creates InferenceEngine::AsyncInferRequestThreadSafeDefault wrapping the synchronous inference request and runs it asynchronously in the _taskExecutor executor.
    • For pipelines with multiple stages, such as performing some preprocessing on the host, uploading input data to a device, running inference on a device, or downloading and postprocessing output data, schedule stages on several task executors to achieve better device utilization and performance. You can do it by creating a sufficient number of inference requests running in parallel. In this case, device stages of different inference requests are overlapped with the preprocessing and postprocessing stages, giving better performance (see the pipeline sketch after the CreateInferRequest example below).

      IMPORTANT: It is up to you to decide how many task executors you need to optimally execute a device pipeline.

IInferRequest::Ptr TemplatePlugin::ExecutableNetwork::CreateInferRequest() {
    IInferRequest::Ptr asyncRequest;
    auto internalRequest = CreateInferRequestImpl(_networkInputs, _networkOutputs);
    auto asyncThreadSafeImpl = std::make_shared<TemplateAsyncInferRequest>(std::static_pointer_cast<TemplateInferRequest>(internalRequest),
                                                                           _taskExecutor, _plugin->_waitExecutor, _callbackExecutor);
    asyncRequest.reset(new InferenceEngine::InferRequestBase<TemplateAsyncInferRequest>(asyncThreadSafeImpl),
                       [](InferenceEngine::IInferRequest* p) { p->Release(); });
    asyncThreadSafeImpl->SetPointerToPublicInterface(asyncRequest);
    return asyncRequest;
}
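
To illustrate the multi-stage case mentioned in the list above, the sketch below shows how an asynchronous inference request class might define its pipeline stages on different executors. It assumes the _pipeline member provided by InferenceEngine::AsyncInferRequestThreadSafeDefault, and the stage methods of the synchronous request (inferPreprocess, startPipeline, waitPipeline, inferPostprocess) are illustrative names:

TemplateAsyncInferRequest::TemplateAsyncInferRequest(const TemplateInferRequest::Ptr& request,
                                                     const InferenceEngine::ITaskExecutor::Ptr& taskExecutor,
                                                     const InferenceEngine::ITaskExecutor::Ptr& waitExecutor,
                                                     const InferenceEngine::ITaskExecutor::Ptr& callbackExecutor) :
    AsyncInferRequestThreadSafeDefault(request, taskExecutor, callbackExecutor),
    _inferRequest(request), _waitExecutor(waitExecutor) {
    // Each stage is a pair of {executor, task}; the base class runs the stages one after another,
    // so device work of one request can overlap with pre- and post-processing of other requests.
    _pipeline = {
        {taskExecutor,  [this] { _inferRequest->inferPreprocess();    // host-side preprocessing
                                 _inferRequest->startPipeline(); }},  // submit work to the device
        {_waitExecutor, [this] { _inferRequest->waitPipeline(); }},   // wait for device completion
        {taskExecutor,  [this] { _inferRequest->inferPostprocess(); }}  // host-side postprocessing
    };
}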

CreateInferRequestImpl()

This is a helper method used by CreateInferRequest to create a synchronous inference request, which is later wrapped with the asynchronous inference request class:

InferenceEngine::InferRequestInternal::Ptr TemplatePlugin::ExecutableNetwork::CreateInferRequestImpl(InferenceEngine::InputsDataMap networkInputs,
                                                                                                     InferenceEngine::OutputsDataMap networkOutputs) {
    return std::make_shared<TemplateInferRequest>(networkInputs, networkOutputs, std::static_pointer_cast<ExecutableNetwork>(shared_from_this()));
}

GetMetric()

Returns a metric value for a metric with the name name. A metric is a static type of information about an executable network. Examples of metrics:

  • EXEC_NETWORK_METRIC_KEY(NETWORK_NAME) - name of an executable network
  • EXEC_NETWORK_METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) - heuristic to denote an optimal (or at least sub-optimal) number of inference requests needed to run asynchronously to use the current device fully
  • Any other executable network metric specific for a particular device. Such metrics and possible values must be declared in a plugin configuration public header, for example, template/template_config.hpp
InferenceEngine::Parameter TemplatePlugin::ExecutableNetwork::GetMetric(const std::string& name) const {
    // TODO: return more supported values for metrics
    if (METRIC_KEY(SUPPORTED_METRICS) == name) {
        IE_SET_METRIC_RETURN(SUPPORTED_METRICS, std::vector<std::string>{
            METRIC_KEY(NETWORK_NAME),
            METRIC_KEY(SUPPORTED_METRICS),
            METRIC_KEY(SUPPORTED_CONFIG_KEYS),
            METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)});
    } else if (METRIC_KEY(SUPPORTED_CONFIG_KEYS) == name) {
        std::vector<std::string> configKeys = {
            CONFIG_KEY(DEVICE_ID),
            CONFIG_KEY(PERF_COUNT),
            TEMPLATE_CONFIG_KEY(THROUGHPUT_STREAMS) };
        auto streamExecutorConfigKeys = IStreamsExecutor::Config{}.SupportedKeys();
        for (auto&& configKey : streamExecutorConfigKeys) {
            configKeys.emplace_back(configKey);
        }
        IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS, configKeys);
    } else if (METRIC_KEY(NETWORK_NAME) == name) {
        auto networkName = _function->get_friendly_name();
        IE_SET_METRIC_RETURN(NETWORK_NAME, networkName);
    } else if (METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) == name) {
        unsigned int value = _cfg._streamsExecutorConfig._streams;
        IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, value);
    } else {
        THROW_IE_EXCEPTION << "Unsupported ExecutableNetwork metric: " << name;
    }
}

The IE_SET_METRIC_RETURN helper macro sets metric value and checks that the actual metric type matches a type of the specified value.
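
From the application side, these metrics are queried through the regular Inference Engine API. A hypothetical example (the "TEMPLATE" device name and the model path are placeholders):

InferenceEngine::Core core;
auto network = core.ReadNetwork("model.xml");
auto execNetwork = core.LoadNetwork(network, "TEMPLATE");

// Ask the plugin how many inference requests it recommends running in parallel.
auto optimalRequests = execNetwork.GetMetric(
    EXEC_NETWORK_METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();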

GetConfig()

Returns the current value for a configuration key with the name name. The method extracts configuration values the executable network was compiled with.

Parameter TemplatePlugin::ExecutableNetwork::GetConfig(const std::string& name) const {
    return _cfg.Get(name);
}

This function is the only way to get configuration values when a network is imported and compiled by other developers and tools (for example, the Compile tool).
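
For example, a network compiled and exported elsewhere can still be inspected after import; the device name, the file name, and the queried key below are illustrative:

InferenceEngine::Core core;
auto importedNetwork = core.ImportNetwork("exported_model.blob", "TEMPLATE");

// The returned value reflects the configuration the network was originally compiled with.
auto perfCountSetting = importedNetwork.GetConfig(CONFIG_KEY(PERF_COUNT)).as<std::string>();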

The next step in plugin library implementation is the Synchronous Inference Request class.