ExecutableNetwork class functionality:
- Compile an InferenceEngine::ICNNNetwork instance to a backend specific graph representation
- Create an arbitrary number of InferRequest objects
- Hold common resources shared between different InferRequest instances, for example:
    - InferenceEngine::ExecutableNetworkInternal::_taskExecutor - a task executor used to implement asynchronous execution
    - InferenceEngine::ExecutableNetworkInternal::_callbackExecutor - a task executor used to run an asynchronous inference request callback in a separate thread
ExecutableNetwork Class

The Inference Engine Plugin API provides the helper InferenceEngine::ExecutableNetworkThreadSafeDefault class, which is recommended as a base class for an executable network. Based on it, a declaration of an executable network class can look as follows:
```cpp
// NOTE: the class opening and the overridden method declarations were lost in
// extraction and are reconstructed from the surrounding text; check them
// against the actual template plugin sources.
class ExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault {
public:
    ExecutableNetwork(const std::shared_ptr<const ngraph::Function>& function,
                      const Configuration& cfg,
                      const std::shared_ptr<Plugin>& plugin);
    ExecutableNetwork(std::istream& model,
                      const Configuration& cfg,
                      const std::shared_ptr<Plugin>& plugin);
    ~ExecutableNetwork() override = default;

    InferenceEngine::InferRequestInternal::Ptr CreateInferRequestImpl(InferenceEngine::InputsDataMap networkInputs,
                                                                      InferenceEngine::OutputsDataMap networkOutputs) override;
    InferenceEngine::IInferRequest::Ptr CreateInferRequest() override;
    InferenceEngine::Parameter GetMetric(const std::string& name) const override;
    InferenceEngine::Parameter GetConfig(const std::string& name) const override;
    void ExportImpl(std::ostream& model) override;

private:
    friend class TemplateInferRequest;

    void CompileNetwork(const std::shared_ptr<const ngraph::Function>& function);
    void InitExecutor();

    std::atomic<std::size_t> _requestId = {0};
    Configuration _cfg;
    Plugin::Ptr _plugin;
    std::shared_ptr<ngraph::Function> _function;
    std::map<std::string, std::size_t> _inputIndex;
    std::map<std::string, std::size_t> _outputIndex;
};
```
Class Fields

The example class has several fields:
- _requestId - tracks the number of created inference requests, which is used to distinguish different inference requests during profiling via the Intel® Instrumentation and Tracing Technology (ITT) library.
- _cfg - defines a configuration the executable network was compiled with.
- _plugin - refers to a plugin instance.
- _function - keeps a reference to the transformed ngraph::Function, which is used in ngraph reference backend computations. Note that in the case of other backends with a backend specific graph representation, _function has a different type and represents the backend specific graph or just a set of computational kernels used to perform inference.
- _inputIndex - maps the name of an input to its index among all network inputs.
- _outputIndex - maps the name of an output to its index among all network outputs.
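The _requestId field relies on std::atomic::fetch_add returning the value *before* the increment, so each new inference request receives a unique id even when requests are created concurrently. A minimal self-contained sketch (RequestIdGenerator is a hypothetical stand-in, not a Plugin API class):

```cpp
#include <atomic>
#include <cstddef>

// Standalone model of the _requestId field: fetch_add atomically increments
// the counter and returns its previous value, guaranteeing unique ids.
class RequestIdGenerator {
public:
    std::size_t Next() { return _requestId.fetch_add(1); }

private:
    std::atomic<std::size_t> _requestId{0};
};
```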
ExecutableNetwork Constructor with ICNNNetwork

This constructor accepts a generic representation of a neural network (an ngraph::Function extracted from an InferenceEngine::ICNNNetwork instance) and compiles it into a backend specific device graph:
```cpp
TemplatePlugin::ExecutableNetwork::ExecutableNetwork(const std::shared_ptr<const ngraph::Function>& function,
                                                     const Configuration& cfg,
                                                     const Plugin::Ptr& plugin) :
    InferenceEngine::ExecutableNetworkThreadSafeDefault(nullptr, nullptr),
    _cfg(cfg),
    _plugin(plugin) {
    try {
        CompileNetwork(function);
        InitExecutor();
    } catch (const InferenceEngineException&) {
        throw;  // plugin exceptions are propagated as-is
    } catch (const std::exception& e) {
        // NOTE: the bodies of these catch blocks were lost in extraction and
        // are reconstructed: foreign exceptions are rethrown as IE exceptions
        THROW_IE_EXCEPTION << "Standard exception from compilation library: " << e.what();
    } catch (...) {
        THROW_IE_EXCEPTION << "Generic exception is thrown";
    }
}
```
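The exception-translation pattern in this constructor can be modeled without Inference Engine types. In the sketch below, PluginException and GuardedCompile are hypothetical stand-ins for the plugin's exception type and the guarded compilation call:

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Stand-in for the plugin's own exception type (THROW_IE_EXCEPTION domain).
struct PluginException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Stand-in for CompileNetwork(): may throw arbitrary std exceptions.
void Compile(bool fail) {
    if (fail) throw std::out_of_range("bad layer index");
}

// The pattern: plugin exceptions pass through untouched, everything else is
// wrapped into the plugin's error domain so callers see a uniform type.
void GuardedCompile(bool fail) {
    try {
        Compile(fail);
    } catch (const PluginException&) {
        throw;  // already in the plugin's error domain
    } catch (const std::exception& e) {
        throw PluginException(std::string("Standard exception from compilation library: ") + e.what());
    } catch (...) {
        throw PluginException("Generic exception is thrown");
    }
}
```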
The implementation of CompileNetwork is fully device-specific.
CompileNetwork()

The function accepts a const shared pointer to an ngraph::Function object and performs the following steps:
- Applies ngraph passes using the TransformNetwork function, which defines a plugin-specific conversion pipeline.
- Maps the transformed graph to a backend specific graph representation (for example, to an MKLDNN graph for Intel CPU).
- Allocates and fills memory for graph weights, backend specific memory handles, and so on.
```cpp
// forward declaration of the plugin-specific transformation pipeline
std::shared_ptr<ngraph::Function> TransformNetwork(const std::shared_ptr<const ngraph::Function>& function);

void TemplatePlugin::ExecutableNetwork::CompileNetwork(const std::shared_ptr<const ngraph::Function>& function) {
    // apply plugin-specific transformations
    _function = TransformNetwork(function);
    // map output names to indices; nodes with several outputs are
    // disambiguated by appending the output index to the friendly name
    for (auto&& result : _function->get_results()) {
        auto previousOutput = result->get_input_source_output(0);
        auto outputName = previousOutput.get_node()->get_friendly_name();
        if (previousOutput.get_node()->get_output_size() > 1) {
            outputName += '.' + std::to_string(previousOutput.get_index());
        }
        _outputIndex.emplace(outputName, _function->get_result_index(result));
    }
    // map input names to parameter indices
    for (auto&& parameter : _function->get_parameters()) {
        _inputIndex.emplace(parameter->get_friendly_name(), _function->get_parameter_index(parameter));
    }
}
```
NOTE: After all these steps, the backend specific graph is ready to create inference requests and perform inference.
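The output-name disambiguation used above can be isolated into a small self-contained helper (MakeOutputName is hypothetical, introduced here only for illustration): when a node has more than one output, the output index is appended to its friendly name, so "split" with three outputs yields "split.0", "split.1", "split.2".

```cpp
#include <cstddef>
#include <string>

// Builds the name under which a network output is registered in
// _outputIndex: plain friendly name for single-output nodes, and
// "name.outputIndex" when the producing node has several outputs.
std::string MakeOutputName(const std::string& friendlyName,
                           std::size_t outputSize,
                           std::size_t outputIndex) {
    std::string name = friendlyName;
    if (outputSize > 1) {
        name += '.' + std::to_string(outputIndex);
    }
    return name;
}
```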
ExecutableNetwork Constructor Importing from Stream

This constructor creates a backend specific graph by importing it from a stream object:

NOTE: The export of a backend specific graph is done in the ExportImpl method, and the data formats must be the same for both import and export.
```cpp
TemplatePlugin::ExecutableNetwork::ExecutableNetwork(std::istream& model,
                                                     const Configuration& cfg,
                                                     const Plugin::Ptr& plugin) :
    _cfg(cfg),
    _plugin(plugin) {
    // NOTE: the body was lost in extraction; the template plugin does not
    // implement import and reports it with the standard message
    THROW_IE_EXCEPTION << NOT_IMPLEMENTED_str;
}
```
ExportImpl()

Implementation details: the base InferenceEngine::ExecutableNetworkThreadSafeDefault class implements the public InferenceEngine::ExecutableNetworkThreadSafeDefault::Export method as follows:
- Writes _plugin->GetName() to the model stream.
- Calls the ExportImpl method defined in a derived class to dump the backend specific graph.

The implementation of the method should write to the model stream all data required to import the backend specific graph later in the Plugin::Import method:
```cpp
void TemplatePlugin::ExecutableNetwork::ExportImpl(std::ostream& modelStream) {
    // dump the backend specific graph to modelStream here
}
```
CreateInferRequest()

The method creates an asynchronous inference request and returns it. While the public Inference Engine API has a single inference request interface, which can be executed in synchronous and asynchronous modes, a plugin library implementation has two separate classes:
- A synchronous inference request, which defines pipeline stages and runs them synchronously in the Infer method.
- An asynchronous inference request, which is a wrapper for a synchronous inference request and can run the pipeline asynchronously. Depending on the device pipeline structure, it can have one or several stages:
    - For single-stage pipelines, there is no need to override this method or to create a class derived from InferenceEngine::AsyncInferRequestThreadSafeDefault. The default implementation of this method creates an InferenceEngine::AsyncInferRequestThreadSafeDefault wrapping the synchronous inference request and runs it asynchronously in the _taskExecutor executor.
    - For pipelines with multiple stages, such as performing some preprocessing on the host, uploading input data to a device, running inference on the device, and downloading and postprocessing output data, schedule the stages on several task executors to achieve better device utilization and performance. You can do so by creating a sufficient number of inference requests running in parallel. In this case, the device stages of different inference requests overlap with the preprocessing and postprocessing stages, giving better overall performance.

IMPORTANT: It is up to you to decide how many task executors you need to execute a device pipeline optimally.
```cpp
InferenceEngine::IInferRequest::Ptr TemplatePlugin::ExecutableNetwork::CreateInferRequest() {
    InferenceEngine::IInferRequest::Ptr asyncRequest;
    auto internalRequest = CreateInferRequestImpl(_networkInputs, _networkOutputs);
    auto asyncThreadSafeImpl = std::make_shared<TemplateAsyncInferRequest>(std::static_pointer_cast<TemplateInferRequest>(internalRequest),
                                                                           _taskExecutor, _plugin->_waitExecutor, _callbackExecutor);
    // NOTE: this wrapping step was lost in extraction and is reconstructed:
    // the async implementation is exposed through the public IInferRequest
    // interface via the InferRequestBase noexcept wrapper
    asyncRequest.reset(new InferenceEngine::InferRequestBase<InferenceEngine::AsyncInferRequestThreadSafeDefault>(asyncThreadSafeImpl),
                       [](InferenceEngine::IInferRequest* p) { p->Release(); });
    asyncThreadSafeImpl->SetPointerToPublicInterface(asyncRequest);
    return asyncRequest;
}
```
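The staged-pipeline idea from the list above can be sketched with std::async standing in for dedicated task executors. The stage functions are hypothetical placeholders for host preprocessing, device inference, and host postprocessing; with several requests in flight, the "device" stage of one request can overlap the host stages of another:

```cpp
#include <future>

// Hypothetical pipeline stages (integers stand in for tensors).
int Preprocess(int x) { return x + 1; }   // host preprocessing
int Infer(int x) { return x * 2; }        // device inference
int Postprocess(int x) { return x - 1; }  // host postprocessing

// Runs one request's pipeline on a separate thread; callers can launch
// several pipelines and let their stages overlap in time.
std::future<int> RunPipeline(int input) {
    return std::async(std::launch::async, [input] {
        int pre = Preprocess(input);
        int out = Infer(pre);
        return Postprocess(out);
    });
}
```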
CreateInferRequestImpl()

This is a helper method used by CreateInferRequest to create a synchronous inference request, which is later wrapped with the asynchronous inference request class:

```cpp
InferenceEngine::InferRequestInternal::Ptr TemplatePlugin::ExecutableNetwork::CreateInferRequestImpl(InferenceEngine::InputsDataMap networkInputs,
                                                                                                     InferenceEngine::OutputsDataMap networkOutputs) {
    return std::make_shared<TemplateInferRequest>(networkInputs, networkOutputs, std::static_pointer_cast<ExecutableNetwork>(shared_from_this()));
}
```
GetMetric()

Returns a metric value for the metric with the name name. A metric is a static type of information about an executable network. Examples of metrics:
- EXEC_NETWORK_METRIC_KEY(NETWORK_NAME) - the name of an executable network
- EXEC_NETWORK_METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) - a heuristic denoting the optimal (or at least sub-optimal) number of inference requests that need to run asynchronously to use the current device fully
- Any other executable network metric specific to a particular device. Such metrics and their possible values must be declared in a plugin configuration public header, for example, template/template_config.hpp
```cpp
InferenceEngine::Parameter TemplatePlugin::ExecutableNetwork::GetMetric(const std::string& name) const {
    // NOTE: parts of this function were lost in extraction; the
    // SUPPORTED_METRICS and NETWORK_NAME branches are reconstructed and
    // should be checked against the actual template plugin sources
    if (METRIC_KEY(SUPPORTED_METRICS) == name) {
        std::vector<std::string> supportedMetrics = {
            METRIC_KEY(NETWORK_NAME),
            METRIC_KEY(SUPPORTED_METRICS),
            METRIC_KEY(SUPPORTED_CONFIG_KEYS),
            METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)};
        IE_SET_METRIC_RETURN(SUPPORTED_METRICS, supportedMetrics);
    } else if (METRIC_KEY(SUPPORTED_CONFIG_KEYS) == name) {
        std::vector<std::string> configKeys = {
            TEMPLATE_CONFIG_KEY(THROUGHPUT_STREAMS)};
        auto streamExecutorConfigKeys = IStreamsExecutor::Config{}.SupportedKeys();
        for (auto&& configKey : streamExecutorConfigKeys) {
            configKeys.emplace_back(configKey);
        }
        IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS, configKeys);
    } else if (METRIC_KEY(NETWORK_NAME) == name) {
        auto networkName = _function->get_friendly_name();
        IE_SET_METRIC_RETURN(NETWORK_NAME, networkName);
    } else if (METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) == name) {
        unsigned int value = _cfg._streamsExecutorConfig._streams;
        IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, value);
    } else {
        THROW_IE_EXCEPTION << "Unsupported ExecutableNetwork metric: " << name;
    }
}
```
The IE_SET_METRIC_RETURN helper macro sets metric value and checks that the actual metric type matches a type of the specified value.
GetConfig()

Returns the current value for the configuration key with the name name. The method extracts the configuration values the executable network was compiled with:

```cpp
InferenceEngine::Parameter TemplatePlugin::ExecutableNetwork::GetConfig(const std::string& name) const {
    return _cfg.Get(name);
}
```
This function is the only way to get configuration values when a network is imported and compiled by other developers and tools (for example, the Compile tool).
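The lookup behind _cfg.Get can be modeled as a simple key-value store that fails loudly on unsupported keys. ConfigStore below is a hypothetical stand-in for the plugin's real Configuration class, which would additionally validate values at Set time:

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for the plugin's Configuration class: values captured at compile
// time are returned verbatim by Get; unknown keys raise an error instead of
// silently returning a default.
class ConfigStore {
public:
    void Set(const std::string& key, const std::string& value) { _values[key] = value; }
    std::string Get(const std::string& key) const {
        auto it = _values.find(key);
        if (it == _values.end()) {
            throw std::out_of_range("Unsupported config key: " + key);
        }
        return it->second;
    }

private:
    std::map<std::string, std::string> _values;
};
```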
The next step in plugin library implementation is the Synchronous Inference Request class.