ExecutableNetwork class functionality:
- Compile an InferenceEngine::ICNNNetwork instance to a backend specific graph representation
- Create an arbitrary number of InferRequest objects
- Hold some common resources shared between different instances of InferRequest. For example:
  - InferenceEngine::IExecutableNetworkInternal::_taskExecutor task executor to implement asynchronous execution
  - InferenceEngine::IExecutableNetworkInternal::_callbackExecutor task executor to run an asynchronous inference request callback in a separate thread
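From the application side, one compiled network serves any number of inference requests that share these resources. A minimal usage sketch with the public Inference Engine API (the "TEMPLATE" device name and model.xml path are illustrative only):

#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;
    // Compilation happens once; the plugin builds its backend specific graph here
    auto network = core.ReadNetwork("model.xml");
    auto executableNetwork = core.LoadNetwork(network, "TEMPLATE");
    // Any number of inference requests can be created; they share the executable
    // network resources, including the task executors used for asynchronous execution
    auto request0 = executableNetwork.CreateInferRequest();
    auto request1 = executableNetwork.CreateInferRequest();
    return 0;
}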
ExecutableNetwork Class
The Inference Engine Plugin API provides the InferenceEngine::ExecutableNetworkThreadSafeDefault helper class, which is recommended as a base class for an executable network. Based on it, a declaration of an executable network class can look as follows:
class ExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault {
public:
    ExecutableNetwork(const std::shared_ptr<const ngraph::Function>& function,
                      const InferenceEngine::InputsDataMap& inputInfoMap,
                      const InferenceEngine::OutputsDataMap& outputsInfoMap,
                      const Configuration& cfg, const std::shared_ptr<Plugin>& plugin);
    ExecutableNetwork(std::istream& model, const Configuration& cfg, const std::shared_ptr<Plugin>& plugin);
    // Methods overridden from InferenceEngine::ExecutableNetworkThreadSafeDefault
    void Export(std::ostream& model) override;
    InferenceEngine::IInferRequestInternal::Ptr CreateInferRequest() override;
    InferenceEngine::IInferRequestInternal::Ptr CreateInferRequestImpl(InferenceEngine::InputsDataMap networkInputs, InferenceEngine::OutputsDataMap networkOutputs) override;
    InferenceEngine::Parameter GetMetric(const std::string& name) const override;
    InferenceEngine::Parameter GetConfig(const std::string& name) const override;
private:
    friend class TemplateInferRequest;
    void CompileNetwork(const std::shared_ptr<const ngraph::Function>& function,
                        const InferenceEngine::InputsDataMap& inputInfoMap,
                        const InferenceEngine::OutputsDataMap& outputsInfoMap);
    void InitExecutor();
    std::atomic<std::size_t> _requestId = {0};
    Configuration _cfg;
    std::shared_ptr<Plugin> _plugin;
    std::shared_ptr<ngraph::Function> _function;
    std::map<std::string, std::size_t> _inputIndex;
    std::map<std::string, std::size_t> _outputIndex;
};
Class Fields
The example class has several fields:
- _requestId - Tracks the number of created inference requests, which is used to distinguish between inference requests during profiling via the Intel® Instrumentation and Tracing Technology (ITT) library.
- _cfg - Defines a configuration the executable network was compiled with.
- _plugin - Refers to a plugin instance.
- _function - Keeps a reference to the transformed ngraph::Function, which is used in the ngraph reference backend computations. Note that for other backends with a backend specific graph representation, _function has a different type and represents the backend specific graph or just a set of computational kernels used to perform inference.
- _inputIndex - Maps the name of an input to its index among all network inputs.
- _outputIndex - Maps the name of an output to its index among all network outputs.
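For illustration only, the following hypothetical helper (not part of the Template plugin) shows how _inputIndex and _function can be used together to resolve a user-visible input name to the corresponding ngraph parameter:

// Hypothetical helper, shown only to illustrate how the index maps are used
std::shared_ptr<ngraph::op::Parameter> ExecutableNetwork::FindParameter(const std::string& inputName) const {
    auto it = _inputIndex.find(inputName);
    if (it == _inputIndex.end()) {
        IE_THROW() << "Unknown input name: " << inputName;
    }
    // the stored value is the parameter index inside the transformed _function
    return _function->get_parameters().at(it->second);
}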
ExecutableNetwork Constructor with ICNNNetwork
This constructor accepts a generic representation of a neural network, an ngraph::Function extracted from an InferenceEngine::ICNNNetwork instance, and compiles it into a backend specific device graph:
TemplatePlugin::ExecutableNetwork::ExecutableNetwork(const std::shared_ptr<const ngraph::Function>& function,
                                                     const InferenceEngine::InputsDataMap& inputInfoMap,
                                                     const InferenceEngine::OutputsDataMap& outputsInfoMap,
                                                     const Configuration& cfg, const Plugin::Ptr& plugin)
    : _cfg(cfg),
      _plugin(plugin) {
    try {
        CompileNetwork(function, inputInfoMap, outputsInfoMap);
        InitExecutor();  // creates a thread-based executor used for asynchronous requests
    } catch (const InferenceEngine::Exception&) {
        throw;
    } catch (const std::exception& e) {
        IE_THROW(Unexpected) << "Standard exception from compilation library: " << e.what();
    } catch (...) {
        IE_THROW(Unexpected) << "Generic exception is thrown";
    }
}
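The InitExecutor call above creates the executors that inference requests share. Its body is not shown in this section; a possible sketch, modeled on the executor cache available through the Plugin API (InferenceEngine::ExecutorManager), looks as follows. Exact configuration details may differ in a real plugin:

void TemplatePlugin::ExecutableNetwork::InitExecutor() {
    // Build a multi-threaded streams configuration from the configuration
    // the network was compiled with
    auto streamsExecutorConfig = InferenceEngine::IStreamsExecutor::Config::MakeDefaultMultiThreaded(_cfg._streamsExecutorConfig);
    streamsExecutorConfig._name = "TemplateStreamsExecutor";
    // Reuse executors from the global executor cache instead of creating new threads for each network
    _taskExecutor = InferenceEngine::ExecutorManager::getInstance()->getIdleCPUStreamsExecutor(streamsExecutorConfig);
    _callbackExecutor = InferenceEngine::ExecutorManager::getInstance()->getIdleCPUStreamsExecutor({"TemplateCallbackExecutor"});
}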
The implementation of CompileNetwork is fully device-specific.
CompileNetwork()
The function accepts a const shared pointer to an ngraph::Function object and performs the following steps:
- Applies ngraph passes using the TransformNetwork function, which defines a plugin-specific conversion pipeline (a sketch of such a pipeline is shown after the CompileNetwork code below).
- Maps the transformed graph to a backend specific graph representation (for example, to MKLDNN graph for Intel CPU).
- Allocates and fills memory for graph weights, backend specific memory handles, and so on.
void TemplatePlugin::ExecutableNetwork::CompileNetwork(const std::shared_ptr<const ngraph::Function>& function,
                                                       const InferenceEngine::InputsDataMap& inputInfoMap,
                                                       const InferenceEngine::OutputssDataMap& outputsInfoMap) {
    // apply plugin-specific transformations
    _function = TransformNetwork(function, inputInfoMap, outputsInfoMap);
    // generate backend specific blob mappings: Inference Engine uses the friendly name
    // of the layer before a ngraph::Result node as the inference request output name
    for (auto&& result : _function->get_results()) {
        auto previousOutput = result->get_input_source_output(0);
        auto outputName = previousOutput.get_node()->get_friendly_name();
        if (previousOutput.get_node()->get_output_size() > 1) {
            outputName += '.' + std::to_string(previousOutput.get_index());
        }
        _outputIndex.emplace(outputName, _function->get_result_index(result));
    }
    for (auto&& parameter : _function->get_parameters()) {
        _inputIndex.emplace(parameter->get_friendly_name(), _function->get_parameter_index(parameter));
    }
    // perform any other steps like allocation and filling of backend specific memory handles
}
NOTE: After all these steps, the backend specific graph is ready to create inference requests and perform inference.
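The TransformNetwork function mentioned above defines the plugin-specific conversion pipeline. A sketch of such a pipeline is given below; the particular passes (CommonOptimizations, ConvertPrecision) are only examples of commonly used nGraph transformations, and a real plugin registers its own device specific passes and may also use inputInfoMap and outputsInfoMap to legalize network inputs and outputs:

std::shared_ptr<ngraph::Function> TransformNetwork(const std::shared_ptr<const ngraph::Function>& function,
                                                   const InferenceEngine::InputsDataMap& inputInfoMap,
                                                   const InferenceEngine::OutputsDataMap& outputsInfoMap) {
    // 1. Clone the original function, since transformations modify the graph in place
    auto transformedNetwork = ngraph::clone_function(*function);
    // 2. Register common optimizations and device specific transformations
    ngraph::pass::Manager passManager;
    passManager.register_pass<ngraph::pass::CommonOptimizations>();
    // Example: a backend that supports only FP32 converts FP16 operations and constants
    passManager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::f16, ngraph::element::f32);
    // Register any other device specific passes here
    // 3. Run the pipeline; the resulting function contains only operations supported by the backend
    passManager.run_passes(transformedNetwork);
    return transformedNetwork;
}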
ExecutableNetwork Constructor Importing from Stream
This constructor creates a backend specific graph by importing from a stream object:
NOTE: The export of backend specific graph is done in the Export method, and data formats must be the same for both import and export.
TemplatePlugin::ExecutableNetwork::ExecutableNetwork(std::istream& model, const Configuration& cfg, const Plugin::Ptr& plugin)
    : _cfg(cfg),
      _plugin(plugin) {
    // read XML content
    std::string xmlString;
    std::uint64_t dataSize = 0;
    model.read(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
    xmlString.resize(dataSize);
    model.read(const_cast<char*>(xmlString.c_str()), dataSize);
    // read blob (weights) content
    InferenceEngine::Blob::Ptr dataBlob;
    model.read(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
    if (0 != dataSize) {
        dataBlob = InferenceEngine::make_shared_blob<std::uint8_t>(
            InferenceEngine::TensorDesc(InferenceEngine::Precision::U8, {static_cast<std::size_t>(dataSize)}, InferenceEngine::Layout::C));
        dataBlob->allocate();
        model.read(dataBlob->buffer(), dataSize);
    }
    auto cnnnetwork = _plugin->GetCore()->ReadNetwork(xmlString, std::move(dataBlob));
    auto inputInfoMap = cnnnetwork.getInputsInfo();
    auto outputInfoMap = cnnnetwork.getOutputsInfo();
    setNetworkInputs(inputInfoMap);
    setNetworkOutputs(outputInfoMap);
    SetPointerToPlugin(_plugin->shared_from_this());
    try {
        CompileNetwork(cnnnetwork.getFunction(), inputInfoMap, outputInfoMap);
        InitExecutor();  // creates a thread-based executor used for asynchronous requests
    } catch (const InferenceEngine::Exception&) {
        throw;
    } catch (const std::exception& e) {
        IE_THROW(Unexpected) << "Standard exception from compilation library: " << e.what();
    } catch (...) {
        IE_THROW(Unexpected) << "Generic exception is thrown";
    }
}
Export()
The implementation of the method should write all data to the model stream, which is required to import a backend specific graph later in the Plugin::Import method:
void TemplatePlugin::ExecutableNetwork::Export(std::ostream& modelStream) {
    OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin, "ExecutableNetwork::Export");
    // serialize the transformed function to IR: XML goes to xmlFile, weights go to binFile
    std::map<std::string, ngraph::OpSet> custom_opsets;
    std::stringstream xmlFile, binFile;
    ngraph::pass::Serialize serializer(xmlFile, binFile, ngraph::pass::Serialize::Version::IR_V10, custom_opsets);
    serializer.run_on_function(_function);
    auto m_constants = binFile.str();
    auto m_model = xmlFile.str();
    // write the XML size and content, then the weights size and content
    auto dataSize = static_cast<std::uint64_t>(m_model.size());
    modelStream.write(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
    modelStream.write(m_model.c_str(), dataSize);
    dataSize = static_cast<std::uint64_t>(m_constants.size());
    modelStream.write(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
    modelStream.write(reinterpret_cast<char*>(&m_constants[0]), dataSize);
}
CreateInferRequest()
The method creates an asynchronous inference request and returns it. While the public Inference Engine API has a single interface for inference requests, which can be executed in both synchronous and asynchronous modes, a plugin library implementation has two separate classes:
- Synchronous inference request, which defines pipeline stages and runs them synchronously in the Infer method.
- Asynchronous inference request, which is a wrapper for a synchronous inference request and can run a pipeline asynchronously. Depending on the device pipeline structure, it can have one or several stages:
  - For single-stage pipelines, there is no need to define this method and create a class derived from InferenceEngine::AsyncInferRequestThreadSafeDefault. For single-stage pipelines, a default implementation of this method creates InferenceEngine::AsyncInferRequestThreadSafeDefault wrapping a synchronous inference request and runs it asynchronously in the _taskExecutor executor.
  - For pipelines with multiple stages, such as performing some preprocessing on host, uploading input data to a device, running inference on a device, or downloading and postprocessing output data, schedule stages on several task executors to achieve better device utilization and performance. You can do it by creating a sufficient number of inference requests running in parallel. In this case, device stages of different inference requests overlap with the preprocessing and postprocessing stages, giving better performance. A sketch of such a multi-stage pipeline is shown after the CreateInferRequest code below.
IMPORTANT: It is up to you to decide how many task executors you need to optimally execute a device pipeline.
InferenceEngine::IInferRequestInternal::Ptr TemplatePlugin::ExecutableNetwork::CreateInferRequest() {
    auto internalRequest = CreateInferRequestImpl(_networkInputs, _networkOutputs);
    return std::make_shared<TemplateAsyncInferRequest>(std::static_pointer_cast<TemplateInferRequest>(internalRequest),
                                                       _taskExecutor, _plugin->_waitExecutor, _callbackExecutor);
}
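The following is a hedged sketch of how a multi-stage pipeline can be expressed in a custom asynchronous request class derived from InferenceEngine::AsyncInferRequestThreadSafeDefault, by filling its _pipeline member with pairs of a task executor and a task. The class, the constructor parameters, and the StartDeviceInference / WaitForDeviceAndPostprocess methods are placeholders introduced for illustration, not the Template plugin API:

MyAsyncInferRequest::MyAsyncInferRequest(const MySyncInferRequest::Ptr& syncRequest,
                                         const InferenceEngine::ITaskExecutor::Ptr& taskExecutor,
                                         const InferenceEngine::ITaskExecutor::Ptr& waitExecutor,
                                         const InferenceEngine::ITaskExecutor::Ptr& callbackExecutor)
    : AsyncInferRequestThreadSafeDefault(syncRequest, taskExecutor, callbackExecutor) {
    // Each stage is a pair of a task executor and a task to run on it
    _pipeline = {{taskExecutor,
                  [syncRequest] {
                      // stage 1: preprocess inputs and submit work to the device
                      syncRequest->StartDeviceInference();  // placeholder method
                  }},
                 {waitExecutor,
                  [syncRequest] {
                      // stage 2: wait for the device and postprocess outputs; a separate executor
                      // is used so that waiting does not block scheduling of other requests
                      syncRequest->WaitForDeviceAndPostprocess();  // placeholder method
                  }}};
}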
CreateInferRequestImpl()
This is a helper method used by CreateInferRequest to create a synchronous inference request, which is later wrapped with the asynchronous inference request class:
InferenceEngine::IInferRequestInternal::Ptr TemplatePlugin::ExecutableNetwork::CreateInferRequestImpl(InferenceEngine::InputsDataMap networkInputs,
                                                                                                      InferenceEngine::OutputsDataMap networkOutputs) {
    return std::make_shared<TemplateInferRequest>(networkInputs, networkOutputs, std::static_pointer_cast<ExecutableNetwork>(shared_from_this()));
}
GetMetric()
Returns a value for the metric with the specified name. A metric is a static type of information about an executable network. Examples of metrics:
- EXEC_NETWORK_METRIC_KEY(NETWORK_NAME) - name of an executable network
- EXEC_NETWORK_METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) - heuristic to denote an optimal (or at least sub-optimal) number of inference requests needed to run asynchronously to use the current device fully
- Any other executable network metric specific to a particular device. Such metrics and possible values must be declared in a plugin configuration public header, for example, template/template_config.hpp
InferenceEngine::Parameter TemplatePlugin::ExecutableNetwork::GetMetric(const std::string& name) const {
    if (EXEC_NETWORK_METRIC_KEY(SUPPORTED_CONFIG_KEYS) == name) {
        std::vector<std::string> configKeys = {CONFIG_KEY(DEVICE_ID),
                                               CONFIG_KEY(PERF_COUNT), TEMPLATE_CONFIG_KEY(THROUGHPUT_STREAMS)};
        auto streamExecutorConfigKeys = InferenceEngine::IStreamsExecutor::Config{}.SupportedKeys();
        for (auto&& configKey : streamExecutorConfigKeys) {
            configKeys.emplace_back(configKey);
        }
        IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS, configKeys);
    } else if (EXEC_NETWORK_METRIC_KEY(NETWORK_NAME) == name) {
        auto networkName = _function->get_friendly_name();
        IE_SET_METRIC_RETURN(NETWORK_NAME, networkName);
    } else if (EXEC_NETWORK_METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) == name) {
        unsigned int value = _cfg._streamsExecutorConfig._streams;
        IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, value);
    } else {
        IE_THROW() << "Unsupported ExecutableNetwork metric: " << name;
    }
}
The IE_SET_METRIC_RETURN helper macro sets metric value and checks that the actual metric type matches a type of the specified value.
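On the application side, metrics are queried through the public InferenceEngine::ExecutableNetwork::GetMetric method. For example, an application can use the OPTIMAL_NUMBER_OF_INFER_REQUESTS metric to size its pool of requests (executableNetwork here is a compiled network, as in the earlier usage sketch):

// Ask the plugin how many requests it can run efficiently in parallel
auto optimalRequests = executableNetwork.GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();
std::vector<InferenceEngine::InferRequest> requests;
for (unsigned int i = 0; i < optimalRequests; ++i) {
    requests.push_back(executableNetwork.CreateInferRequest());
}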
GetConfig()
Returns the current value of the configuration key with the specified name. The method extracts the configuration values the executable network was compiled with.
This function is the only way to get configuration values when a network is imported and compiled by other developers and tools (for example, the Compile tool).
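A minimal implementation can simply delegate to the stored configuration. A sketch, assuming the Configuration helper class used throughout the Template plugin exposes a Get accessor for individual keys:

InferenceEngine::Parameter TemplatePlugin::ExecutableNetwork::GetConfig(const std::string& name) const {
    // _cfg holds the configuration the network was compiled with
    return _cfg.Get(name);
}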
The next step in plugin library implementation is the Synchronous Inference Request class.