InferRequest Class Functionality
- Allocate input and output blobs needed for a backend-dependent network inference.
- Define functions for inference process stages (for example, preprocess, upload, infer, download, postprocess). These functions can later be used to define an execution pipeline during Asynchronous Inference Request implementation.
- Call inference stages one by one synchronously.
InferRequest Class
The Inference Engine Plugin API provides the helper InferenceEngine::InferRequestInternal class, which is recommended as a base class for a synchronous inference request implementation. Based on that, a declaration of a synchronous request class can look as follows:
class TemplateInferRequest : public InferenceEngine::InferRequestInternal {
public:
    typedef std::shared_ptr<TemplateInferRequest> Ptr;

    TemplateInferRequest(const InferenceEngine::InputsDataMap& networkInputs,
                         const InferenceEngine::OutputsDataMap& networkOutputs,
                         const std::shared_ptr<ExecutableNetwork>& executableNetwork);
    ~TemplateInferRequest() override;

    void InferImpl() override;
    void GetPerformanceCounts(std::map<std::string, InferenceEngine::InferenceEngineProfileInfo>& perfMap) const override;

    // Pipeline stages called by InferImpl and reused by the asynchronous request
    void inferPreprocess();
    void startPipeline();
    void waitPipeline();
    void inferPostprocess();

private:
    void allocateDeviceBuffers();
    void allocateBlobs();

    enum {
        Preprocess,
        Postprocess,
        StartPipeline,
        WaitPipeline,
        numOfStages
    };

    std::shared_ptr<ExecutableNetwork> _executableNetwork;
    std::array<openvino::itt::handle_t, numOfStages> _profilingTask;
    std::array<std::chrono::duration<float, std::micro>, numOfStages> _durations;

    InferenceEngine::BlobMap _networkInputBlobs;
    InferenceEngine::BlobMap _networkOutputBlobs;

    ngraph::ParameterVector _parameters;
    ngraph::ResultVector _results;

    // Backend-specific fields
    std::vector<std::shared_ptr<ngraph::runtime::Tensor>> _inputTensors;
    std::vector<std::shared_ptr<ngraph::runtime::Tensor>> _outputTensors;
    std::shared_ptr<ngraph::runtime::Executable> _executable;
};
Class Fields
The example class has several fields:
- _executableNetwork - reference to an executable network instance. From this reference, an inference request instance can take a task executor, use the counter of created inference requests, and so on.
- _profilingTask - array of the std::array<openvino::itt::handle_t, numOfStages> type. Defines names for pipeline stages. Used to profile an inference pipeline execution with the Intel® Instrumentation and Tracing Technology (ITT).
- _durations - array of durations of each pipeline stage.
- _networkInputBlobs - input blob map.
- _networkOutputBlobs - output blob map.
- _parameters - ngraph::Function parameter operations.
- _results - ngraph::Function result operations.
- Backend-specific fields:
  - _inputTensors - input tensors which wrap _networkInputBlobs blobs. They are used as inputs to the backend _executable computational graph.
  - _outputTensors - output tensors which wrap _networkOutputBlobs blobs. They are used as outputs from the backend _executable computational graph.
  - _executable - an executable object / backend computational graph.
InferRequest Constructor
The constructor initializes helper fields and calls methods which allocate blobs:
TemplateInferRequest::TemplateInferRequest(const InferenceEngine::InputsDataMap& networkInputs,
                                           const InferenceEngine::OutputsDataMap& networkOutputs,
                                           const std::shared_ptr<TemplatePlugin::ExecutableNetwork>& executableNetwork) :
    InferRequestInternal(networkInputs, networkOutputs),
    _executableNetwork(executableNetwork) {
    // Each request gets a unique name built from the request counter of the executable network
    auto requestID = std::to_string(_executableNetwork->_requestId.fetch_add(1));
    std::string name = _executableNetwork->_function->get_friendly_name() + "_Req" + requestID;
    // ITT handles used to profile each pipeline stage
    _profilingTask = {
        openvino::itt::handle("Template" + std::to_string(_executableNetwork->_cfg.deviceId) + "_" + name + "_Preprocess"),
        openvino::itt::handle("Template" + std::to_string(_executableNetwork->_cfg.deviceId) + "_" + name + "_Postprocess"),
        openvino::itt::handle("Template" + std::to_string(_executableNetwork->_cfg.deviceId) + "_" + name + "_StartPipeline"),
        openvino::itt::handle("Template" + std::to_string(_executableNetwork->_cfg.deviceId) + "_" + name + "_WaitPipeline"),
    };
    // Compile the ngraph function with the backend and remember its parameters and results
    _executable = _executableNetwork->_plugin->_backend->compile(_executableNetwork->_function);
    _parameters = _executableNetwork->_function->get_parameters();
    _results = _executableNetwork->_function->get_results();
    allocateDeviceBuffers();
    allocateBlobs();
}
NOTE: Call InferenceEngine::CNNNetwork::getInputsInfo and InferenceEngine::CNNNetwork::getOutputsInfo to specify both layout and precision of blobs, which you can set with InferenceEngine::InferRequest::SetBlob and get with InferenceEngine::InferRequest::GetBlob. A plugin uses these hints to determine its internal layouts and precisions for input and output blobs if needed.
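For illustration, the snippet below shows how an application can set these layout and precision hints before creating an inference request. This is a minimal application-side sketch, not part of the plugin code; the model path "model.xml" and the "TEMPLATE" device name are placeholders:

#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;
    // "model.xml" is a placeholder path to any IR model
    auto network = core.ReadNetwork("model.xml");

    // Layout and precision hints the plugin may use for its internal input/output blobs
    for (auto&& input : network.getInputsInfo()) {
        input.second->setPrecision(InferenceEngine::Precision::U8);
        input.second->setLayout(InferenceEngine::Layout::NHWC);
    }
    for (auto&& output : network.getOutputsInfo()) {
        output.second->setPrecision(InferenceEngine::Precision::FP32);
    }

    // "TEMPLATE" is a placeholder device name for the plugin described here
    auto executableNetwork = core.LoadNetwork(network, "TEMPLATE");
    auto request = executableNetwork.CreateInferRequest();

    // Blobs returned by GetBlob already match the layout and precision set above
    auto inputBlob = request.GetBlob(network.getInputsInfo().begin()->first);
    // ... fill inputBlob, call request.Infer(), and read results via GetBlob ...
    return 0;
}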
~InferRequest Destructor
Decrements the number of created inference requests:
TemplateInferRequest::~TemplateInferRequest() {
    _executableNetwork->_requestId--;
}
InferImpl()
Implementation details: the base InferRequestInternal class implements the public InferenceEngine::InferRequestInternal::Infer method as follows:
- Checks blobs set by the user
- Calls the InferImpl method defined in a derived class to run the actual pipeline stages synchronously
void TemplateInferRequest::InferImpl() {
    inferPreprocess();
    startPipeline();
    waitPipeline();
    inferPostprocess();
}
1. inferPreprocess
Below is the code of the inferPreprocess method, which demonstrates handling of the common Inference Engine preprocessing step:
void TemplateInferRequest::inferPreprocess() {
    OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin, _profilingTask[Preprocess]);
    auto start = Time::now();
    // Execute the common Inference Engine preprocessing (for example, resize or color conversion) if it was set by the user
    InferRequestInternal::execDataPreprocessing(_inputs);
    // Wrap user input blobs into backend tensors, converting precision when it differs from the network precision
    for (auto&& input : _inputs) {
        auto inputBlob = input.second;
        auto networkInput = _networkInputBlobs[input.first];
        if (inputBlob->getTensorDesc().getPrecision() == networkInput->getTensorDesc().getPrecision()) {
            networkInput = inputBlob;
        } else {
            blobCopy(inputBlob, networkInput);
        }
        auto index = _executableNetwork->_inputIndex[input.first];
        const auto& parameter = _parameters[index];
        const auto& parameterShape = parameter->get_shape();
        const auto& parameterType = parameter->get_element_type();
        _inputTensors[index] = _executableNetwork->_plugin->_backend->create_tensor(parameterType, parameterShape,
            InferenceEngine::as<InferenceEngine::MemoryBlob>(networkInput)->rmap().as<void*>());
    }
    // Wrap output blobs into backend tensors so the backend writes directly into user blobs when precisions match
    for (auto&& output : _outputs) {
        auto outputBlob = output.second;
        auto networkOutput = _networkOutputBlobs[output.first];
        auto index = _executableNetwork->_outputIndex[output.first];
        if (outputBlob->getTensorDesc().getPrecision() == networkOutput->getTensorDesc().getPrecision()) {
            networkOutput = outputBlob;
        }
        const auto& result = _results[index];
        const auto& resultShape = result->get_shape();
        const auto& resultType = result->get_element_type();
        _outputTensors[index] = _executableNetwork->_plugin->_backend->create_tensor(resultType, resultShape,
            InferenceEngine::as<InferenceEngine::MemoryBlob>(networkOutput)->wmap().as<void*>());
    }
    _durations[Preprocess] = Time::now() - start;
}
Details:
- InferImpl must call the InferenceEngine::InferRequestInternal::execDataPreprocessing function, which executes the common Inference Engine preprocessing step (for example, applies resize or color conversion operations) if it is set by the user. The output dimensions, layout, and precision match the input information set via InferenceEngine::CNNNetwork::getInputsInfo.
- If the inputBlob passed by the user differs in precision from the precision expected by the plugin, blobCopy is performed, which does the actual precision conversion (a simplified sketch of such a conversion is shown after this list).
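The blobCopy helper itself is not listed in this document. A simplified, hypothetical precision-converting copy for a single case (an FP32 user blob copied into a U8 network blob) could look as follows; it only illustrates locking both MemoryBlob instances and converting element by element, and is not the actual template plugin implementation:

// Hypothetical helper, not the actual blobCopy from the template plugin:
// copies an FP32 source blob into a U8 destination blob with element-wise conversion.
static void blobCopyFP32toU8(const InferenceEngine::Blob::Ptr& src, const InferenceEngine::Blob::Ptr& dst) {
    auto srcMemory = InferenceEngine::as<InferenceEngine::MemoryBlob>(src);
    auto dstMemory = InferenceEngine::as<InferenceEngine::MemoryBlob>(dst);
    // Keep the locked memory objects alive while the raw pointers are in use
    auto srcLocked = srcMemory->rmap();
    auto dstLocked = dstMemory->wmap();
    const float* srcData = srcLocked.as<const float*>();
    std::uint8_t* dstData = dstLocked.as<std::uint8_t*>();
    for (std::size_t i = 0; i < src->size(); ++i) {
        dstData[i] = static_cast<std::uint8_t>(srcData[i]);
    }
}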
2. startPipeline
Executes the pipeline synchronously using the _executable object:
void TemplateInferRequest::startPipeline() {
    OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin, _profilingTask[StartPipeline]);
    auto start = Time::now();
    _executable->call(_outputTensors, _inputTensors);
    _durations[StartPipeline] = Time::now() - start;
}
3. inferPostprocess
Converts output blobs if the precision of the backend output blobs differs from the precision of the blobs passed by the user:
void TemplateInferRequest::inferPostprocess() {
    OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin, _profilingTask[Postprocess]);
    auto start = Time::now();
    for (auto&& output : _outputs) {
        auto outputBlob = output.second;
        auto networkOutput = _networkOutputBlobs[output.first];
        if (outputBlob->getTensorDesc().getPrecision() != networkOutput->getTensorDesc().getPrecision()) {
            blobCopy(networkOutput, outputBlob);
        }
    }
    _durations[Postprocess] = Time::now() - start;
}
GetPerformanceCounts()
The method sets performance counters that were measured during the execution of the pipeline stages:
void TemplateInferRequest::GetPerformanceCounts(std::map<std::string, InferenceEngineProfileInfo>& perfMap) const {
    InferenceEngineProfileInfo info;
    info.execution_index = 0;
    info.status = InferenceEngineProfileInfo::EXECUTED;

    info.cpu_uSec = info.realTime_uSec = _durations[Preprocess].count();
    perfMap["1. input preprocessing"] = info;
    info.cpu_uSec = info.realTime_uSec = 0;
    perfMap["2. input transfer to a device"] = info;
    info.cpu_uSec = info.realTime_uSec = _durations[StartPipeline].count();
    perfMap["3. execution time"] = info;
    info.cpu_uSec = info.realTime_uSec = 0;
    perfMap["4. output transfer from a device"] = info;
    info.cpu_uSec = info.realTime_uSec = _durations[Postprocess].count();
    perfMap["5. output postprocessing"] = info;
}
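On the application side, the counters filled by this method can be retrieved after an inference completes. A minimal sketch, assuming request is an InferenceEngine::InferRequest created from this plugin and <iostream> is included:

// Retrieve and print the per-stage performance counters reported by the plugin
auto perfCounts = request.GetPerformanceCounts();
for (const auto& entry : perfCounts) {
    std::cout << entry.first << ": " << entry.second.realTime_uSec << " us" << std::endl;
}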
The next step in the plugin library implementation is the Asynchronous Inference Request class.