This section provides a high-level description of the process of integrating the Inference Engine into your application. Refer to the Hello Classification Sample sources for an example of using the Inference Engine in applications.

NOTE: Starting with the 2019 R2 release, the new Inference Engine Core API is introduced, and this guide is updated to reflect the new API approach. The Inference Engine Plugin API is still supported, but is going to be deprecated in future releases. Refer to the Migration from Inference Engine Plugin API to Core API guide to update your application.
The core libinference_engine.so library implements loading and parsing of a model Intermediate Representation (IR) and triggers inference using a specified device. The core library has the following API:

- InferenceEngine::Core
- InferenceEngine::Blob, InferenceEngine::TBlob, InferenceEngine::NV12Blob
- InferenceEngine::BlobMap
- InferenceEngine::InputsDataMap, InferenceEngine::InputInfo, InferenceEngine::OutputsDataMap
C++ Inference Engine API wraps the capabilities of the core library:

- InferenceEngine::CNNNetReader
- InferenceEngine::CNNNetwork
- InferenceEngine::ExecutableNetwork
- InferenceEngine::InferRequest
The integration process includes the following steps:
1) Create an Inference Engine Core object to manage available devices and their plugins internally.
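A minimal sketch of this step (the header name is the standard one shipped with the Inference Engine):

```cpp
#include <inference_engine.hpp>

// The Core object discovers available devices and loads the corresponding
// plugins on demand.
InferenceEngine::Core core;
```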
2) Create an IR reader by creating an instance of InferenceEngine::CNNNetReader and read a model IR created by the Model Optimizer:
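For example, assuming the IR consists of the placeholder files model.xml and model.bin:

```cpp
// Read the topology (.xml) and the weights (.bin) produced by the Model Optimizer.
InferenceEngine::CNNNetReader network_reader;
network_reader.ReadNetwork("model.xml");
network_reader.ReadWeights("model.bin");
```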
3) Configure input and output. Request input and output information using the InferenceEngine::CNNNetReader::getNetwork(), InferenceEngine::CNNNetwork::getInputsInfo(), and InferenceEngine::CNNNetwork::getOutputsInfo() methods:
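A sketch of this step, following the pattern used in the Inference Engine samples:

```cpp
// Take the network object from the reader and query its inputs and outputs.
InferenceEngine::CNNNetwork network = network_reader.getNetwork();
InferenceEngine::InputsDataMap input_info = network.getInputsInfo();
InferenceEngine::OutputsDataMap output_info = network.getOutputsInfo();
```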
Optionally, set the number format (precision) and memory layout for inputs and outputs. Refer to the Supported Configurations chapter to choose the relevant configuration. You can also allow input of any size. To do this, mark each input as resizable by setting a desired resize algorithm (for example, BILINEAR) inside of the appropriate input info.
Basic color format conversions are supported as well. By default, the Inference Engine assumes that the input color format is BGR and color format conversions are disabled. The Inference Engine supports the following color format conversions:

- RGB->BGR
- RGBX->BGR
- BGRX->BGR
- NV12->BGR

where X is a channel that is ignored during inference. To enable the conversions, set a desired color format (for example, RGB) for each input inside of the appropriate input info.
If you want to run inference for multiple images at once, you can use the built-in batch pre-processing functionality.
NOTE: Batch pre-processing is not supported if the input color format is set to ColorFormat::NV12.
You can use the following code snippet to configure input and output:
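A sketch of such a configuration, using the input_info and output_info maps obtained above; the precisions, layouts, resize algorithm, and color format chosen here are illustrative only:

```cpp
// Illustrative settings: pick the precision, layout, resize algorithm,
// and color format that match your model and input data.
for (auto &item : input_info) {
    auto input_data = item.second;
    input_data->setPrecision(InferenceEngine::Precision::U8);
    input_data->setLayout(InferenceEngine::Layout::NCHW);
    input_data->getPreProcess().setResizeAlgorithm(InferenceEngine::RESIZE_BILINEAR);
    input_data->getPreProcess().setColorFormat(InferenceEngine::ColorFormat::RGB);
}
for (auto &item : output_info) {
    auto output_data = item.second;
    output_data->setPrecision(InferenceEngine::Precision::FP32);
    output_data->setLayout(InferenceEngine::Layout::NC);
}
```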
NOTE: NV12 input color format pre-processing differs from the other color conversions. In the NV12 case, the Inference Engine expects two separate image planes (Y and UV). You must use a specific InferenceEngine::NV12Blob object instead of the default blob object and set this blob to the Inference Engine infer request using InferenceEngine::InferRequest::SetBlob(). Refer to the Hello NV12 Input Classification C++ Sample for more details.
If you skip this step, the default values are set:

- ColorFormat::RAW, meaning that the input does not need color conversions
- Precision::FP32
- Layout::NCHW (the default layout depends on the number of input dimensions, as shown in the table below)
| Number of dimensions | 5 | 4 | 3 | 2 | 1 |
|---|---|---|---|---|---|
| Layout | NCDHW | NCHW | CHW | NC | C |
4) Load the model to the device using InferenceEngine::Core::LoadNetwork():
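For example, targeting the CPU device ("CPU" is a placeholder for any supported device name):

```cpp
// Create an executable network for the chosen device.
auto executable_network = core.LoadNetwork(network, "CPU");
```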
This call creates an executable network from a network object. The executable network is associated with a single hardware device. It is possible to create as many networks as needed and to use them simultaneously (up to the limitation of the hardware resources). An optional third parameter is a configuration for the plugin: a map of pairs (parameter name, parameter value). Refer to the Supported Devices page for details about supported configuration parameters for each device.
5) Create an infer request:
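A minimal sketch:

```cpp
// Create an infer request that will be used to run inference.
auto infer_request = executable_network.CreateInferRequest();
```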
6) Prepare input. You can use one of the following options to prepare input (a sketch of the first option is shown after the note below):

- Get the blobs allocated by an infer request using InferenceEngine::InferRequest::GetBlob() and feed an image and the input data to the blobs. In this case, the input data must be aligned (resized manually) with the given blob size and have the correct color format.
- For a cascade of networks, get an output blob of the first request using InferenceEngine::InferRequest::GetBlob() and set it as input for the second request using InferenceEngine::InferRequest::SetBlob().
- To process a region of interest (ROI) located inside an already allocated input blob without copying the data, create a wrapping blob using InferenceEngine::make_shared_blob() with InferenceEngine::Blob::Ptr and InferenceEngine::ROI as parameters.
- Allocate input blobs of the appropriate types and sizes, feed an image and the input data to them, and call InferenceEngine::InferRequest::SetBlob() to set these blobs for the infer request.
NOTE: The SetBlob() method compares the precision and layout of an input blob with the ones defined on step 3 and throws an exception if they do not match. It also compares the size of the input blob with the input size of the read network. But if the input was configured as resizable, you can set an input blob of any size (for example, any ROI blob). Input resize is then invoked automatically using the resize algorithm configured on step 3. Similarly to the resize, color format conversions allow the color format of an input blob to differ from the color format of the read network; the conversion is invoked automatically using the color format configured on step 3.

The GetBlob() logic is the same for pre-processable and not pre-processable input. Even if it is called for an input configured as resizable or as having a specific color format, a blob allocated by the infer request is returned. Its size and color format are already consistent with the corresponding values of the read network, and no pre-processing happens for this blob. If you call GetBlob() after SetBlob(), you get the blob you set in SetBlob().
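A sketch of the first option, assuming the input precision configured earlier is U8 and taking the first entry of the input_info map; the actual image copy is left as a placeholder:

```cpp
// Fill the blob allocated by the infer request with image data.
std::string input_name = input_info.begin()->first;
InferenceEngine::Blob::Ptr input_blob = infer_request.GetBlob(input_name);
unsigned char *blob_data = input_blob->buffer().as<unsigned char *>();
// Copy your image into blob_data here, respecting the blob's layout
// (for example, NCHW) and the precision configured on step 3.
```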
7) Do inference by calling the InferenceEngine::InferRequest::StartAsync() and InferenceEngine::InferRequest::Wait() methods for an asynchronous request:
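For example, using the infer_request created on step 5:

```cpp
// Start inference without blocking the current thread, then wait for the result.
infer_request.StartAsync();
infer_request.Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
```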
or by calling the InferenceEngine::InferRequest::Infer() method for a synchronous request:
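For example:

```cpp
// Run inference synchronously; the call returns when inference is complete.
infer_request.Infer();
```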
StartAsync() returns immediately and starts inference without blocking the main thread, while Infer() blocks the main thread and returns when inference is completed. Call Wait() to wait for the result of an asynchronous request to become available. There are three ways to use it:

- specify a maximum duration to block for, in milliseconds: the method blocks until the specified timeout elapses or the result becomes available, whichever comes first
- InferenceEngine::IInferRequest::WaitMode::RESULT_READY - waits until the inference result becomes available
- InferenceEngine::IInferRequest::WaitMode::STATUS_ONLY - immediately returns the request status; it does not block or interrupt the current thread

Both synchronous and asynchronous requests are thread-safe: they can be called from different threads without fearing corruption or failures.
Multiple requests for a single ExecutableNetwork are executed sequentially, one by one, in FIFO order. While a request is ongoing, all its methods except InferenceEngine::InferRequest::Wait() throw an exception.
8) Go over the output blobs and process the results. Note that casting Blob to TBlob via std::dynamic_pointer_cast is not the recommended way; it is better to access the data via the buffer() and as() methods as follows:
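A sketch of reading the results, assuming FP32 outputs and the output_info map obtained on step 3:

```cpp
// Iterate over all outputs and access their data as FP32 buffers.
for (auto &item : output_info) {
    InferenceEngine::Blob::Ptr output_blob = infer_request.GetBlob(item.first);
    const float *output_data = output_blob->buffer().as<float *>();
    // Process output_data according to the output dimensions, for example
    // output_blob->getTensorDesc().getDims().
}
```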
For details about building your application, refer to the CMake files for the sample applications. All samples reside in the samples directory in the Inference Engine installation directory.
Before running compiled binary files, make sure your application can find the Inference Engine libraries. On Linux* operating systems, including Ubuntu* and CentOS*, the LD_LIBRARY_PATH environment variable is usually used to specify the directories to search for libraries. You can update LD_LIBRARY_PATH with paths to the directories in the Inference Engine installation directory where the libraries reside.
Add a path to the directory containing the core and plugin libraries:
Add paths to the directories containing the required third-party libraries:
Alternatively, you can use the following scripts that reside in the Inference Engine directory of the OpenVINO™ toolkit and Intel® Deep Learning Deployment Toolkit installation folders respectively:
To run compiled applications on Microsoft* Windows* OS, make sure that the Microsoft* Visual C++ 2015 Redistributable and Intel® C++ Compiler 2017 Redistributable packages are installed and the <INSTALL_DIR>/bin/intel64/Release/*.dll files are placed in the application folder or are accessible via the %PATH% environment variable.