Running and Integrating Inference Pipeline#

OpenVINO Runtime is a set of C++ libraries with C and Python bindings that provides a common API for running inference on various devices. Each device, integrated through OpenVINO’s plugin architecture, offers the common API as well as hardware-specific APIs for additional configuration options. OpenVINO Runtime may also be integrated with other frameworks and serve as their backend, for example through torch.compile. The scheme below illustrates the typical workflow for deploying a trained deep learning model in an application:

[Figure: typical workflow for deploying a trained deep learning model with OpenVINO Runtime (IMPLEMENT_PIPELINE_with_API_C.svg)]

This guide shows how to implement a typical OpenVINO™ Runtime inference pipeline in your application. Before proceeding, check how model conversion works in OpenVINO and how it may affect your application’s performance. Make sure you have installed OpenVINO Runtime and set the environment variables (otherwise, the find_package calls will not find OpenVINO_DIR):

Linux

source <INSTALL_DIR>/setupvars.sh

Windows

PowerShell:

. <INSTALL_DIR>/setupvars.ps1

Command Prompt:

cd <INSTALL_DIR>
setupvars.bat

Step 1. Create OpenVINO Runtime Core#

Start by including the OpenVINO™ Runtime components in your application and creating a Core object:

Python

import openvino as ov

core = ov.Core()

C++

#include <openvino/openvino.hpp>

ov::Core core;

C

#include <openvino/c/openvino.h>

ov_core_t* core = NULL;
ov_core_create(&core);
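
Step 2. Compile the Model#

The steps that follow use a compiled_model object, which is obtained by compiling a model for a target device with ov::Core::compile_model. As a minimal Python sketch, a helper for this could look as follows. It assumes a hypothetical model file path and the CPU device, neither of which comes from this guide. The core argument is expected to be an ov.Core instance; it is passed in rather than created inside the helper only to keep the sketch easy to test. The helper name load_compiled_model is hypothetical.

```python
def load_compiled_model(core, model_path, device="CPU"):
    """Read a model file and compile it for the given device.

    `core` is expected to behave like openvino.Core.
    `model_path` and `device` are illustrative values chosen by the caller.
    """
    # Parse the model file into an in-memory model object.
    model = core.read_model(model_path)
    # Compile the model for the target device; the returned object
    # is what later steps use to create infer requests.
    return core.compile_model(model, device)
```

With OpenVINO installed, this could be called as compiled_model = load_compiled_model(ov.Core(), "model.xml"), where model.xml is a placeholder path.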

Step 3. Create an Inference Request#

Use the ov::CompiledModel::create_infer_request method to create an infer request. The resulting ov::InferRequest object provides the methods for running inference. For more details, see the article on InferRequest.

Python

infer_request = compiled_model.create_infer_request()

C++

ov::InferRequest infer_request = compiled_model.create_infer_request();

C

ov_infer_request_t* infer_request = NULL;
ov_compiled_model_create_infer_request(compiled_model, &infer_request);

Step 4. Set Inputs#

Create an ov::Tensor (it can wrap external memory) and use the ov::InferRequest::set_input_tensor method to send this tensor to the device. For more info on textual data as input, see the String Tensors article.

Python

# Create tensor from external memory
input_tensor = ov.Tensor(array=memory, shared_memory=True)
# Set input tensor for model with one input
infer_request.set_input_tensor(input_tensor)

C++

// Get input port for model with one input
auto input_port = compiled_model.input();
// Create tensor from external memory
ov::Tensor input_tensor(input_port.get_element_type(), input_port.get_shape(), memory_ptr);
// Set input tensor for model with one input
infer_request.set_input_tensor(input_tensor);

C

// Get input port for model with one input
ov_output_const_port_t* input_port = NULL;
ov_compiled_model_input(compiled_model, &input_port);
// Get the input shape from input port
ov_shape_t input_shape;
ov_const_port_get_shape(input_port, &input_shape);
// Get the element type of the input
ov_element_type_e input_type;
ov_port_get_element_type(input_port, &input_type);
// Create tensor from external memory
ov_tensor_t* tensor = NULL;
ov_tensor_create_from_host_ptr(input_type, input_shape, memory_ptr, &tensor);
// Set input tensor for model with one input
ov_infer_request_set_input_tensor(infer_request, tensor);
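
The snippets above cover a model with one input. For a model with several inputs, set_input_tensor in the Python API also accepts an input index. A minimal sketch, assuming the tensors are already created and ordered like the model inputs (the set_all_inputs helper is hypothetical):

```python
def set_all_inputs(infer_request, tensors):
    """Bind each tensor to the model input at the same position.

    `infer_request` is expected to behave like openvino.InferRequest.
    Returns the number of inputs that were set.
    """
    count = 0
    for index, tensor in enumerate(tensors):
        # set_input_tensor(index, tensor) targets the input at that position.
        infer_request.set_input_tensor(index, tensor)
        count += 1
    return count
```

Setting inputs by index relies on the order of the model inputs, so it is worth checking that order against the model before using a helper like this.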

Step 5. Start Inference#

Use either ov::InferRequest::start_async or ov::InferRequest::infer to start model inference. To learn how they work, see the OpenVINO Inference Request article. The following example uses the asynchronous option and calls ov::InferRequest::wait to wait for the inference results.

Python

infer_request.start_async()
infer_request.wait()

C++

infer_request.start_async();
infer_request.wait();

C

ov_infer_request_start_async(infer_request);
ov_infer_request_wait(infer_request);
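
The difference between the two options can be sketched in Python as below. The helper names run_sync and run_async and the other_work callback are hypothetical; only infer(), start_async() and wait() are InferRequest methods.

```python
def run_sync(infer_request):
    """Synchronous option: the call returns only after inference has finished."""
    infer_request.infer()


def run_async(infer_request, other_work=None):
    """Asynchronous option: start inference, do other work, then wait.

    `other_work` is an optional callable run while inference is in progress.
    """
    # Returns immediately; inference continues in the background.
    infer_request.start_async()
    if other_work is not None:
        other_work()
    # Block until the results are ready.
    infer_request.wait()
```

The asynchronous form is mainly useful when the application has something else to do, such as preparing the next input, while the device is busy.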

Step 6. Process the Inference Results#

Get output tensors and process the inference results. For more info on textual data as output, see the String Tensors article.

Python

# Get output tensor for model with one output
output = infer_request.get_output_tensor()
output_buffer = output.data
# output_buffer[] - accessing output tensor data

C++

// Get output tensor by tensor name
auto output = infer_request.get_tensor("tensor_name");
const float *output_buffer = output.data<const float>();
// output_buffer[] - accessing output tensor data

C

ov_tensor_t* output_tensor = NULL;
// Get output tensor by tensor index
ov_infer_request_get_output_tensor_by_index(infer_request, 0, &output_tensor);
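
What processing means depends on the model. As one illustration, assuming a classification model whose single output is a vector of per-class scores (an assumption, not something this guide specifies), the highest-scoring classes could be picked as follows. The top_k helper is hypothetical and uses only the Python standard library.

```python
def top_k(scores, k=5):
    """Return the k highest (class_index, score) pairs, best first.

    `scores` is a flat sequence of numbers, one per class.
    """
    # Pair each score with its class index, then sort by score, descending.
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For the Python snippet above, the scores could be passed as output_buffer.ravel().tolist(), since output.data is a NumPy array.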

Step 7. [only for C] Release the allocated objects#

To avoid memory leaks, applications developed with the C API need to release the allocated objects in the following order.

C

ov_shape_free(&input_shape);
ov_tensor_free(output_tensor);
ov_output_const_port_free(input_port);
ov_tensor_free(tensor);
ov_infer_request_free(infer_request);
ov_compiled_model_free(compiled_model);
ov_model_free(model);
ov_core_free(core);

Build Your Application#

If you have integrated OpenVINO with your application, you will also need to adjust your application build process. There are multiple ways to do this, so choose the one that best fits your project. To learn about the basics of the OpenVINO build process, refer to the documentation on GitHub.

The following example uses C++ and C applications with CMake for project configuration.

Create a structure for the project:

C++

project/
   ├── CMakeLists.txt  - CMake file to build
   ├── ...             - Additional folders like includes/
   └── src/            - source folder
       └── main.cpp
build/                  - build directory
   ...

C

project/
   ├── CMakeLists.txt  - CMake file to build
   ├── ...             - Additional folders like includes/
   └── src/            - source folder
       └── main.c
build/                  - build directory
   ...

Configure the CMake build

For details on additional CMake build options, refer to the CMake page.

C++

cmake_minimum_required(VERSION 3.10)
set(CMAKE_CXX_STANDARD 17)

find_package(OpenVINO REQUIRED)

add_executable(${TARGET_NAME} src/main.cpp)

target_link_libraries(${TARGET_NAME} PRIVATE openvino::runtime)

C

cmake_minimum_required(VERSION 3.10)
set(CMAKE_CXX_STANDARD 17)

find_package(OpenVINO REQUIRED)

add_executable(${TARGET_NAME_C} src/main.c)

target_link_libraries(${TARGET_NAME_C} PRIVATE openvino::runtime::c)

C++ (PyPI)

cmake_minimum_required(VERSION 3.10)
set(CMAKE_CXX_STANDARD 17)

if(NOT CMAKE_CROSSCOMPILING)
    find_package(Python3 QUIET COMPONENTS Interpreter)
    if(Python3_Interpreter_FOUND)
        execute_process(
            COMMAND ${Python3_EXECUTABLE} -c "from openvino.utils import get_cmake_path; print(get_cmake_path(), end='')"
            OUTPUT_VARIABLE OpenVINO_DIR_PY
            ERROR_QUIET)
    endif()
endif()

find_package(OpenVINO REQUIRED PATHS "${OpenVINO_DIR_PY}")

add_executable(${TARGET_NAME_PY} src/main.cpp)

target_link_libraries(${TARGET_NAME_PY} PRIVATE openvino::runtime)

Build Project

Use CMake to build the project on your system:
```
cd build/
cmake ../project
cmake --build .
```

Additional Resources#

To see a working implementation of the steps, check out the Learn OpenVINO section, including the OpenVINO™ Runtime API Tutorial.
Models in the OpenVINO IR format on Hugging Face.
Using Encrypted Models with OpenVINO
