Running and Integrating Inference Pipeline#
OpenVINO Runtime is a set of C++ libraries with C and Python bindings, providing a common API to run inference on various devices. Each device (integrated through OpenVINO’s plugin architecture) offers the common API as well as hardware-specific APIs for additional configuration options. Note that OpenVINO Runtime may also be integrated with other frameworks and work as their backend, for example via torch.compile. The steps below illustrate the typical workflow for deploying a trained deep learning model in an application:
This guide will show you how to implement a typical OpenVINO™ Runtime inference pipeline
in your application. Before proceeding, check how
model conversion
works in OpenVINO and how it may affect your application’s performance. Make sure you have
installed OpenVINO Runtime and set environment variables (otherwise, the find_package
calls will not find OpenVINO_DIR):
Linux and macOS:
source <INSTALL_DIR>/setupvars.sh
PowerShell:
<INSTALL_DIR>/setupvars.ps1
Command Prompt:
cd <INSTALL_DIR>
setupvars.bat
Step 1. Create OpenVINO Runtime Core#
Start working with OpenVINO in your application by including the OpenVINO™ Runtime components and creating a Core object:
Python:
import openvino as ov
core = ov.Core()
C++:
#include <openvino/openvino.hpp>
ov::Core core;
C:
#include <openvino/c/openvino.h>
ov_core_t* core = NULL;
ov_core_create(&core);
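The Core object also lets you discover which devices are available before you choose where to compile the model. Below is a minimal Python sketch, using only the Core API shown above, that lists the visible devices and their full names:
import openvino as ov

core = ov.Core()

# List the devices OpenVINO can use on this machine, e.g. ['CPU', 'GPU'].
for device in core.available_devices:
    # FULL_DEVICE_NAME is a read-only device property with a human-readable name.
    print(device, core.get_property(device, "FULL_DEVICE_NAME"))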
Step 2. Compile the Model#
Compile the model with ov::Core::compile_model(), defining the device or mode to use
for inference. The following example uses the
AUTO mode,
which selects the device for you. To learn more about supported devices and inference modes,
see the Inference Devices and Modes
section.
compiled_model = core.compile_model("model.xml", "AUTO")
compiled_model = core.compile_model("model.onnx", "AUTO")
compiled_model = core.compile_model("model.pdmodel", "AUTO")
compiled_model = core.compile_model("model.pb", "AUTO")
compiled_model = core.compile_model("model.tflite", "AUTO")
def create_model():
    # This example shows how to create an ov.Model in code
    #
    # To construct a model, please follow
    # https://docs.openvino.ai/2025/openvino-workflow/running-inference/model-representation.html
    data = ov.opset8.parameter([3, 1, 2], ov.Type.f32)
    res = ov.opset8.result(data)
    return ov.Model([res], [data], "model")

model = create_model()
compiled_model = core.compile_model(model, "AUTO")
C++:
ov::CompiledModel compiled_model = core.compile_model("model.xml", "AUTO");
ov::CompiledModel compiled_model = core.compile_model("model.onnx", "AUTO");
ov::CompiledModel compiled_model = core.compile_model("model.pdmodel", "AUTO");
ov::CompiledModel compiled_model = core.compile_model("model.pb", "AUTO");
ov::CompiledModel compiled_model = core.compile_model("model.tflite", "AUTO");
auto create_model = []() {
    std::shared_ptr<ov::Model> model;
    // To construct a model, please follow
    // https://docs.openvino.ai/2025/openvino-workflow/running-inference/model-representation.html
    return model;
};

std::shared_ptr<ov::Model> model = create_model();
ov::CompiledModel compiled_model = core.compile_model(model, "AUTO");
C:
ov_compiled_model_t* compiled_model = NULL;
ov_core_compile_model_from_file(core, "model.xml", "AUTO", 0, &compiled_model);
ov_compiled_model_t* compiled_model = NULL;
ov_core_compile_model_from_file(core, "model.onnx", "AUTO", 0, &compiled_model);
ov_compiled_model_t* compiled_model = NULL;
ov_core_compile_model_from_file(core, "model.pdmodel", "AUTO", 0, &compiled_model);
ov_compiled_model_t* compiled_model = NULL;
ov_core_compile_model_from_file(core, "model.pb", "AUTO", 0, &compiled_model);
ov_compiled_model_t* compiled_model = NULL;
ov_core_compile_model_from_file(core, "model.tflite", "AUTO", 0, &compiled_model);
// Construct a model
ov_model_t* model = NULL;
ov_core_read_model(core, "model.xml", NULL, &model);
ov_compiled_model_t* compiled_model = NULL;
ov_core_compile_model(core, model, "AUTO", 0, &compiled_model);
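In addition to the device name, compile_model() accepts configuration properties that influence how the device executes the model. The Python sketch below passes a performance hint together with AUTO; the model path and the specific hint value are illustrative choices, and the string-key dictionary is one common way to pass properties in the Python API:
import openvino as ov

core = ov.Core()

# "LATENCY" favors single-request response time;
# "THROUGHPUT" favors overall requests per second.
compiled_model = core.compile_model(
    "model.xml",
    "AUTO",
    {"PERFORMANCE_HINT": "LATENCY"},
)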
The ov::CompiledModel
class represents a compiled model and enables you to get information about its input and
output ports by a tensor name or index. This approach is aligned with most frameworks.
The ov::Model
object represents any model inside OpenVINO™ Runtime.
For more details, refer to
OpenVINO™ Model representation.
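For example, in Python the ports of the compiled model can be inspected and addressed by index or by tensor name. A minimal sketch, reusing the compiled_model object from Step 2:
# Ports can be addressed by index ...
input_port = compiled_model.input(0)
output_port = compiled_model.output(0)

# ... and each port reports its tensor name(s), shape, and element type.
print(input_port.any_name, input_port.shape, input_port.element_type)
print(output_port.any_name, output_port.shape)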
Step 3. Create an Inference Request#
Use the ov::InferRequest
class methods to create an infer request. For more details,
see the article on
InferRequest.
Python:
infer_request = compiled_model.create_infer_request()
C++:
ov::InferRequest infer_request = compiled_model.create_infer_request();
C:
ov_infer_request_t* infer_request = NULL;
ov_compiled_model_create_infer_request(compiled_model, &infer_request);
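A compiled model is not limited to a single request: you can create several infer requests from it and run them concurrently, for example one per worker thread. A minimal Python sketch:
# Several independent requests can be created from one compiled model
# and run in parallel to increase throughput.
requests = [compiled_model.create_infer_request() for _ in range(4)]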
Step 4. Set Inputs#
Create an ov::Tensor (you can use external memory for it) and use the
ov::InferRequest::set_input_tensor
method to send this tensor to the device.
For more info on textual data as input, see the
String Tensors article.
Python:
# Create tensor from external memory
input_tensor = ov.Tensor(array=memory, shared_memory=True)
# Set input tensor for model with one input
infer_request.set_input_tensor(input_tensor)
C++:
// Get input port for model with one input
auto input_port = compiled_model.input();
// Create tensor from external memory
ov::Tensor input_tensor(input_port.get_element_type(), input_port.get_shape(), memory_ptr);
// Set input tensor for model with one input
infer_request.set_input_tensor(input_tensor);
C:
// Get input port for model with one input
ov_output_const_port_t* input_port = NULL;
ov_compiled_model_input(compiled_model, &input_port);
// Get the input shape from input port
ov_shape_t input_shape;
ov_const_port_get_shape(input_port, &input_shape);
// Get the element type of the input
ov_element_type_e input_type;
ov_port_get_element_type(input_port, &input_type);
// Create tensor from external memory
ov_tensor_t* tensor = NULL;
ov_tensor_create_from_host_ptr(input_type, input_shape, memory_ptr, &tensor);
// Set input tensor for model with one input
ov_infer_request_set_input_tensor(infer_request, tensor);
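Alternatively, instead of wrapping external memory, you can write directly into the tensor the infer request already owns. The Python sketch below assumes a model with a single f32 input; the NumPy array is just placeholder data:
import numpy as np

# Get the tensor pre-allocated by the infer request for the model's single input.
input_tensor = infer_request.get_input_tensor()

# .data exposes the tensor memory as a NumPy view; writing into it fills the
# input in place (the shape and dtype must match the model input).
input_tensor.data[:] = np.zeros(tuple(input_tensor.shape), dtype=np.float32)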
Step 5. Start Inference#
Use either ov::InferRequest::start_async
or ov::InferRequest::infer
to start model
inference. To learn how they work, see the
OpenVINO Inference Request
article. The following example uses the asynchronous option and calls
ov::InferRequest::wait
to wait for the inference results.
Python:
infer_request.start_async()
infer_request.wait()
C++:
infer_request.start_async();
infer_request.wait();
C:
ov_infer_request_start_async(infer_request);
ov_infer_request_wait(infer_request);
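If you do not need the asynchronous flow, the synchronous path is a single blocking call. A minimal Python sketch, reusing the input_tensor from Step 4 (passing inputs explicitly here is optional if they were already set on the request):
# infer() blocks until the results are ready and returns them
# as a dictionary keyed by the model's output ports.
results = infer_request.infer({0: input_tensor})

# For simple one-shot inference, a CompiledModel can also be called directly.
results = compiled_model({0: input_tensor})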
Step 6. Process the Inference Results#
Get output tensors and process the inference results. For more info on textual data as model output, see the String Tensors article.
Python:
# Get output tensor for model with one output
output = infer_request.get_output_tensor()
output_buffer = output.data
# output_buffer[] - accessing output tensor data
C++:
// Get output tensor by tensor name
auto output = infer_request.get_tensor("tensor_name");
const float *output_buffer = output.data<const float>();
// output_buffer[] - accessing output tensor data
C:
ov_tensor_t* output_tensor = NULL;
// Get output tensor by tensor index
ov_infer_request_get_output_tensor_by_index(infer_request, 0, &output_tensor);
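From here, post-processing is ordinary NumPy work. The sketch below assumes a classification-style model with a single output; it only illustrates reading the .data view, which is a lightweight view over the tensor memory:
import numpy as np

output = infer_request.get_output_tensor()

# Copy the data if you need it after the next inference overwrites the tensor.
probabilities = np.array(output.data)

# Hypothetical post-processing for a classification output.
top_class = int(np.argmax(probabilities))
print("Predicted class index:", top_class)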
Step 7. [only for C] Release the allocated objects#
To avoid memory leaks, applications developed with the C API need to release the allocated objects in the following order.
ov_shape_free(&input_shape);
ov_tensor_free(output_tensor);
ov_output_const_port_free(input_port);
ov_tensor_free(tensor);
ov_infer_request_free(infer_request);
ov_compiled_model_free(compiled_model);
ov_model_free(model);
ov_core_free(core);
Build Your Application#
If you have integrated OpenVINO into your application, you will also need to adjust your build process. There are multiple ways to do this, so choose the one that suits your project best. To learn about the basics of the OpenVINO build process, refer to the documentation on GitHub.
The following examples use CMake to configure a C++ and a C project.
Create a structure for the project:
C++ project:
project/
├── CMakeLists.txt  - CMake file to build
├── ...             - Additional folders like includes/
└── src/            - source folder
    └── main.cpp
build/              - build directory
...
C project:
project/
├── CMakeLists.txt  - CMake file to build
├── ...             - Additional folders like includes/
└── src/            - source folder
    └── main.c
build/              - build directory
...
Configure the CMake build
For details on additional CMake build options, refer to the CMake page.
C++:
cmake_minimum_required(VERSION 3.10)
set(CMAKE_CXX_STANDARD 17)

find_package(OpenVINO REQUIRED)

add_executable(${TARGET_NAME} src/main.cpp)

target_link_libraries(${TARGET_NAME} PRIVATE openvino::runtime)
C:
cmake_minimum_required(VERSION 3.10)
set(CMAKE_CXX_STANDARD 17)

find_package(OpenVINO REQUIRED)

add_executable(${TARGET_NAME_C} src/main.c)

target_link_libraries(${TARGET_NAME_C} PRIVATE openvino::runtime::c)
C++ (OpenVINO located via the Python package):
cmake_minimum_required(VERSION 3.10)
set(CMAKE_CXX_STANDARD 17)

if(NOT CMAKE_CROSSCOMPILING)
    find_package(Python3 QUIET COMPONENTS Interpreter)
    if(Python3_Interpreter_FOUND)
        execute_process(
            COMMAND ${Python3_EXECUTABLE} -c "from openvino.utils import get_cmake_path; print(get_cmake_path(), end='')"
            OUTPUT_VARIABLE OpenVINO_DIR_PY
            ERROR_QUIET)
    endif()
endif()

find_package(OpenVINO REQUIRED PATHS "${OpenVINO_DIR_PY}")

add_executable(${TARGET_NAME_PY} src/main.cpp)

target_link_libraries(${TARGET_NAME_PY} PRIVATE openvino::runtime)
Build Project
Use CMake to build the project on your system:
cd build/
cmake ../project
cmake --build .
Additional Resources#
To see a working implementation of the steps, check out the Learn OpenVINO section, including the OpenVINO™ Runtime API Tutorial.
Models in the OpenVINO IR format on Hugging Face.