OpenVINO™ Inference Request#

To set up and run inference, use the ov::InferRequest class. It enables you to run inference on different devices either synchronously or asynchronously. It also provides methods to get and set data for model inputs and outputs.

The ov::InferRequest can be created from the ov::CompiledModel.

infer_request = compiled_model.create_infer_request()
auto infer_request = compiled_model.create_infer_request();

Synchronous / asynchronous inference#

The synchronous mode is the basic mode of inference: inference stages block the application execution, as the application waits for inference to finish before continuing. Use ov::InferRequest::infer to execute in this mode.

infer_request.infer()
infer_request.infer();

The asynchronous mode may improve application performance, as it allows the application to continue its work while the accelerator runs inference. Use ov::InferRequest::start_async to execute in this mode.

infer_request.start_async()
infer_request.start_async();

The asynchronous mode supports two ways the application waits for inference results. Both are thread-safe.

  • ov::InferRequest::wait_for - blocks until the specified amount of time has passed or the result becomes available, whichever comes first.

    infer_request.wait_for(10)
    
    infer_request.wait_for(std::chrono::milliseconds(10));
    
  • ov::InferRequest::wait - waits until inference results become available.

    infer_request.wait()
    
    infer_request.wait();
    

    Keep in mind that the completion order cannot be guaranteed when processing inference requests simultaneously, which may complicate the application logic. Therefore, for multi-request scenarios, consider also the ov::InferRequest::set_callback method, to trigger a callback when the request is complete. Note that to avoid cyclic references in the callback, a weak reference to infer_request should be used (ov::InferRequest*, ov::InferRequest&, std::weak_ptr<ov::InferRequest>, etc.).

    def callback(request, _):
        # All done. Output data can be processed and
        # inference can be started one more time:
        request.start_async()
    
    callbacks_info = {}
    callbacks_info["finished"] = 0
    infer_request.set_callback(callback, callbacks_info)
    
    infer_request.set_callback([&](std::exception_ptr ex_ptr) { 
        if (!ex_ptr) {
            // all done. Output data can be processed.
            // You can fill the input data and run inference one more time:
            infer_request.start_async();
        } else {
            // Something went wrong, you can analyze exception_ptr
        }
    });
    

    If you want to abort a running inference request, use the ov::InferRequest::cancel method.

    infer_request.cancel()
    
    infer_request.cancel();
    

For more information, see the Classification Async Sample, as well as the articles on synchronous and asynchronous inference requests.
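
To put the pieces together, here is a minimal sketch of the asynchronous flow in Python. The model path, device name, and single input/output layout are placeholder assumptions, not part of the API description above.

import openvino as ov

core = ov.Core()
compiled_model = core.compile_model("model.xml", "CPU")  # placeholder model and device
infer_request = compiled_model.create_infer_request()

# Fill the input tensor with application data before starting inference.
input_tensor = infer_request.get_input_tensor()
input_tensor.data[:] = 0.0

# Start inference without blocking and keep doing other work in the meantime.
infer_request.start_async()
# ... other application logic ...

# Block until the result is available, then read the output.
infer_request.wait()
results = infer_request.get_output_tensor().data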

Working with Input and Output tensors#

ov::InferRequest enables you to get input/output tensors by tensor name, index, and port. Note that similar logic applies when retrieving model inputs and outputs with the ov::Model methods.

ov::InferRequest::get_input_tensor, ov::InferRequest::set_input_tensor, ov::InferRequest::get_output_tensor, ov::InferRequest::set_output_tensor

  • for a model with only one input/output, no arguments are required

    input_tensor = infer_request.get_input_tensor()
    output_tensor = infer_request.get_output_tensor()
    
    auto input_tensor = infer_request.get_input_tensor();
    auto output_tensor = infer_request.get_output_tensor();
    
  • to select a specific input/output tensor, provide its index number as a parameter

    input_tensor = infer_request.get_input_tensor(0)
    output_tensor = infer_request.get_output_tensor(0)
    
    auto input_tensor = infer_request.get_input_tensor(0);
    auto output_tensor = infer_request.get_output_tensor(1);
    

ov::InferRequest::get_tensor, ov::InferRequest::set_tensor

  • to select an input/output tensor by tensor name, provide it as a parameter

    tensor1 = infer_request.get_tensor("result")
    tensor2 = ov.Tensor(ov.Type.f32, [1, 3, 32, 32])
    infer_request.set_tensor(input_tensor_name, tensor2)
    
    auto tensor1 = infer_request.get_tensor("tensor_name1");
    ov::Tensor tensor2;
    infer_request.set_tensor("tensor_name2", tensor2);
    
  • to select an input/output tensor by port

    input_port = model.input(0)
    output_port = model.output(0)
    input_tensor = ov.Tensor(ov.Type.f32, [1, 3, 32, 32])
    infer_request.set_tensor(input_port, input_tensor)
    output_tensor = infer_request.get_tensor(output_port)
    
    auto input_port = model->input(0);
    auto output_port = model->output("tensor_name");
    ov::Tensor input_tensor;
    infer_request.set_tensor(input_port, input_tensor);
    auto output_tensor = infer_request.get_tensor(output_port);
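
In Python, the memory of a tensor obtained this way can be read and written through its data property, which exposes a NumPy view. Below is a minimal sketch of feeding a pre-existing NumPy array to a single-input model; the array shape and f32 element type are assumed to match the model.

import numpy as np
import openvino as ov

# Wrap the existing array without copying; the array must stay alive while the tensor is used.
image = np.zeros((1, 3, 32, 32), dtype=np.float32)
input_tensor = ov.Tensor(image, shared_memory=True)
infer_request.set_input_tensor(input_tensor)

infer_request.infer()

# Read the results through the NumPy view exposed by Tensor.data.
output_tensor = infer_request.get_output_tensor()
results = output_tensor.data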
    

Infer Request Use Scenarios#

Cascade of Models#

ov::InferRequest can be used to organize a cascade of models, with a separate infer request for each model. In this case, you can get the output tensor from the first request, using ov::InferRequest::get_tensor, and set it as input for the second request, using ov::InferRequest::set_tensor. Keep in mind that a tensor shared across compiled models can be overwritten by the first model if the first infer request is run again before the second model has started.

output = infer_request1.get_output_tensor(0)
infer_request2.set_input_tensor(0, output)
auto output = infer_request1.get_output_tensor(0);
infer_request2.set_input_tensor(0, output);
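
For illustration, a minimal Python sketch of such a cascade follows; compiled_model_1 and compiled_model_2 stand for two already compiled models whose first output and first input are assumed to be compatible.

infer_request1 = compiled_model_1.create_infer_request()
infer_request2 = compiled_model_2.create_infer_request()

infer_request1.infer()

# Pass the first model's output tensor directly to the second model,
# avoiding an extra copy of the intermediate data.
infer_request2.set_input_tensor(0, infer_request1.get_output_tensor(0))
infer_request2.infer()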

Re-use shared input in several models (e.g. ROI Tensors)#

If a model processes data created by a different model in the same pipeline, you may be able to reuse the input instead of allocating two separate input tensors: allocate memory for the first model input and then reuse it for the second model, adjusting it if necessary. A good example is when the first model detects objects in a video frame (stored as an input tensor) and the second model uses the generated Region of Interest (ROI) to perform additional operations. In this case, the second model may take the pre-allocated input and crop the frame to the size of the generated bounding boxes. Use the ov::Tensor constructor that takes an existing ov::Tensor and ov::Coordinate parameters to create such an ROI tensor on top of the already allocated memory. A Python sketch of creating the ROI tensor follows the snippets below.

# input_tensor points to input of a previous network and
# cropROI contains coordinates of the output bounding box
input_tensor = ov.Tensor(type=ov.Type.f32, shape=ov.Shape([1, 3, 100, 100]))
begin = [0, 0, 0, 0]
end = [1, 3, 32, 32]
# ...

/** input_tensor points to input of a previous network and
    cropROI contains coordinates of output bounding box **/
ov::Tensor input_tensor(ov::element::f32, ov::Shape({1, 3, 20, 20}));
ov::Coordinate begin({0, 0, 0, 0});
ov::Coordinate end({1, 2, 3, 3});
//...
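
For illustration, a minimal Python sketch of the ROI case follows. It assumes the Python ov.Tensor constructor that mirrors the C++ ROI constructor (an existing tensor plus begin/end coordinates) and a hypothetical infer_request2 created for the second model.

# Create a view over the already allocated input_tensor memory, cropped to the
# bounding box defined by begin/end; no data is copied.
roi_tensor = ov.Tensor(input_tensor, begin, end)

# Feed the cropped region to the second model.
infer_request2.set_input_tensor(0, roi_tensor)
infer_request2.infer()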

Using Remote Tensors#

By using ov::RemoteContext you can create a remote tensor to work with remote device memory.

# NOT SUPPORTED
ov::RemoteContext context = core.get_default_context("GPU");
auto input_port = compiled_model.input("tensor_name");
ov::RemoteTensor remote_tensor = context.create_tensor(input_port.get_element_type(), input_port.get_shape());
infer_request.set_tensor(input_port, remote_tensor);