OpenVINO™ Inference Request#

OpenVINO™ Runtime uses the Infer Request mechanism, which allows running models on different devices in asynchronous or synchronous mode. The ov::InferRequest class serves this purpose inside the OpenVINO™ Runtime: it lets you set and get data for model inputs and outputs and run inference on the model.

Creating Infer Request#

The ov::InferRequest can be created from the ov::CompiledModel:

Python

infer_request = compiled_model.create_infer_request()

C++

auto infer_request = compiled_model.create_infer_request();

Working with Input and Output tensors#

ov::InferRequest allows you to get input/output tensors by tensor name, by index, by port, or, if a model has only one input or output, without any arguments.

ov::InferRequest::get_input_tensor, ov::InferRequest::set_input_tensor, ov::InferRequest::get_output_tensor, ov::InferRequest::set_output_tensor methods without arguments can be used to get or set the input/output tensor for a model with only one input/output:

Python

input_tensor = infer_request.get_input_tensor()
output_tensor = infer_request.get_output_tensor()

C++

auto input_tensor = infer_request.get_input_tensor();
auto output_tensor = infer_request.get_output_tensor();

ov::InferRequest::get_input_tensor, ov::InferRequest::set_input_tensor, ov::InferRequest::get_output_tensor, ov::InferRequest::set_output_tensor methods with an argument can be used to get or set an input/output tensor by input/output index:

Python

input_tensor = infer_request.get_input_tensor(0)
output_tensor = infer_request.get_output_tensor(0)

C++

auto input_tensor = infer_request.get_input_tensor(0);
auto output_tensor = infer_request.get_output_tensor(1);

ov::InferRequest::get_tensor, ov::InferRequest::set_tensor methods can be used to get or set an input/output tensor by tensor name:

Python

tensor1 = infer_request.get_tensor("result")
tensor2 = ov.Tensor(ov.Type.f32, [1, 3, 32, 32])
infer_request.set_tensor(input_tensor_name, tensor2)

C++

auto tensor1 = infer_request.get_tensor("tensor_name1");
ov::Tensor tensor2;
infer_request.set_tensor("tensor_name2", tensor2);

ov::InferRequest::get_tensor, ov::InferRequest::set_tensor methods can be used to get or set an input/output tensor by port:

Python

input_port = model.input(0)
output_port = model.output("tensor_name")
input_tensor = ov.Tensor(ov.Type.f32, [1, 3, 32, 32])
infer_request.set_tensor(input_port, input_tensor)
output_tensor = infer_request.get_tensor(output_port)

C++

auto input_port = model->input(0);
auto output_port = model->output("tensor_name");
ov::Tensor input_tensor;
infer_request.set_tensor(input_port, input_tensor);
auto output_tensor = infer_request.get_tensor(output_port);

Examples of Infer Request Usages#

Presented below are examples of what the Infer Request can be used for.

Cascade of Models#

ov::InferRequest can be used to organize a cascade of models. A separate infer request is required for each model. In this case, you can get the output tensor from the first request, using ov::InferRequest::get_tensor, and set it as the input for the second request, using ov::InferRequest::set_tensor. Keep in mind that tensors shared across compiled models can be overwritten by the first model if the first infer request is run again before the second model has started.

Python

output = infer_request1.get_output_tensor(0)
infer_request2.set_input_tensor(0, output)

C++

auto output = infer_request1.get_output_tensor(0);
infer_request2.set_input_tensor(0, output);
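
The overwrite hazard described above can be pictured with a plain Python stand-in. This is only an illustrative sketch and does not use OpenVINO: `FakeRequest`, `set_input`, and `infer` are hypothetical names, and a Python list plays the role of the shared tensor memory.

```python
class FakeRequest:
    """Hypothetical stand-in for an infer request; NOT the OpenVINO API."""

    def __init__(self, size):
        self.output = [0.0] * size  # output buffer owned by this request
        self.input = None

    def set_input(self, buffer):
        # Store a reference to the buffer; no data is copied.
        self.input = buffer

    def infer(self, value):
        # Write results in place, reusing the same output buffer on every run.
        for i in range(len(self.output)):
            self.output[i] = value


request1 = FakeRequest(4)
request2 = FakeRequest(4)

request1.infer(1.0)
request2.set_input(request1.output)  # the second request shares the buffer

snapshot = list(request1.output)     # defensive copy taken before a rerun

request1.infer(2.0)                  # rerun before request2 has consumed its input

# request2 now sees the values of the second run, not the first one.
shared_view = list(request2.input)
```

In this sketch `shared_view` holds the values written by the second run, while `snapshot` still holds the first result. With real requests, the same effect is avoided by running the second request before rerunning the first, or by giving the second request its own copy of the data.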

Using ROI Tensors#

It is possible to re-use a shared input in several models. You do not need to allocate a separate input tensor for a model if it processes an ROI object located inside an already allocated input of a previous model. For instance, the first model may detect objects in a video frame (stored as an input tensor), while the second model accepts the detected bounding boxes (ROIs inside the frame) as input. In this case, the second model can re-use the pre-allocated input tensor of the first model and just crop the ROI without allocating new memory, by constructing an ov::Tensor with an ov::Tensor and ov::Coordinate values as parameters.

Python

# input_tensor points to the input of a previous network and
# cropROI contains the coordinates of the output bounding box
input_tensor = ov.Tensor(type=ov.Type.f32, shape=ov.Shape([1, 3, 100, 100]))
begin = [0, 0, 0, 0]
end = [1, 3, 32, 32]
# ...

C++

/** input_tensor points to input of a previous network and
    cropROI contains coordinates of output bounding box **/
ov::Tensor input_tensor(ov::element::f32, ov::Shape({1, 3, 20, 20}));
ov::Coordinate begin({0, 0, 0, 0});
ov::Coordinate end({1, 2, 3, 3});
//...
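
One way to picture why an ROI needs no new allocation: a region given by begin and end coordinates can be described as an offset into the parent buffer plus the parent's strides. The sketch below is plain Python, not OpenVINO; `row_major_strides` and `describe_roi` are hypothetical helpers, and a densely packed row-major layout is assumed.

```python
def row_major_strides(shape):
    """Element strides (not bytes) of a densely packed row-major tensor."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def describe_roi(shape, begin, end):
    """Return (offset, roi_shape, strides) for the region [begin, end).

    Hypothetical helper for illustration only: the ROI is expressed purely
    as metadata over the parent buffer, so no element is copied.
    """
    if not (len(shape) == len(begin) == len(end)):
        raise ValueError("shape, begin and end must have the same rank")
    for dim, lo, hi in zip(shape, begin, end):
        if not 0 <= lo < hi <= dim:
            raise ValueError("ROI must lie inside the parent tensor")
    strides = row_major_strides(shape)
    offset = sum(b * s for b, s in zip(begin, strides))
    roi_shape = [hi - lo for lo, hi in zip(begin, end)]
    return offset, roi_shape, strides


# Shape and coordinates taken from the C++ snippet above.
offset, roi_shape, strides = describe_roi([1, 3, 20, 20], [0, 0, 0, 0], [1, 2, 3, 3])

# A second, made-up region that does not start at the origin.
offset2, roi_shape2, _ = describe_roi([1, 3, 20, 20], [0, 1, 2, 3], [1, 3, 10, 10])
```

Because the ROI keeps the parent's strides, its rows are generally not contiguous in memory, which is worth remembering when passing such a region to code that expects a dense buffer.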

Using Remote Tensors#

By using ov::RemoteContext you can create a remote tensor to work with remote device memory.

Python

# NOT SUPPORTED

C++

ov::RemoteContext context = core.get_default_context("GPU");
auto input_port = compiled_model.input("tensor_name");
