openvino.inference_engine.InferRequest

class openvino.inference_engine.InferRequest

This class provides an interface to the infer requests of an ExecutableNetwork; it handles infer request execution and sets and gets output data.

__init__()

There is no explicit class constructor. To obtain a valid InferRequest instance, call the IECore.load_network() method with the desired number of requests; the returned ExecutableNetwork instance stores the infer requests.

Methods

__init__

There is no explicit class constructor.

async_infer(self[, inputs])

Starts asynchronous inference of the infer request and fills the outputs array

get_perf_counts(self)

Queries per-layer performance measures to identify the most time-consuming layers.

infer(self[, inputs])

Starts synchronous inference of the infer request and fills the outputs array

query_state(self)

Gets the state control interface for the given infer request; state control is essential for recurrent networks. Returns a list of memory state objects.

set_batch(self, size)

Sets a new batch size for the infer request when dynamic batching is enabled in the executable network that created this request.

set_blob(self, blob_name, blob, …)

Sets a user-defined Blob for the infer request

set_completion_callback(self, py_callback[, …])

Sets a callback function that is called on success or failure of an asynchronous request

wait(self[, timeout])

Waits for the result to become available.

Attributes

input_blobs

Dictionary that maps input layer names to corresponding Blobs

inputs

A dictionary that maps input layer names to numpy.ndarray objects of proper shape with input data for the layer

latency

Current infer request inference time in milliseconds

output_blobs

Dictionary that maps output layer names to corresponding Blobs

outputs

A dictionary that maps output layer names to numpy.ndarray objects with output data of the layer

preprocess_info

Dictionary that maps input layer names to corresponding preprocessing information

async_infer(self, inputs=None)

Starts asynchronous inference of the infer request and fills the outputs array

Parameters

inputs – A dictionary that maps input layer names to numpy.ndarray objects of proper shape with input data for the layer

Returns

None

Usage example:

exec_net = ie_core.load_network(network=net, device_name="CPU", num_requests=2)
exec_net.requests[0].async_infer({input_blob: image})
request_status = exec_net.requests[0].wait()
res = exec_net.requests[0].output_blobs['prob']

get_perf_counts(self)

Queries per-layer performance measures to identify the most time-consuming layers.

Note

Performance counters data and format depend on the plugin

Returns

Dictionary containing per-layer execution information.

Usage example:

exec_net = ie_core.load_network(network=net, device_name="CPU", num_requests=2)
exec_net.requests[0].infer({input_blob: image})
exec_net.requests[0].get_perf_counts()
#  {'Conv2D': {'exec_type': 'jit_avx2_1x1',
#              'real_time': 154,
#              'cpu_time': 154,
#              'status': 'EXECUTED',
#              'layer_type': 'Convolution'},
#   'Relu6':  {'exec_type': 'undef',
#              'real_time': 0,
#              'cpu_time': 0,
#              'status': 'NOT_RUN',
#              'layer_type': 'Clamp'}
#   ...
#  }
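The returned dictionary is plain Python data, so it can be post-processed directly, for example to rank layers by execution time. A minimal sketch over a perf-counts-style dictionary (the sample values below are illustrative, not real profiling output):

```python
# Sample data in the same shape as get_perf_counts() output (illustrative values).
perf_counts = {
    'Conv2D': {'exec_type': 'jit_avx2_1x1', 'real_time': 154,
               'cpu_time': 154, 'status': 'EXECUTED', 'layer_type': 'Convolution'},
    'Relu6':  {'exec_type': 'undef', 'real_time': 0,
               'cpu_time': 0, 'status': 'NOT_RUN', 'layer_type': 'Clamp'},
    'FC':     {'exec_type': 'jit_gemm', 'real_time': 80,
               'cpu_time': 80, 'status': 'EXECUTED', 'layer_type': 'FullyConnected'},
}

# Keep only layers that actually ran, sorted by descending real_time.
executed = sorted(
    ((name, info['real_time']) for name, info in perf_counts.items()
     if info['status'] == 'EXECUTED'),
    key=lambda item: item[1], reverse=True,
)
print(executed)  # [('Conv2D', 154), ('FC', 80)]
```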

infer(self, inputs=None)

Starts synchronous inference of the infer request and fills the outputs array

Parameters

inputs – A dictionary that maps input layer names to numpy.ndarray objects of proper shape with input data for the layer

Returns

None

Usage example:

exec_net = ie_core.load_network(network=net, device_name="CPU", num_requests=2)
exec_net.requests[0].infer({input_blob: image})
res = exec_net.requests[0].output_blobs['prob'].buffer
np.flip(np.sort(np.squeeze(res)), 0)

# array([4.85416055e-01, 1.70385033e-01, 1.21873841e-01, 1.18894853e-01,
#         5.45198545e-02, 2.44456064e-02, 5.41366823e-03, 3.42589128e-03,
#         2.26027006e-03, 2.12283316e-03 ...])
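Beyond sorting the raw probabilities, a common follow-up is extracting the top-K class indices. A self-contained sketch with an illustrative output vector standing in for `output_blobs['prob'].buffer`:

```python
import numpy as np

# Illustrative output vector in place of output_blobs['prob'].buffer.
res = np.array([[0.05, 0.48, 0.02, 0.30, 0.15]], dtype=np.float32)

probs = np.squeeze(res)              # drop the batch dimension
top_k = np.argsort(probs)[::-1][:3]  # indices of the 3 highest probabilities
print(top_k)         # [1 3 4]
print(probs[top_k])  # highest probabilities, in descending order
```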

input_blobs

Dictionary that maps input layer names to corresponding Blobs

inputs

A dictionary that maps input layer names to numpy.ndarray objects of proper shape with input data for the layer

latency

Current infer request inference time in milliseconds

output_blobs

Dictionary that maps output layer names to corresponding Blobs

outputs

A dictionary that maps output layer names to numpy.ndarray objects with output data of the layer

preprocess_info

Dictionary that maps input layer names to corresponding preprocessing information

query_state(self)

Gets the state control interface for the given infer request. State control is essential for recurrent networks.

Returns

A list of memory state objects
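A typical use of the returned state objects is resetting all states before feeding a new, independent input sequence, so the recurrent network does not carry memory over from the previous one. The sketch below is self-contained: `_DummyState` is a stand-in for the memory-state objects (which expose a `reset()` method); the real objects come from `exec_net.requests[0].query_state()`.

```python
class _DummyState:
    """Stand-in for a memory-state object; real ones come from query_state()."""
    def __init__(self):
        self.was_reset = False
    def reset(self):
        self.was_reset = True

def reset_states(states):
    # Call before starting a new, independent input sequence.
    for state in states:
        state.reset()

states = [_DummyState(), _DummyState()]
reset_states(states)  # with a real request: reset_states(request.query_state())
print(all(s.was_reset for s in states))  # True
```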

set_batch(self, size)

Sets a new batch size for the infer request when dynamic batching is enabled in the executable network that created this request.

Note

Support of dynamic batch size depends on the target plugin.

Parameters

size – New batch size to be used by all the following inference calls for this request

Returns

None

Usage example:

ie = IECore()
net = ie.read_network(model=path_to_xml_file, weights=path_to_bin_file)
# Set max batch size
net.batch_size = 10
ie.set_config(config={"DYN_BATCH_ENABLED": "YES"}, device_name=device)
exec_net = ie.load_network(network=net, device_name=device)
# Set batch size for certain network.
# NOTE: Input data shape will not be changed, but will be used partially in inference which increases performance
exec_net.requests[0].set_batch(2)

set_blob(self, blob_name: str, blob: Blob, preprocess_info: PreProcessInfo = None)

Sets a user-defined Blob for the infer request

Parameters
  • blob_name – A name of input blob

  • blob – Blob object to set for the infer request

  • preprocess_info – PreProcessInfo object to set for the infer request.

Returns

None

Usage example:

ie = IECore()
net = ie.read_network(model="./model.xml", weights="./model.bin")
exec_net = ie.load_network(net, "CPU", num_requests=2)
td = TensorDesc("FP32", (1, 3, 224, 224), "NCHW")
blob_data = np.ones(shape=(1, 3, 224, 224), dtype=np.float32)
blob = Blob(td, blob_data)
exec_net.requests[0].set_blob(blob_name="input_blob_name", blob=blob)

set_completion_callback(self, py_callback, py_data=None)

Sets a callback function that is called on success or failure of an asynchronous request

Parameters
  • py_callback – Any defined or lambda function

  • py_data – Data that is passed to the callback function

Returns

None

Usage example:

callback = lambda status, py_data: print(f"Request with id {py_data} finished with status {status}")
ie = IECore()
net = ie.read_network(model="./model.xml", weights="./model.bin")
exec_net = ie.load_network(net, "CPU", num_requests=4)
for id, req in enumerate(exec_net.requests):
    req.set_completion_callback(py_callback=callback, py_data=id)

for req in exec_net.requests:
    req.async_infer({"data": img})

wait(self, timeout=None)

Waits for the result to become available. Blocks until specified timeout elapses or the result becomes available, whichever comes first.

Parameters

timeout – Time to wait in milliseconds, or one of the special values (0, -1) described in the Note below. If not specified, the timeout is set to -1 by default.

Returns

Request status code.

Note

There are special values of the timeout parameter:

  • 0 - Immediately returns the inference status without blocking or interrupting execution. For the meaning of the status codes, refer to enum_InferenceEngine_StatusCode in the Inference Engine C++ documentation

  • -1 - Waits until the inference result becomes available (default value)

Usage example: See the InferRequest.async_infer() method of the InferRequest class.
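The non-blocking wait(0) call enables a polling loop that does other work while inference runs. The sketch below uses a stub request so the pattern is self-contained; with a real request, drop `_StubRequest`, use exec_net.requests[0], and compare the returned status against StatusCode.OK.

```python
OK = 0                 # StatusCode OK in the C++ enum
RESULT_NOT_READY = -9  # assumption: the "result not ready yet" status value

class _StubRequest:
    """Stand-in request: reports 'not ready' twice, then 'ok'."""
    def __init__(self):
        self._polls = 0
    def wait(self, timeout=-1):
        if timeout == 0:  # non-blocking probe
            self._polls += 1
            return OK if self._polls > 2 else RESULT_NOT_READY
        return OK         # timeout=-1 blocks until the result is available

request = _StubRequest()
polls = 0
while request.wait(0) != OK:  # poll without blocking
    polls += 1                # ... do other useful work here ...
print(polls)  # 2
```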