openvino.inference_engine.InferRequest

class openvino.inference_engine.InferRequest

Bases: object

OpenVINO Inference Engine Python API is deprecated and will be removed in the 2024.0 release. For instructions on transitioning to the new API, please refer to https://docs.openvino.ai/latest/openvino_2_0_transition_guide.html

This class provides an interface to infer requests of ExecutableNetwork and serves to handle infer requests execution and to set and get output data.

__init__()

There is no explicit class constructor. To obtain a valid InferRequest instance, call the IECore.load_network() method with the desired number of requests; the returned ExecutableNetwork instance stores the infer requests.
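A minimal sketch of this workflow (the model paths and device name below are placeholders, not part of the API; this follows the deprecated Inference Engine API used throughout this page):

```python
from openvino.inference_engine import IECore

ie = IECore()
# Read the model, then load it onto a device with two infer requests
net = ie.read_network(model="./model.xml", weights="./model.bin")
exec_net = ie.load_network(network=net, device_name="CPU", num_requests=2)

# exec_net.requests is a list of InferRequest instances
request = exec_net.requests[0]
```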

Methods

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__

There is no explicit class constructor.

__init_subclass__

This method is called when a class is subclassed.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)

__reduce__

InferRequest.__reduce_cython__(self)

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__setstate__

InferRequest.__setstate_cython__(self, __pyx_state)

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__

Abstract classes can override this to customize issubclass().

_fill_inputs(self, inputs)

_get_blob_buffer(self, string blob_name)

async_infer(self[, inputs])

Starts asynchronous inference of the infer request and fills the outputs array

get_perf_counts(self)

Queries performance measures per layer to get feedback of what is the most time consuming layer.

infer(self[, inputs])

Starts synchronous inference of the infer request and fills the outputs array

query_state(self)

Gets the state control interface for the given infer request; state control is essential for recurrent networks. Returns a list of memory state objects.

set_blob(self, blob_name, blob[, preprocess_info])

Sets user defined Blob for the infer request

set_completion_callback(self, py_callback[, ...])

Sets a callback function that is called on success or failure of an asynchronous request

wait(self[, timeout])

Waits for the result to become available.

Attributes

__pyx_vtable__

_inputs_list

_inputs_list: object

_outputs_list

_outputs_list: object

_py_callback

_py_callback: object

_py_data

_py_data: object

_user_blobs

_user_blobs: object

input_blobs

Dictionary that maps input layer names to corresponding Blobs

latency

Current infer request inference time in milliseconds

output_blobs

Dictionary that maps output layer names to corresponding Blobs

preprocess_info

Dictionary that maps input layer names to corresponding preprocessing information

__class__

alias of type

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__()

There is no explicit class constructor. To obtain a valid InferRequest instance, call the IECore.load_network() method with the desired number of requests; the returned ExecutableNetwork instance stores the infer requests.

__init_subclass__()

This method is called when a class is subclassed.

The default implementation does nothing. It may be overridden to extend subclasses.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)
__pyx_vtable__ = <capsule object NULL>
__reduce__()

InferRequest.__reduce_cython__(self)

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__setstate__()

InferRequest.__setstate_cython__(self, __pyx_state)

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__()

Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).

_fill_inputs(self, inputs)
_get_blob_buffer(self, blob_name: str) → BlobBuffer
_inputs_list

_inputs_list: object

_outputs_list

_outputs_list: object

_py_callback

_py_callback: object

_py_data

_py_data: object

_user_blobs

_user_blobs: object

async_infer(self, inputs=None)

Starts asynchronous inference of the infer request and fills the outputs array

Parameters

inputs – A dictionary that maps input layer names to numpy.ndarray objects of proper shape with input data for the layer

Returns

None

Usage example:

exec_net = ie_core.load_network(network=net, device_name="CPU", num_requests=2)
exec_net.requests[0].async_infer({input_blob: image})
request_status = exec_net.requests[0].wait()
res = exec_net.requests[0].output_blobs['prob']
get_perf_counts(self)

Queries performance measures per layer to get feedback of what is the most time consuming layer.

Note

Performance counters data and format depends on the plugin

Returns

Dictionary containing per-layer execution information.

Usage example:

exec_net = ie_core.load_network(network=net, device_name="CPU", num_requests=2)
exec_net.requests[0].infer({input_blob: image})
exec_net.requests[0].get_perf_counts()
#  {'Conv2D': {'exec_type': 'jit_avx2_1x1',
#              'real_time': 154,
#              'cpu_time': 154,
#              'status': 'EXECUTED',
#              'layer_type': 'Convolution'},
#   'Relu6':  {'exec_type': 'undef',
#              'real_time': 0,
#              'cpu_time': 0,
#              'status': 'NOT_RUN',
#              'layer_type': 'Clamp'}
#   ...
#  }
infer(self, inputs=None)

Starts synchronous inference of the infer request and fills the outputs array

Parameters

inputs – A dictionary that maps input layer names to numpy.ndarray objects of proper shape with input data for the layer

Returns

None

Usage example:

exec_net = ie_core.load_network(network=net, device_name="CPU", num_requests=2)
exec_net.requests[0].infer({input_blob: image})
res = exec_net.requests[0].output_blobs['prob']
np.flip(np.sort(np.squeeze(res.buffer)), 0)

# array([4.85416055e-01, 1.70385033e-01, 1.21873841e-01, 1.18894853e-01,
#         5.45198545e-02, 2.44456064e-02, 5.41366823e-03, 3.42589128e-03,
#         2.26027006e-03, 2.12283316e-03 ...])
input_blobs

Dictionary that maps input layer names to corresponding Blobs

latency

Current infer request inference time in milliseconds
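For example, assuming input_blob and image are defined as in the other examples on this page:

```python
# Run a synchronous inference, then read back how long it took
exec_net.requests[0].infer({input_blob: image})
print(f"Latency: {exec_net.requests[0].latency:.2f} ms")
```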

output_blobs

Dictionary that maps output layer names to corresponding Blobs

preprocess_info

Dictionary that maps input layer names to corresponding preprocessing information

query_state(self)

Gets the state control interface for the given infer request. State control is essential for recurrent networks.

Returns

A list of memory state objects
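A hedged sketch of resetting memory states between input sequences (assumes exec_net was loaded from a stateful, e.g. recurrent, network, and that the memory state objects expose a reset() method as in the deprecated API):

```python
request = exec_net.requests[0]

# Clear every memory state before feeding a new, unrelated sequence
for state in request.query_state():
    state.reset()
```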

set_blob(self, blob_name: str, blob: Blob, preprocess_info=None)

Sets user defined Blob for the infer request

Parameters
  • blob_name – A name of input blob

  • blob – Blob object to set for the infer request

  • preprocess_info – Optional PreProcessInfo object to set for the infer request

Returns

None

Usage example:

ie = IECore()
net = ie.read_network(model="./model.xml", weights="./model.bin")
exec_net = ie.load_network(net, "CPU", num_requests=2)
td = TensorDesc("FP32", (1, 3, 224, 224), "NCHW")
blob_data = np.ones(shape=(1, 3, 224, 224), dtype=np.float32)
blob = Blob(td, blob_data)
exec_net.requests[0].set_blob(blob_name="input_blob_name", blob=blob)
set_completion_callback(self, py_callback, py_data=None)

Sets a callback function that is called on success or failure of an asynchronous request

Parameters
  • py_callback – Any defined or lambda function

  • py_data – Data that is passed to the callback function

Returns

None

Usage example:

callback = lambda status, py_data: print(f"Request with id {py_data} finished with status {status}")
ie = IECore()
net = ie.read_network(model="./model.xml", weights="./model.bin")
exec_net = ie.load_network(net, "CPU", num_requests=4)
for id, req in enumerate(exec_net.requests):
    req.set_completion_callback(py_callback=callback, py_data=id)

for req in exec_net.requests:
    req.async_infer({"data": img})
wait(self, timeout=None)

Waits for the result to become available. Blocks until specified timeout elapses or the result becomes available, whichever comes first.

Parameters

timeout – Time to wait in milliseconds, or one of the special values (0, -1) described in the Note below. If not specified, timeout defaults to -1.

Returns

Request status code.

Note

There are special values of the timeout parameter:

  • 0 - Immediately returns the inference status without blocking or interrupting execution. For the meaning of the status codes, refer to the InferenceEngine::StatusCode enum in the Inference Engine C++ documentation

  • -1 - Waits until the inference result becomes available (default value)
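A non-blocking polling loop can be sketched with the 0 timeout (status code 0 corresponds to OK in the C++ StatusCode enum; request, input_blob, and image are assumed to exist as in the examples above):

```python
import time

request.async_infer({input_blob: image})
# wait(0) returns the current status immediately instead of blocking
while request.wait(0) != 0:  # 0 == StatusCode.OK
    time.sleep(0.001)        # or do other useful work here
res = request.output_blobs
```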

Usage example: See the InferRequest.async_infer() method of the InferRequest class.