openvino.inference_engine.InferRequest

class openvino.inference_engine.InferRequest

Bases: object

This class provides an interface to the infer requests of an ExecutableNetwork; it handles infer request execution and is used to set input data and to get output data.

__init__()

There is no explicit class constructor. To get a valid InferRequest instance, call the IECore.load_network() method of the IECore class with the desired number of requests; the returned ExecutableNetwork instance stores the infer requests.
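
For example, a minimal sketch of obtaining a request (the model paths are placeholders):

from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="./model.xml", weights="./model.bin")
exec_net = ie.load_network(network=net, device_name="CPU", num_requests=2)
request = exec_net.requests[0]  # a ready-to-use InferRequest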

Methods

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__

There is no explicit class constructor.

__init_subclass__

This method is called when a class is subclassed.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)

__reduce__

InferRequest.__reduce_cython__(self)

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__setstate__

InferRequest.__setstate_cython__(self, __pyx_state)

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__

Abstract classes can override this to customize issubclass().

_fill_inputs(self, inputs)

_get_blob_buffer(self, string blob_name)

async_infer(self[, inputs])

Starts asynchronous inference of the infer request and fills the outputs array

get_perf_counts(self)

Queries per-layer performance measures to identify the most time-consuming layers.

infer(self[, inputs])

Starts synchronous inference of the infer request and fills the outputs array

query_state(self)

Gets the state control interface for the infer request. State control is essential for recurrent networks.

set_batch(self, size)

Sets a new batch size for this infer request when dynamic batching is enabled in the executable network that created it.

set_blob(self, str blob_name, Blob blob, ...)

Sets a user-defined Blob for the infer request

set_completion_callback(self, py_callback[, ...])

Sets a callback function that is called on success or failure of an asynchronous request

wait(self[, timeout])

Waits for the result to become available.

Attributes

__pyx_vtable__

_inputs_list

_inputs_list: object

_outputs_list

_outputs_list: object

_py_callback

_py_callback: object

_py_data

_py_data: object

_user_blobs

_user_blobs: object

input_blobs

Dictionary that maps input layer names to corresponding Blobs

latency

Current infer request inference time in milliseconds

output_blobs

Dictionary that maps output layer names to corresponding Blobs

preprocess_info

Dictionary that maps input layer names to corresponding preprocessing information

__class__

alias of type

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__()

There is no explicit class constructor. To get a valid InferRequest instance, call the IECore.load_network() method of the IECore class with the desired number of requests; the returned ExecutableNetwork instance stores the infer requests.

__init_subclass__()

This method is called when a class is subclassed.

The default implementation does nothing. It may be overridden to extend subclasses.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)

__pyx_vtable__ = <capsule object NULL>

__reduce__()

InferRequest.__reduce_cython__(self)

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__setstate__()

InferRequest.__setstate_cython__(self, __pyx_state)

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__()

Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__(). It should return True, False or NotImplemented. If it returns NotImplemented, the normal algorithm is used. Otherwise, it overrides the normal algorithm (and the outcome is cached).

_fill_inputs(self, inputs)

_get_blob_buffer(self, blob_name) → BlobBuffer

_inputs_list

_inputs_list: object

_outputs_list

_outputs_list: object

_py_callback

_py_callback: object

_py_data

_py_data: object

_user_blobs

_user_blobs: object

async_infer(self, inputs=None)

Starts asynchronous inference of the infer request and fills the outputs array

Parameters

inputs – A dictionary that maps input layer names to numpy.ndarray objects of the proper shape, containing input data for the layer

Returns

None

Usage example:

exec_net = ie_core.load_network(network=net, device_name="CPU", num_requests=2)
exec_net.requests[0].async_infer({input_blob: image})
request_status = exec_net.requests[0].wait()
res = exec_net.requests[0].output_blobs['prob'].buffer  # .buffer exposes the Blob's data as a numpy array

get_perf_counts(self)

Queries per-layer performance measures to identify the most time-consuming layers.

Note

The data and format of the performance counters depend on the plugin

Returns

Dictionary containing per-layer execution information.

Usage example:

exec_net = ie_core.load_network(network=net, device_name="CPU", num_requests=2)
exec_net.requests[0].infer({input_blob: image})
exec_net.requests[0].get_perf_counts()
#  {'Conv2D': {'exec_type': 'jit_avx2_1x1',
#              'real_time': 154,
#              'cpu_time': 154,
#              'status': 'EXECUTED',
#              'layer_type': 'Convolution'},
#   'Relu6':  {'exec_type': 'undef',
#              'real_time': 0,
#              'cpu_time': 0,
#              'status': 'NOT_RUN',
#              'layer_type': 'Clamp'}
#   ...
#  }
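
Since the returned dictionary follows the per-layer layout shown above, the most time-consuming layers can be listed by sorting on real_time (a minimal sketch, continuing the example):

perf = exec_net.requests[0].get_perf_counts()
top = sorted(perf.items(), key=lambda kv: kv[1]['real_time'], reverse=True)
for name, stats in top[:5]:
    print(name, stats['layer_type'], stats['status'], stats['real_time'])
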
infer(self, inputs=None)

Starts synchronous inference of the infer request and fills the outputs array

Parameters

inputs – A dictionary that maps input layer names to numpy.ndarray objects of the proper shape, containing input data for the layer

Returns

None

Usage example:

exec_net = ie_core.load_network(network=net, device_name="CPU", num_requests=2)
exec_net.requests[0].infer({input_blob: image})
res = exec_net.requests[0].output_blobs['prob'].buffer  # .buffer exposes the Blob's data as a numpy array
np.flip(np.sort(np.squeeze(res)), 0)

# array([4.85416055e-01, 1.70385033e-01, 1.21873841e-01, 1.18894853e-01,
#         5.45198545e-02, 2.44456064e-02, 5.41366823e-03, 3.42589128e-03,
#         2.26027006e-03, 2.12283316e-03 ...])

input_blobs

Dictionary that maps input layer names to corresponding Blobs

latency

Current infer request inference time in milliseconds

output_blobs

Dictionary that maps output layer names to corresponding Blobs

preprocess_info

Dictionary that maps input layer names to corresponding preprocessing information
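
These attributes can be read directly from a request; a minimal sketch, assuming exec_net from the earlier examples and that at least one inference has completed:

request = exec_net.requests[0]
in_name = next(iter(request.input_blobs))    # first input layer name
out_name = next(iter(request.output_blobs))  # first output layer name
print(request.input_blobs[in_name].buffer.shape)   # .buffer exposes a Blob as a numpy array
print(request.output_blobs[out_name].buffer.shape)
print(f"latency: {request.latency:.2f} ms")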

query_state(self)

Gets the state control interface for the infer request. State control is essential for recurrent networks.

Returns

A list of memory state objects
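
Usage example (a hedged sketch, assuming the network contains memory/state layers; the returned objects expose name, state and reset() in recent 2021.x releases, which is an assumption here):

for state in exec_net.requests[0].query_state():
    print(state.name)   # name of the corresponding memory layer
    state.reset()       # reset the state to its initial value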

set_batch(self, size)

Sets a new batch size for this infer request when dynamic batching is enabled in the executable network that created it.

Note

Support of dynamic batch size depends on the target plugin.

Parameters

size – New batch size to be used by all subsequent inference calls for this request

Returns

None

Usage example:

ie = IECore()
net = ie.read_network(model=path_to_xml_file, weights=path_to_bin_file)
# Set max batch size
net.batch = 10
ie.set_config(config={"DYN_BATCH_ENABLED": "YES"}, device_name=device)
exec_net = ie.load_network(network=net, device_name=device)
# Set batch size for certain network.
# NOTE: The input data shape is not changed; only part of the data is used for inference, which improves performance
exec_net.requests[0].set_batch(2)

set_blob(self, blob_name: str, blob: Blob, preprocess_info: PreProcessInfo = None)

Sets a user-defined Blob for the infer request

Parameters
  • blob_name – A name of input blob

  • blob – Blob object to set for the infer request

  • preprocess_info – PreProcessInfo object to set for the infer request.

Returns

None

Usage example:

ie = IECore()
net = ie.read_network(model="./model.xml", weights="./model.bin")
exec_net = ie.load_network(net, "CPU", num_requests=2)
td = TensorDesc("FP32", (1, 3, 224, 224), "NCHW")
blob_data = np.ones(shape=(1, 3, 224, 224), dtype=np.float32)
blob = Blob(td, blob_data)
exec_net.requests[0].set_blob(blob_name="input_blob_name", blob=blob)
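
The optional preprocess_info argument attaches preprocessing steps to the blob. A hedged sketch, assuming the PreProcessInfo and ResizeAlgorithm names from the 2021.x Python API:

from openvino.inference_engine import PreProcessInfo, ResizeAlgorithm

pp = PreProcessInfo()
pp.resize_algorithm = ResizeAlgorithm.RESIZE_BILINEAR  # resize input to the network's input shape
exec_net.requests[0].set_blob(blob_name="input_blob_name", blob=blob, preprocess_info=pp)
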
set_completion_callback(self, py_callback, py_data=None)

Sets a callback function that is called on success or failure of an asynchronous request

Parameters
  • py_callback – Any defined or lambda function

  • py_data – Data that is passed to the callback function

Returns

None

Usage example:

callback = lambda status, py_data: print(f"Request with id {py_data} finished with status {status}")
ie = IECore()
net = ie.read_network(model="./model.xml", weights="./model.bin")
exec_net = ie.load_network(net, "CPU", num_requests=4)
for id, req in enumerate(exec_net.requests):
    req.set_completion_callback(py_callback=callback, py_data=id)

for req in exec_net.requests:
    req.async_infer({"data": img})
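
Because async_infer() returns immediately, the snippet above should be followed by synchronization before the outputs are read; a minimal sketch:

for req in exec_net.requests:
    req.wait()  # blocks until this request finishes; the callback fires on completion
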
wait(self, timeout=None)

Waits for the result to become available. Blocks until the specified timeout elapses or the result becomes available, whichever comes first.

Parameters

timeout – Time to wait in milliseconds, or one of the special values (0, -1) described in the note below. If not specified, the timeout is set to -1 by default.

Returns

Request status code.

Note

There are special values of the timeout parameter:

  • 0 - Immediately returns the inference status. It does not block or interrupt execution. For the meaning of the status codes, refer to the InferenceEngine::StatusCode enum in the Inference Engine C++ documentation

  • -1 - Waits until inference result becomes available (default value)

Usage example: see the InferRequest.async_infer() method of the InferRequest class.
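
As a complement, a hedged polling sketch using the special timeout values (assumes exec_net, input_blob and image from the async_infer() example; the WaitMode constants exist in the 2021.x Python API, but their use here is an assumption):

from openvino.inference_engine import WaitMode

request = exec_net.requests[0]
request.async_infer({input_blob: image})
status = request.wait(WaitMode.STATUS_ONLY)  # returns immediately with the current status
if status != 0:  # not yet StatusCode.OK (0)
    status = request.wait(WaitMode.RESULT_READY)  # block until the result is available
res = request.output_blobs['prob'].buffer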