OpenVINO™ Python API exclusives¶
The OpenVINO™ Runtime Python API exposes additional features and helpers that improve the user experience. Its main goal is to give Python users a tool that is simple and user-friendly, yet powerful.
Easier model compilation¶
CompiledModel can be created easily with the compile_model helper method. It hides the creation of Core and applies the AUTO device by default.
import openvino.runtime as ov
compiled_model = ov.compile_model("model.xml")
Model/CompiledModel inputs and outputs¶
Besides the functions aligned with the C++ API, some have Pythonic counterparts or extensions. For example, Model and CompiledModel inputs and outputs can be accessed via properties.
Refer to the Python API documentation to see which helper functions or properties are available for each class.
Working with Tensor¶
The Python API allows passing data as tensors. A Tensor object holds a copy of the data from the given array. The dtype of numpy arrays is converted to OpenVINO™ types automatically.
Slices of array’s memory¶
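One of the Tensor class constructors allows sharing a slice of an array’s memory. When shape is specified in a constructor call whose first argument is a numpy array, the special shared memory mode is triggered. In this mode the tensor does not copy the data, so later edits to the array are visible through the tensor.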
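The difference between the default copy and the shared memory mode is the same as the difference between copying a buffer and taking a view of it. The sketch below shows that difference with Python's standard library only. It does not call the OpenVINO Tensor class; the flat array stands in for a numpy array of shape (2, 8), and all variable names are illustrative.

```python
from array import array

# Stand-in for a numpy array of shape (2, 8), stored row by row.
data_to_share = array("f", [1.0] * 16)

# Copy semantics (the default Tensor behaviour described in the text):
# slicing an array.array produces an independent copy of the second row.
copied_row = data_to_share[8:16]

# View semantics (the shared memory mode described in the text):
# a memoryview slice refers to the same underlying buffer.
shared_row = memoryview(data_to_share)[8:16]

# Edit the second row of the source buffer after both objects exist.
for i in range(8, 16):
    data_to_share[i] = 2.0

print(list(shared_row))  # the view sees the edit: eight 2.0 values
print(list(copied_row))  # the copy does not: eight 1.0 values
```

The practical consequence is the same in both settings: with shared memory, the source array must stay alive and must not be modified while the data is in use.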
Running inference¶
The Python API supports extra calling methods for the synchronous and asynchronous inference modes.
All infer methods accept data as numpy arrays, gathered in either Python dicts or lists.
# Passing inputs data in form of a dictionary
infer_request.infer(inputs={0: data})
# Passing inputs data in form of a list
infer_request.infer(inputs=[data])
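The two forms above describe the same request. The sketch below makes the relationship explicit, assuming that a list is positional, so that element i feeds input i. The helper inputs_as_dict is hypothetical and is not part of the OpenVINO API; a plain Python list stands in for a numpy array.

```python
def inputs_as_dict(inputs):
    """Return the inputs keyed by input index, whichever form they arrived in."""
    if isinstance(inputs, dict):
        return dict(inputs)
    # Assumption: a list or tuple is positional, so element i feeds input i.
    return dict(enumerate(inputs))


data = [0.0] * 8  # stand-in for a numpy array

# Both calling conventions describe the same single-input request.
print(inputs_as_dict([data]) == inputs_as_dict({0: data}))
```

A dictionary is the safer choice for models with several inputs, because the mapping from data to input is spelled out.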
Results from inference can be obtained in various ways:
# Get output tensor
results = infer_request.get_output_tensor().data
# Get tensor with CompiledModel's output node
results = infer_request.get_tensor(compiled.outputs[0]).data
# Get all results with special helper property
results = list(infer_request.results.values())
Synchronous mode - extended¶
The Python API provides several synchronous calls for running inference. Each of them blocks application execution until inference finishes and returns the inference results:
# Simple call to InferRequest
results = infer_request.infer(inputs={0: data})
# Extra feature: calling CompiledModel directly
results = compiled_model(inputs={0: data})
AsyncInferQueue¶
Asynchronous pipelines can be built with a wrapper class called AsyncInferQueue. This class automatically spawns a pool of InferRequest objects (also called “jobs”) and provides synchronization mechanisms to control the flow of the pipeline.
Each job is distinguishable by a unique id, which is in the range from 0 up to, but not including, the number of jobs specified in the AsyncInferQueue constructor.
Calls to start_async do not need to be synchronized: if the queue is busy or overloaded, the call waits for any available job. Every AsyncInferQueue code block should end with a call to wait_all, which provides “global” synchronization of all jobs in the pool and ensures that access to them is safe.
import numpy as np
core = ov.Core()
# Simple model that adds two inputs together
input_a = ov.opset8.parameter([8])
input_b = ov.opset8.parameter([8])
res = ov.opset8.add(input_a, input_b)
model = ov.Model(res, [input_a, input_b])
compiled = core.compile_model(model, "CPU")
# Number of InferRequests that AsyncInferQueue holds
jobs = 4
infer_queue = ov.AsyncInferQueue(compiled, jobs)
# Create data
data = [np.array([i] * 8, dtype=np.float32) for i in range(jobs)]
# Run all jobs
for i in range(len(data)):
    infer_queue.start_async({0: data[i], 1: data[i]})
infer_queue.wait_all()
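To make the flow control concrete, here is a minimal stand-in for the pattern described above, written with the standard library only. It is a sketch of the idea (a fixed pool of job ids, a start_async that blocks while every job is busy, and a wait_all that joins everything), not a description of how OpenVINO implements AsyncInferQueue. The names ToyAsyncQueue and add_inputs are hypothetical.

```python
import queue
import threading
import time


class ToyAsyncQueue:
    """Toy model of a job pool; not OpenVINO's implementation."""

    def __init__(self, work, jobs):
        self._work = work              # function executed by every job
        self._idle = queue.Queue()     # ids of jobs that are currently free
        self._threads = []
        self.results = [None] * jobs   # most recent result of each job
        for job_id in range(jobs):     # ids run from 0 to jobs - 1
            self._idle.put(job_id)

    def start_async(self, inputs):
        job_id = self._idle.get()      # blocks while every job is busy
        thread = threading.Thread(target=self._run, args=(job_id, inputs))
        self._threads.append(thread)
        thread.start()

    def _run(self, job_id, inputs):
        self.results[job_id] = self._work(inputs)
        self._idle.put(job_id)         # hand the job back to the pool

    def wait_all(self):
        for thread in self._threads:   # "global" synchronization point
            thread.join()
        self._threads.clear()

    def __getitem__(self, job_id):
        return self.results[job_id]


def add_inputs(inputs):
    time.sleep(0.01)                   # pretend that inference takes a while
    return [a + b for a, b in zip(inputs[0], inputs[1])]


# Six inputs but only two jobs: start_async has to wait for a free job.
toy_queue = ToyAsyncQueue(add_inputs, jobs=2)
toy_data = [[float(i)] * 8 for i in range(6)]
for item in toy_data:
    toy_queue.start_async({0: item, 1: item})
toy_queue.wait_all()
print(toy_queue[0], toy_queue[1])
```

Because there are more inputs than jobs, each job is reused and keeps only its most recent result after wait_all. Which input ends up in which job depends on timing, which is why results that must be matched to inputs are better collected through callbacks.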
Acquire results from requests¶
After the call to wait_all, jobs and their data can be safely accessed. Acquiring a specific job with [id] returns the InferRequest object, which makes retrieval of the output data seamless.
results = infer_queue[3].get_output_tensor().data
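The same indexing can be applied to every job in turn. The helper below is a small sketch that uses only the calls shown in this section (indexing the queue and get_output_tensor().data). The name collect_outputs is hypothetical, and the number of jobs is passed in explicitly rather than read from the queue.

```python
def collect_outputs(infer_queue, jobs):
    """Return the first output of every job, ordered by job id.

    Call this only after infer_queue.wait_all() has returned.
    """
    return [infer_queue[job_id].get_output_tensor().data for job_id in range(jobs)]
```

With the queue from the example above, results = collect_outputs(infer_queue, jobs) gathers the outputs of all four jobs. Each job holds only the result of its most recent inference, so this covers every input only when the number of inputs does not exceed the number of jobs.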
Setting callbacks¶
Another feature of AsyncInferQueue is the ability to set callbacks. When a callback is set, any job that finishes inference calls the given Python function. The callback function must take two arguments. The first is the request that calls the callback, which provides the InferRequest API. The second, called “userdata”, makes it possible to pass runtime values, which can be of any Python type and can be used later inside the callback function.
The callback of AsyncInferQueue is the same for every job. When it is executed, the GIL is acquired to ensure that data manipulation inside the function is safe.
data_done = [False for _ in range(jobs)]
def f(request, userdata):
    print(f"Done! Result: {request.get_output_tensor().data}")
    data_done[userdata] = True
infer_queue.set_callback(f)
for i in range(len(data)):
    infer_queue.start_async({0: data[i], 1: data[i]}, userdata=i)
infer_queue.wait_all()
assert all(data_done)
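A common use of userdata is to return results in submission order, even when there are more inputs than jobs. The sketch below uses only the queue methods shown in this section (set_callback, start_async with userdata, and wait_all). The helper name run_and_collect is hypothetical, and the copy is made on the assumption that a job's output data may be overwritten when the job is reused.

```python
import copy


def run_and_collect(infer_queue, inputs_list):
    """Run every entry of inputs_list and return outputs in submission order."""
    results = [None] * len(inputs_list)

    def store_result(request, userdata):
        # userdata is the index passed to start_async below.
        # Copy the data, assuming the job's output may be overwritten on reuse.
        results[userdata] = copy.copy(request.get_output_tensor().data)

    infer_queue.set_callback(store_result)
    for index, inputs in enumerate(inputs_list):
        infer_queue.start_async(inputs, userdata=index)
    infer_queue.wait_all()
    return results
```

With the objects from the example above, run_and_collect(infer_queue, [{0: d, 1: d} for d in data]) returns one output per input, in the order the inputs were submitted.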