OpenVINO™ Runtime API Tutorial

This tutorial is also available as a Jupyter notebook that can be cloned directly from GitHub. See the installation guide for instructions to run this tutorial locally on Windows, Linux or macOS. To run without installing anything, click the launch binder button.

This notebook explains the basics of the OpenVINO Runtime API. It covers loading and compiling a model, getting information about a model, running inference, reshaping and resizing, and caching a model.

The notebook is divided into sections with headers. Each section is standalone and does not depend on previous sections. A segmentation and classification OpenVINO IR model and a segmentation ONNX model are provided as examples. These model files can be replaced with your own models. The exact outputs will be different, but the process is the same.

Loading OpenVINO Runtime and Showing Info

Initialize OpenVINO Runtime with Core()

from openvino.runtime import Core

ie = Core()

OpenVINO Runtime can load a network on a device. A device in this context means a CPU, an Intel GPU, a Neural Compute Stick 2, etc. The available_devices property shows the available devices in your system. The “FULL_DEVICE_NAME” option to ie.get_property() shows the name of the device.

In this notebook, the CPU device is used. To use an integrated GPU, use device_name="GPU" instead. Be aware that loading a network on GPU will be slower than loading a network on CPU, but inference will likely be faster.

devices = ie.available_devices

for device in devices:
    device_name = ie.get_property(device, "FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")
CPU: Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz

Loading a Model

After initializing OpenVINO Runtime, first read the model file with read_model(), then compile it to the specified device with the compile_model() method.

OpenVINO™ supports several model formats and enables developers to convert them to its own OpenVINO IR format using a tool dedicated to this task, Model Optimizer.

OpenVINO IR Model

An OpenVINO IR (Intermediate Representation) model consists of an .xml file, containing information about the network topology, and a .bin file, containing the weights and biases binary data. Models in OpenVINO IR format are obtained by using the Model Optimizer tool. The read_model() function expects the .bin weights file to have the same filename and be located in the same directory as the .xml file: model_weights_file == Path(model_xml).with_suffix(".bin"). If this is the case, specifying the weights file is optional. If the weights file has a different filename, it can be specified using the weights parameter in read_model().
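
For illustration, the cell below passes the weights parameter explicitly. This is a minimal sketch; the .bin filename used here is hypothetical, and the example models in this notebook follow the default naming convention, so this step is not needed for them.

from openvino.runtime import Core

ie = Core()
classification_model_xml = "model/classification.xml"
# Hypothetical weights filename that does not follow the default naming convention
classification_model_bin = "model/classification_weights.bin"

# The weights parameter tells read_model() explicitly where to find the .bin file
model = ie.read_model(model=classification_model_xml, weights=classification_model_bin)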

The OpenVINO Model Optimizer tool is used to convert models to OpenVINO IR format. Model Optimizer reads the original model and creates an OpenVINO IR model (.xml and .bin files) so inference can be performed without delays due to format conversion. Optionally, Model Optimizer can adjust the model to be more suitable for inference, for example, by altering input shapes, embedding preprocessing and cutting training parts off. For information on how to convert your existing TensorFlow, PyTorch or ONNX model to OpenVINO IR format with Model Optimizer, refer to the tensorflow-to-openvino and pytorch-onnx-to-openvino notebooks.
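
As an optional illustration, Model Optimizer also provides a Python conversion API in the openvino-dev package. The sketch below assumes that package is installed; it converts the segmentation ONNX model used later in this notebook, and the output filename is hypothetical.

from openvino.tools.mo import convert_model
from openvino.runtime import serialize

# Convert the ONNX model in memory (requires the openvino-dev package)
ov_model = convert_model("model/segmentation.onnx")

# Save the converted model in OpenVINO IR format (.xml and .bin)
serialize(ov_model, xml_path="model/exported_from_mo.xml")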

from openvino.runtime import Core

ie = Core()
classification_model_xml = "model/classification.xml"

model = ie.read_model(model=classification_model_xml)
compiled_model = ie.compile_model(model=model, device_name="CPU")

ONNX Model

ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers. OpenVINO supports reading models in ONNX format directly, which means they can be used with OpenVINO Runtime without any prior conversion.

Reading and loading an ONNX model, which is a single .onnx file, works the same way as with an OpenVINO IR model. The model argument points to the filename of an ONNX model.

from openvino.runtime import Core

ie = Core()
onnx_model_path = "model/segmentation.onnx"

model_onnx = ie.read_model(model=onnx_model_path)
compiled_model_onnx = ie.compile_model(model=model_onnx, device_name="CPU")

The ONNX model can be exported to OpenVINO IR with serialize():

from openvino.runtime import serialize

serialize(model_onnx, xml_path="model/exported_onnx_model.xml")

PaddlePaddle Model

PaddlePaddle models saved for inference can also be passed to OpenVINO Runtime without any conversion step. Pass the filename with extension to read_model() and export an OpenVINO IR with serialize().

from openvino.runtime import Core

ie = Core()
paddle_model_path = "model/inference.pdmodel"

model_paddle = ie.read_model(model=paddle_model_path)
compiled_model_paddle = ie.compile_model(model=model_paddle, device_name="CPU")
from openvino.runtime import serialize

serialize(model_paddle, xml_path="model/exported_paddle_model.xml")

TensorFlow Model

TensorFlow models saved in frozen graph format can also be passed to read_model, starting in OpenVINO 2022.3.

NOTE: Directly loading TensorFlow models is available as a preview feature in the OpenVINO 2022.3 release. Fully functional support will be provided in the upcoming 2023 releases. Currently, support is limited to the frozen graph inference format. Other TensorFlow model formats must be converted to OpenVINO IR using Model Optimizer.

from openvino.runtime import Core

ie = Core()
tf_model_path = "model/classification.pb"

model_tf = ie.read_model(model=tf_model_path)
compiled_model_tf = ie.compile_model(model=model_tf, device_name="CPU")
from openvino.runtime import serialize

serialize(model_tf, xml_path="model/exported_tf_model.xml")

Getting Information about a Model

The OpenVINO Model instance stores information about the model. Information about the inputs and outputs of the model is available in model.inputs and model.outputs. These are also properties of the CompiledModel instance. The cells below use model.inputs and model.outputs, but you can also use compiled_model.inputs and compiled_model.outputs.

Model Inputs

Information about all input layers is stored in the model.inputs property.

from openvino.runtime import Core

ie = Core()
classification_model_xml = "model/classification.xml"
model = ie.read_model(model=classification_model_xml)
model.inputs
[<Output: names[input, input:0] shape[1,3,224,224] type: f32>]

The cell above shows that the loaded model expects one input with the name input. If you loaded a different model, you may see a different input layer name, and you may see more inputs. You may also obtain information about each input layer using model.input(index), where index is the numeric index of the input in the model. If a model has only one input, index can be omitted.

input_layer = model.input(0)

It is often useful to have a reference to the name of the first input layer. For a model with one input, model.input(0).any_name gets this name.

input_layer.any_name
'input'

The next cell prints the input precision and shape.

print(f"input precision: {input_layer.element_type}")
print(f"input shape: {input_layer.shape}")
input precision: <Type: 'float32'>
input shape: [1,3,224,224]

This cell shows that the model expects inputs with a shape of [1,3,224,224], and that this is in the NCHW layout. This means that the model expects input data with a batch size of 1 (N), 3 channels (C), and images with a height (H) and width (W) equal to 224. The input data is expected to be of FP32 (floating point) precision.

Model Outputs

from openvino.runtime import Core

ie = Core()
classification_model_xml = "model/classification.xml"
model = ie.read_model(model=classification_model_xml)
model.outputs
[<Output: names[MobilenetV3/Predictions/Softmax] shape[1,1001] type: f32>]

Model output information is stored in model.outputs. The cell above shows that the model returns one output, with the name MobilenetV3/Predictions/Softmax. Loading a different model will result in a different output layer name, and more outputs might be returned. As with the inputs, you may also obtain information about each output separately using model.output(index).

Since this model has one output, follow the same method as for the input layer to get its name.

output_layer = model.output(0)
output_layer.any_name
'MobilenetV3/Predictions/Softmax'

Getting the output precision and shape is similar to getting the input precision and shape.

print(f"output precision: {output_layer.element_type}")
print(f"output shape: {output_layer.shape}")
output precision: <Type: 'float32'>
output shape: [1,1001]

This cell shows that the model returns outputs with a shape of [1, 1001], where 1 is the batch size (N) and 1001 is the number of classes (C). The output is returned as 32-bit floating point.

Doing Inference on a Model

NOTE: This notebook demonstrates only the basic synchronous inference API. For an async inference example, please refer to the Async API notebook.

The diagram below shows a typical inference pipeline with OpenVINO.

[Diagram: typical OpenVINO inference pipeline]

Creating the OpenVINO Core and compiling the model were covered in the previous steps. The next step is preparing an inference request. To do inference on a model, first create an inference request by calling the create_infer_request() method of CompiledModel, the compiled_model that was obtained with compile_model(). Then, call the infer() method of InferRequest. It expects one argument: inputs. This is a dictionary that maps input layer names to input data, or a list of input data in np.ndarray format, where the position of the input tensor corresponds to the input index. If a model has a single input, wrapping the data in a dictionary or list can be omitted.

Load the network

from openvino.runtime import Core

ie = Core()
classification_model_xml = "model/classification.xml"
model = ie.read_model(model=classification_model_xml)
compiled_model = ie.compile_model(model=model, device_name="CPU")
input_layer = compiled_model.input(0)
output_layer = compiled_model.output(0)

Load an image and convert to the input shape

To propagate an image through the network, it needs to be loaded into an array, resized to the shape that the network expects, and converted to the input layout of the network.

import cv2

image_filename = "../data/image/coco_hollywood.jpg"
image = cv2.imread(image_filename)
image.shape
(663, 994, 3)

The image has a shape of (663,994,3). It is 663 pixels in height, 994 pixels in width, and has 3 color channels. A reference to the height and width expected by the network is obtained and the image is resized to these dimensions.

# N,C,H,W = batch size, number of channels, height, width.
N, C, H, W = input_layer.shape
# OpenCV resize expects the destination size as (width, height).
resized_image = cv2.resize(src=image, dsize=(W, H))
resized_image.shape
(224, 224, 3)

Now, the image has the width and height that the network expects. It is still in HWC format and must be changed to NCHW format. First, call the np.transpose() method to change to CHW and then add the N dimension (where N = 1) by calling the np.expand_dims() method. Next, convert the data to FP32 with the np.astype() method.

import numpy as np

input_data = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0).astype(np.float32)
input_data.shape
(1, 3, 224, 224)

Do inference

Now that the input data is in the right shape, run inference. The CompiledModel inference result is a dictionary where keys are the Output class instances (the same keys as in compiled_model.outputs, which can also be obtained with compiled_model.output(index)) and values are the predicted results in np.ndarray format.

# for single input models only
result = compiled_model(input_data)[output_layer]

# for multiple inputs in a list
result = compiled_model([input_data])[output_layer]

# or using a dictionary, where the key is input tensor name or index
result = compiled_model({input_layer.any_name: input_data})[output_layer]

You can also create an InferRequest and run the infer() method on the request.

request = compiled_model.create_infer_request()
request.infer(inputs={input_layer.any_name: input_data})
result = request.get_output_tensor(output_layer.index).data

The .infer() method sets the output tensors, which can be accessed with get_output_tensor(). Since this network returns one output, and the index of the output layer is available as output_layer.index, you can get the data with request.get_output_tensor(output_layer.index). To get a NumPy array from the output tensor, use the .data property.

result.shape
(1, 1001)

The output shape is (1,1001), which is the expected output shape. This shape indicates that the network returns probabilities for 1001 classes. To learn more about this notion, refer to the hello world notebook.
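
As a quick check, the index of the class with the highest probability can be obtained with np.argmax. This is a minimal sketch; mapping class indices to human-readable labels is not shown here.

import numpy as np

# result has shape (1, 1001): one image in the batch and 1001 class probabilities
top_class_index = int(np.argmax(result))
top_probability = float(result[0, top_class_index])
print(f"Predicted class index: {top_class_index}, probability: {top_probability:.3f}")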

Reshaping and Resizing

Change Image Size

Instead of reshaping the image to fit the model, it is also possible to reshape the model to fit the image. Be aware that not all models support reshaping, and models that do may not support all input shapes. The model accuracy may also suffer if you change the input shape.

First check the input shape of the model, then reshape it to the new input shape.

from openvino.runtime import Core, PartialShape

ie = Core()
segmentation_model_xml = "model/segmentation.xml"
segmentation_model = ie.read_model(model=segmentation_model_xml)
segmentation_input_layer = segmentation_model.input(0)
segmentation_output_layer = segmentation_model.output(0)

print("~~~~ ORIGINAL MODEL ~~~~")
print(f"input shape: {segmentation_input_layer.shape}")
print(f"output shape: {segmentation_output_layer.shape}")

new_shape = PartialShape([1, 3, 544, 544])
segmentation_model.reshape({segmentation_input_layer.any_name: new_shape})
segmentation_compiled_model = ie.compile_model(model=segmentation_model, device_name="CPU")
# help(segmentation_compiled_model)
print("~~~~ RESHAPED MODEL ~~~~")
print(f"model input shape: {segmentation_input_layer.shape}")
print(
    f"compiled_model input shape: "
    f"{segmentation_compiled_model.input(index=0).shape}"
)
print(f"compiled_model output shape: {segmentation_output_layer.shape}")
~~~~ ORIGINAL MODEL ~~~~
input shape: [1,3,512,512]
output shape: [1,1,512,512]
~~~~ RESHAPED MODEL ~~~~
model input shape: [1,3,544,544]
compiled_model input shape: [1,3,544,544]
compiled_model output shape: [1,1,544,544]

The input shape for the segmentation network is [1,3,512,512], with the NCHW layout: the network expects 3-channel images with a width and height of 512 and a batch size of 1. Reshape the network with the .reshape() method of the Model object to make it accept input images with a width and height of 544. This segmentation network always returns arrays with the same width and height as the input. Therefore, setting the input dimensions to 544x544 also modifies the output dimensions. After reshaping, compile the network once again.

Change Batch Size

Use the .reshape() method to set the batch size, by increasing the first element of new_shape. For example, to set a batch size of two, set new_shape = PartialShape([2, 3, 544, 544]) in the cell above.

from openvino.runtime import Core, PartialShape

ie = Core()
segmentation_model_xml = "model/segmentation.xml"
segmentation_model = ie.read_model(model=segmentation_model_xml)
segmentation_input_layer = segmentation_model.input(0)
segmentation_output_layer = segmentation_model.output(0)
new_shape = PartialShape([2, 3, 544, 544])
segmentation_model.reshape({segmentation_input_layer.any_name: new_shape})
segmentation_compiled_model = ie.compile_model(model=segmentation_model, device_name="CPU")

print(f"input shape: {segmentation_input_layer.shape}")
print(f"output shape: {segmentation_output_layer.shape}")
input shape: [2,3,544,544]
output shape: [2,1,544,544]

The output shows that by setting the batch size to 2, the first element (N) of the input and output shape has a value of 2. Propagate the input image through the network to see the result:

import numpy as np
from openvino.runtime import Core, PartialShape

ie = Core()
segmentation_model_xml = "model/segmentation.xml"
segmentation_model = ie.read_model(model=segmentation_model_xml)
segmentation_input_layer = segmentation_model.input(0)
segmentation_output_layer = segmentation_model.output(0)
new_shape = PartialShape([2, 3, 544, 544])
segmentation_model.reshape({segmentation_input_layer.any_name: new_shape})
segmentation_compiled_model = ie.compile_model(model=segmentation_model, device_name="CPU")
input_data = np.random.rand(2, 3, 544, 544)

output = segmentation_compiled_model([input_data])

print(f"input data shape: {input_data.shape}")
print(f"result data data shape: {segmentation_output_layer.shape}")
input data shape: (2, 3, 544, 544)
result data data shape: [2,1,544,544]

Caching a Model

For some devices, like GPU, loading a model can take some time. Model Caching solves this issue by storing the compiled model in a cache directory. If ie.compile_model(model=model, device_name=device_name, config=config_dict) is called with a config that sets "CACHE_DIR", caching will be used. This option checks if the compiled model already exists in the cache. If so, it is loaded from the cache. If not, the model is compiled regularly and then stored in the cache, so that the next time the model is loaded with this option set, it will be loaded from the cache.

In the cell below, we create a model_cache directory as a subdirectory of model, where the model will be cached for the specified device. The model will be loaded to the GPU. After running this cell once, the model will be cached, so subsequent runs of this cell will load the model from the cache.

Note: Model Caching is also available on CPU devices

import time
from pathlib import Path

from openvino.runtime import Core

ie = Core()

device_name = "GPU"

if device_name in ie.available_devices:
    cache_path = Path("model/model_cache")
    cache_path.mkdir(exist_ok=True)
    # Enable caching for OpenVINO Runtime. To disable caching set enable_caching = False
    enable_caching = True
    config_dict = {"CACHE_DIR": str(cache_path)} if enable_caching else {}

    classification_model_xml = "model/classification.xml"
    model = ie.read_model(model=classification_model_xml)

    start_time = time.perf_counter()
    compiled_model = ie.compile_model(model=model, device_name=device_name, config=config_dict)
    end_time = time.perf_counter()
    print(f"Loading the network to the {device_name} device took {end_time-start_time:.2f} seconds.")

After running the previous cell, we know the model exists in the cache directory. In the cell below, we delete the compiled model and load it again, this time measuring how long it takes.

if device_name in ie.available_devices:
    del compiled_model
    start_time = time.perf_counter()
    compiled_model = ie.compile_model(model=model, device_name=device_name, config=config_dict)
    end_time = time.perf_counter()
    print(f"Loading the network to the {device_name} device took {end_time-start_time:.2f} seconds.")