Asynchronous Inference with OpenVINO™¶
This notebook demonstrates how to use the Async API for asynchronous execution with OpenVINO.
OpenVINO Runtime supports inference in either synchronous or asynchronous mode. The key advantage of the Async API is that when a device is busy with inference, the application can perform other tasks in parallel (for example, populating inputs or scheduling other requests) rather than wait for the current inference to complete first.
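In API terms, this is the difference between InferRequest.infer(), which blocks until the result is ready, and InferRequest.start_async(), which returns immediately and is synchronized later with wait(). A minimal sketch of the two calling styles used throughout this notebook (here, compiled_model stands for the compiled model created below, and input_data for a preprocessed frame; both are placeholders):

request = compiled_model.create_infer_request()

# Synchronous: blocks until the result is ready
request.infer({0: input_data})
result = request.get_output_tensor(0).data

# Asynchronous: returns immediately, so other work can happen in between
request.start_async({0: input_data})
# ... e.g. capture or preprocess the next frame here ...
request.wait()  # synchronize before reading the output
result = request.get_output_tensor(0).data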
Imports¶
import platform

%pip install -q "openvino>=2023.1.0"
%pip install -q opencv-python
if platform.system() != "Windows":
    %pip install -q "matplotlib>=3.4"
else:
    %pip install -q "matplotlib>=3.4,<3.7"
Note: you may need to restart the kernel to use updated packages.
import cv2
import time
import numpy as np
import openvino as ov
from IPython import display
import matplotlib.pyplot as plt
# Fetch the notebook utils script from the openvino_notebooks repo
import urllib.request
urllib.request.urlretrieve(
    url='https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/utils/notebook_utils.py',
    filename='notebook_utils.py',
)
import notebook_utils as utils
Prepare model and data processing¶
Download test model¶
We use a pre-trained model from OpenVINO’s Open Model Zoo for this test. The model will be run to detect people in each frame of the video.
# directory where model will be downloaded
base_model_dir = "model"
# model name as named in Open Model Zoo
model_name = "person-detection-0202"
precision = "FP16"
model_path = f"model/intel/{model_name}/{precision}/{model_name}.xml"

download_command = (
    f"omz_downloader "
    f"--name {model_name} "
    f"--precision {precision} "
    f"--output_dir {base_model_dir} "
    f"--cache_dir {base_model_dir}"
)
! $download_command
################|| Downloading person-detection-0202 ||################

========== Downloading model/intel/person-detection-0202/FP16/person-detection-0202.xml
========== Downloading model/intel/person-detection-0202/FP16/person-detection-0202.bin
Select inference device¶
import ipywidgets as widgets
core = ov.Core()
device = widgets.Dropdown(
    options=core.available_devices + ["AUTO"],
    value='CPU',
    description='Device:',
    disabled=False,
)
device
Dropdown(description='Device:', options=('CPU', 'AUTO'), value='CPU')
Load the model¶
# initialize OpenVINO runtime
core = ov.Core()
# read the network and corresponding weights from file
model = core.read_model(model=model_path)
# compile the model for a chosen device (you can manually pick CPU, GPU, etc.)
# or let the engine choose the best available device (AUTO)
compiled_model = core.compile_model(model=model, device_name=device.value)
# get input node
input_layer_ir = model.input(0)
N, C, H, W = input_layer_ir.shape
# cv2.resize expects the target size as (width, height)
shape = (W, H)
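Optionally, the compiled model can report how many infer requests the device can serve efficiently in parallel. This is not required for the two-request pipeline below, but it is a handy way to size a request pool; a small sketch using a standard OpenVINO property:

# Query the device's suggested number of parallel infer requests
optimal_requests = compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
print(f"Optimal number of infer requests: {optimal_requests}")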
Create functions for data processing¶
def preprocess(image):
    """
    Define the preprocess function for input data

    :param: image: the original input frame
    :returns:
            resized_image: the preprocessed image
    """
    resized_image = cv2.resize(image, shape)
    resized_image = cv2.cvtColor(np.array(resized_image), cv2.COLOR_BGR2RGB)
    resized_image = resized_image.transpose((2, 0, 1))
    resized_image = np.expand_dims(resized_image, axis=0).astype(np.float32)
    return resized_image
def postprocess(result, image, fps):
    """
    Define the postprocess function for output data

    :param: result: the inference results
            image: the original input frame
            fps: average throughput calculated for each frame
    :returns:
            image: the image with bounding box and fps message
    """
    detections = result.reshape(-1, 7)
    for detection in detections:
        image_id, label, confidence, xmin, ymin, xmax, ymax = detection
        if confidence > 0.5:
            xmin = int(max((xmin * image.shape[1]), 10))
            ymin = int(max((ymin * image.shape[0]), 10))
            xmax = int(min((xmax * image.shape[1]), image.shape[1] - 10))
            ymax = int(min((ymax * image.shape[0]), image.shape[0] - 10))
            cv2.rectangle(image, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
    cv2.putText(image, str(round(fps, 2)) + " fps", (5, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 3)
    return image
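The raw detections follow the common Open Model Zoo layout: each row is [image_id, label, conf, x_min, y_min, x_max, y_max], with coordinates normalized to [0, 1]. As a quick sanity check, postprocess can be exercised on a synthetic detection (the values below are made up purely for illustration):

# One fake detection covering the center of a 640x480 frame
dummy_result = np.array([[0, 1, 0.9, 0.25, 0.25, 0.75, 0.75]], dtype=np.float32)
dummy_frame = np.zeros((480, 640, 3), dtype=np.uint8)
annotated = postprocess(dummy_result, dummy_frame, fps=30.0)
print(annotated.shape)  # (480, 640, 3), now with a box and fps label drawn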
Get the test video¶
video_path = 'https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/video/CEO%20Pat%20Gelsinger%20on%20Leading%20Intel.mp4'
How to improve the throughput of video processing¶
Below, we compare the performance of the synchronous and async-based approaches:
Sync Mode (default)¶
Let us see how video processing works with the default, synchronous approach: each frame is captured with OpenCV and then immediately processed:
while(true) {
    // capture frame
    // populate CURRENT InferRequest
    // infer CURRENT InferRequest
    // this call is synchronous
    // display CURRENT result
}
def sync_api(source, flip, fps, use_popup, skip_first_frames):
    """
    Define the main function for video processing in sync mode

    :param: source: the video path or the ID of your webcam
    :returns:
            sync_fps: the inference throughput in sync mode
    """
    frame_number = 0
    infer_request = compiled_model.create_infer_request()
    player = None
    sync_fps = 0
    try:
        # Create a video player
        player = utils.VideoPlayer(source, flip=flip, fps=fps, skip_first_frames=skip_first_frames)
        # Start capturing
        start_time = time.time()
        player.start()
        if use_popup:
            title = "Press ESC to Exit"
            cv2.namedWindow(title, cv2.WINDOW_GUI_NORMAL | cv2.WINDOW_AUTOSIZE)
        while True:
            frame = player.next()
            if frame is None:
                print("Source ended")
                break
            resized_frame = preprocess(frame)
            infer_request.set_tensor(input_layer_ir, ov.Tensor(resized_frame))
            # Start the inference request in synchronous mode
            infer_request.infer()
            res = infer_request.get_output_tensor(0).data
            stop_time = time.time()
            total_time = stop_time - start_time
            frame_number = frame_number + 1
            sync_fps = frame_number / total_time
            frame = postprocess(res, frame, sync_fps)
            # Display the results
            if use_popup:
                cv2.imshow(title, frame)
                key = cv2.waitKey(1)
                # escape = 27
                if key == 27:
                    break
            else:
                # Encode numpy array to jpg
                _, encoded_img = cv2.imencode(".jpg", frame, params=[cv2.IMWRITE_JPEG_QUALITY, 90])
                # Create IPython image
                i = display.Image(data=encoded_img)
                # Display the image in this notebook
                display.clear_output(wait=True)
                display.display(i)
    # ctrl-c
    except KeyboardInterrupt:
        print("Interrupted")
    # Any different error
    except RuntimeError as e:
        print(e)
    finally:
        if use_popup:
            cv2.destroyAllWindows()
        if player is not None:
            # stop capturing
            player.stop()
    return sync_fps
Test performance in Sync Mode¶
sync_fps = sync_api(source=video_path, flip=False, fps=30, use_popup=False, skip_first_frames=800)
print(f"average throuput in sync mode: {sync_fps:.2f} fps")
Source ended
average throughput in sync mode: 43.30 fps
Async Mode¶
Let us see how the OpenVINO Async API can improve the overall frame rate of an application. The key advantage of the Async approach is as follows: while a device is busy with the inference, the application can do other things in parallel (for example, populating inputs or scheduling other requests) rather than wait for the current inference to complete first.
In the example below, inference is applied to the results of video decoding, so it is possible to keep multiple infer requests in flight: while the current request is being processed, the input frame for the next one is being captured. This hides the latency of capturing, so the overall frame rate is determined by the slowest stage of the pipeline (decoding or inference) rather than by the sum of the stages: per frame, the synchronous approach costs roughly t_capture + t_infer, while the pipelined approach approaches max(t_capture, t_infer).
while(true) {
    // capture frame
    // populate NEXT InferRequest
    // start NEXT InferRequest
    // this call is async and returns immediately
    // wait for the CURRENT InferRequest
    // display CURRENT result
    // swap CURRENT and NEXT InferRequests
}
def async_api(source, flip, fps, use_popup, skip_first_frames):
    """
    Define the main function for video processing in async mode

    :param: source: the video path or the ID of your webcam
    :returns:
            async_fps: the inference throughput in async mode
    """
    frame_number = 0
    # Create 2 infer requests
    curr_request = compiled_model.create_infer_request()
    next_request = compiled_model.create_infer_request()
    player = None
    async_fps = 0
    try:
        # Create a video player
        player = utils.VideoPlayer(source, flip=flip, fps=fps, skip_first_frames=skip_first_frames)
        # Start capturing
        start_time = time.time()
        player.start()
        if use_popup:
            title = "Press ESC to Exit"
            cv2.namedWindow(title, cv2.WINDOW_GUI_NORMAL | cv2.WINDOW_AUTOSIZE)
        # Capture CURRENT frame
        frame = player.next()
        resized_frame = preprocess(frame)
        curr_request.set_tensor(input_layer_ir, ov.Tensor(resized_frame))
        # Start the CURRENT inference request
        curr_request.start_async()
        while True:
            # Capture NEXT frame
            next_frame = player.next()
            if next_frame is None:
                print("Source ended")
                break
            resized_frame = preprocess(next_frame)
            next_request.set_tensor(input_layer_ir, ov.Tensor(resized_frame))
            # Start the NEXT inference request
            next_request.start_async()
            # Wait for the CURRENT inference result
            curr_request.wait()
            res = curr_request.get_output_tensor(0).data
            stop_time = time.time()
            total_time = stop_time - start_time
            frame_number = frame_number + 1
            async_fps = frame_number / total_time
            frame = postprocess(res, frame, async_fps)
            # Display the results
            if use_popup:
                cv2.imshow(title, frame)
                key = cv2.waitKey(1)
                # escape = 27
                if key == 27:
                    break
            else:
                # Encode numpy array to jpg
                _, encoded_img = cv2.imencode(".jpg", frame, params=[cv2.IMWRITE_JPEG_QUALITY, 90])
                # Create IPython image
                i = display.Image(data=encoded_img)
                # Display the image in this notebook
                display.clear_output(wait=True)
                display.display(i)
            # Swap CURRENT and NEXT frames
            frame = next_frame
            # Swap CURRENT and NEXT infer requests
            curr_request, next_request = next_request, curr_request
    # ctrl-c
    except KeyboardInterrupt:
        print("Interrupted")
    # Any different error
    except RuntimeError as e:
        print(e)
    finally:
        if use_popup:
            cv2.destroyAllWindows()
        if player is not None:
            # stop capturing
            player.stop()
    return async_fps
Test the performance in Async Mode¶
async_fps = async_api(source=video_path, flip=False, fps=30, use_popup=False, skip_first_frames=800)
print(f"average throuput in async mode: {async_fps:.2f} fps")
Source ended
average throughput in async mode: 73.14 fps
Compare the performance¶
width = 0.4
fontsize = 14
plt.rc('font', size=fontsize)
fig, ax = plt.subplots(1, 1, figsize=(10, 8))
rects1 = ax.bar([0], sync_fps, width, color='#557f2d')
rects2 = ax.bar([width], async_fps, width)
ax.set_ylabel("frames per second")
ax.set_xticks([0, width])
ax.set_xticklabels(["Sync mode", "Async mode"])
ax.set_xlabel("Higher is better")
fig.suptitle('Sync mode vs. Async mode')
fig.tight_layout()
plt.show()
AsyncInferQueue¶
Asynchronous mode pipelines can be supported with the AsyncInferQueue wrapper class. This class automatically spawns the pool of InferRequest objects (also called “jobs”) and provides synchronization mechanisms to control the flow of the pipeline. It is a simpler way to manage the infer request queue in asynchronous mode.
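Before applying it to video below, the flow can be seen in isolation. Here is a minimal, self-contained sketch; the queue size of 2, the random inputs, and the job indices passed as user data are illustrative assumptions, and the callback mechanism is described in the next subsection:

results = {}

def collect_callback(request, userdata):
    # userdata is whatever was passed to start_async(); here, a job index
    results[userdata] = request.get_output_tensor(0).data.copy()

infer_queue = ov.AsyncInferQueue(compiled_model, 2)  # pool of 2 infer requests ("jobs")
infer_queue.set_callback(collect_callback)

for i in range(4):
    dummy_input = np.random.rand(N, C, H, W).astype(np.float32)
    # start_async() blocks only when no idle request is available in the pool
    infer_queue.start_async({input_layer_ir.any_name: dummy_input}, userdata=i)

infer_queue.wait_all()  # block until every queued job has finished
print(sorted(results.keys()))  # [0, 1, 2, 3]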
Setting Callback¶
When a callback is set, every job that finishes inference calls the Python function. The callback function must have two arguments: the first is the request that triggered the callback, which provides the InferRequest API; the second is “user data”, which makes it possible to pass runtime values into the callback.
def callback(infer_request, info) -> None:
    """
    Define the callback function for postprocessing

    :param: infer_request: the infer_request object
            info: a tuple that includes the original frame and the start time
    :returns:
            None
    """
    global frame_number
    global total_time
    global inferqueue_fps
    stop_time = time.time()
    frame, start_time = info
    total_time = stop_time - start_time
    frame_number = frame_number + 1
    inferqueue_fps = frame_number / total_time
    res = infer_request.get_output_tensor(0).data[0]
    frame = postprocess(res, frame, inferqueue_fps)
    # Encode numpy array to jpg
    _, encoded_img = cv2.imencode(".jpg", frame, params=[cv2.IMWRITE_JPEG_QUALITY, 90])
    # Create IPython image
    i = display.Image(data=encoded_img)
    # Display the image in this notebook
    display.clear_output(wait=True)
    display.display(i)
def inferqueue(source, flip, fps, skip_first_frames) -> None:
    """
    Define the main function for video processing with async infer queue

    :param: source: the video path or the ID of your webcam
    :returns:
            None
    """
    # Create infer requests queue
    infer_queue = ov.AsyncInferQueue(compiled_model, 2)
    infer_queue.set_callback(callback)
    player = None
    try:
        # Create a video player
        player = utils.VideoPlayer(source, flip=flip, fps=fps, skip_first_frames=skip_first_frames)
        # Start capturing
        start_time = time.time()
        player.start()
        while True:
            # Capture frame
            frame = player.next()
            if frame is None:
                print("Source ended")
                break
            resized_frame = preprocess(frame)
            # Start the inference request with async infer queue
            infer_queue.start_async({input_layer_ir.any_name: resized_frame}, (frame, start_time))
    except KeyboardInterrupt:
        print("Interrupted")
    # Any different error
    except RuntimeError as e:
        print(e)
    finally:
        infer_queue.wait_all()
        player.stop()
Test the performance with AsyncInferQueue¶
frame_number = 0
total_time = 0
inferqueue(source=video_path, flip=False, fps=30, skip_first_frames=800)
print(f"average throughput in async mode with async infer queue: {inferqueue_fps:.2f} fps")
average throughput in async mode with async infer queue: 112.94 fps