PaddlePaddle Image Classification with OpenVINO™

This tutorial is also available as a Jupyter notebook that can be cloned directly from GitHub. See the installation guide for instructions to run this tutorial locally on Windows, Linux or macOS. To run without installing anything, click the launch binder button.


This demo shows how to run a MobileNetV3 Large PaddlePaddle model using OpenVINO Runtime. Instead of exporting the PaddlePaddle model to ONNX and converting the result to OpenVINO Intermediate Representation (OpenVINO IR) format with Model Optimizer, you can now read the Paddle model directly, without any conversion step.
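
As a quick preview, reading the Paddle inference model is a single call. The following is a minimal sketch, assuming the model archive downloaded later in this tutorial has already been extracted:

from openvino.runtime import Core

core = Core()
# OpenVINO Runtime parses the .pdmodel file directly; no ONNX export or
# Model Optimizer conversion step is required.
model = core.read_model("model/MobileNetV3_large_x1_0_infer/inference.pdmodel")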

Import

# Downloading model
from pathlib import Path
import os
import urllib.request
import tarfile

# Inference
from openvino.runtime import Core

# Preprocessing
import cv2
import numpy as np
from openvino.preprocess import PrePostProcessor, ResizeAlgorithm
from openvino.runtime import Layout, Type, AsyncInferQueue, PartialShape

# Visualization of the results
import time
import json
from IPython.display import Image

Download the MobileNetV3_large_x1_0 Model

Download the pre-trained model directly from the server. For more detailed information about the pre-trained model, refer to the PaddleClas documentation.

mobilenet_url = "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_infer.tar"
mobilenetv3_model_path = Path("model/MobileNetV3_large_x1_0_infer/inference.pdmodel")
if mobilenetv3_model_path.is_file():
    print("Model MobileNetV3_large_x1_0 already exists")
else:
    # Download the model from the server, and untar it.
    print("Downloading the MobileNetV3_large_x1_0_infer model (20Mb)... May take a while...")
    # Create a directory (if it does not exist yet).
    os.makedirs("model", exist_ok=True)
    urllib.request.urlretrieve(mobilenet_url, "model/MobileNetV3_large_x1_0_infer.tar")
    print("Model Downloaded")

    file = tarfile.open("model/MobileNetV3_large_x1_0_infer.tar")
    file.extractall("model")
    file.close()
    # `extractall()` returns None, so check for the expected file instead.
    if mobilenetv3_model_path.is_file():
        print(f"Model Extracted to {mobilenetv3_model_path}.")
    else:
        print("Error Extracting the model. Please check the network.")
Downloading the MobileNetV3_large_x1_0_infer model (20Mb)... May take a while...
Model Downloaded
Model Extracted to model/MobileNetV3_large_x1_0_infer/inference.pdmodel.

Define the callback function for postprocessing

def callback(infer_request, info) -> None:
    """
    The callback function for postprocessing.

    :param infer_request: the InferRequest object
    :param info: a tuple of the submit time of the infer request and the inference iteration
    :returns: None
    """
    global total_time
    submit_time, i = info
    imagenet_classes = json.loads(open("utils/imagenet_class_index.json").read())
    predictions = next(iter(infer_request.results.values()))
    indices = np.argsort(-predictions[0])
    if (i == 0):
        # Calculate the first inference time
        first_latency = (time.time() - submit_time) * 1000
        print("first inference latency: {:.2f} ms".format(first_latency))
        for n in range(5):
            print(
                "class name: {}, probability: {:.5f}"
                .format(imagenet_classes[str(list(indices)[n])][1], predictions[0][list(indices)[n]])
            )
        total_time = total_time + first_latency
    else:
        latency = (time.time() - submit_time) * 1000
        total_time = total_time + latency

Read the model

OpenVINO Runtime reads the PaddlePaddle model directly.

# Initialize OpenVINO Runtime with Core()
ie = Core()
# MobileNetV3_large_x1_0
model = ie.read_model(mobilenetv3_model_path)
# Get the information of input and output layers.
input_layer = model.input(0)
output_layer = model.output(0)

Integrate preprocessing steps into the execution graph with Preprocessing API

If your input data does not fit perfectly in the model input tensor, additional operations/steps are needed to transform the data to a format expected by the model. These operations are known as “preprocessing”. Preprocessing steps are integrated into the execution graph and performed on the selected device(s) (CPU/GPU/VPU/etc.) rather than always executed on CPU. This improves utilization on the selected device(s).

For more information, refer to the overview of the Preprocessing API.

filename = "../001-hello-world/data/coco.jpg"
test_image = cv2.imread(filename)
test_image = np.expand_dims(test_image, 0) / 255
_, h, w, _ = test_image.shape

# Adjust model input shape to improve the performance.
model.reshape({input_layer.any_name: PartialShape([1, 3, 224, 224])})
ppp = PrePostProcessor(model)
# Set input tensor information:
# - The `input()` function provides information about a single model input.
# - Layout of data is "NHWC".
# - Set static spatial dimensions to input tensor to resize from.
ppp.input().tensor() \
    .set_spatial_static_shape(h, w) \
    .set_layout(Layout("NHWC"))
inputs = model.inputs
# Here, it is assumed that the model has "NCHW" layout for input.
ppp.input().model().set_layout(Layout("NCHW"))
# Do preprocessing:
# - Apply linear resize from the tensor spatial dims to the model spatial dims.
# - Subtract the mean from each channel.
# - Divide each channel by the appropriate scale value.
ppp.input().preprocess() \
    .resize(ResizeAlgorithm.RESIZE_LINEAR, 224, 224) \
    .mean([0.485, 0.456, 0.406]) \
    .scale([0.229, 0.224, 0.225])
# Set output tensor information:
# - The precision of the tensor is set to 'f32'.
ppp.output().tensor().set_element_type(Type.f32)
# Apply preprocessing to modify the original 'model'.
model = ppp.build()
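
For reference, the same transformations could also be done on the host with OpenCV and NumPy instead of inside the graph. The sketch below is illustrative only (the helper name is made up and it is not used in this tutorial); it mirrors the steps configured above, while the in-graph version has the advantage of running on the selected device:

def preprocess_on_host(image_bgr):
    """Illustrative host-side equivalent of the PrePostProcessor steps above."""
    # Linear resize to the model's spatial dimensions.
    resized = cv2.resize(image_bgr, (224, 224), interpolation=cv2.INTER_LINEAR)
    # Scale to [0, 1], then apply the same mean/scale normalization.
    data = resized.astype(np.float32) / 255.0
    data = (data - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225])
    # NHWC -> NCHW with a batch dimension of 1.
    return data.transpose(2, 0, 1)[np.newaxis, ...].astype(np.float32)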

Run Inference

Use the Auto Device plugin (or AUTO for short) as the device name to delegate device selection to OpenVINO. AUTO internally recognizes and selects devices from among Intel CPU and GPU, depending on the device capabilities and the characteristics of the model(s) (for example, precision). It then assigns inference requests to the best device. AUTO starts inference immediately on the CPU and transparently shifts to the GPU (or VPU) once it is ready, dramatically reducing the time to first inference.
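
If you want to restrict AUTO to a specific set of candidate devices, list them after "AUTO:" in priority order. Below is a brief sketch that reuses the ie and model objects from above; the resulting compiled model (compiled_on_subset) is only for illustration and is not used elsewhere in this tutorial:

# Restrict AUTO to an explicit, prioritized candidate list built from the
# devices actually present on this machine (for example, "CPU" or "GPU,CPU").
candidates = ",".join(ie.available_devices)
compiled_on_subset = ie.compile_model(model=model, device_name=f"AUTO:{candidates}")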

total_time = 0
# Check the available devices in your system.
devices = ie.available_devices
for device in devices:
    device_name = ie.get_property(device, "FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")

# Load the model to a device selected by AUTO from the available devices list.
compiled_model = ie.compile_model(model=model, device_name="AUTO")
# Create an infer request queue.
infer_queue = AsyncInferQueue(compiled_model)
infer_queue.set_callback(callback)
start = time.time()
# Do inference.
infer_queue.start_async({input_layer.any_name: test_image}, (time.time(), 0))
infer_queue.wait_all()
Image(filename=filename)
CPU: Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz
first inference latency: 17.77 ms
class name: Labrador_retriever, probability: 0.59148
class name: flat-coated_retriever, probability: 0.11678
class name: Staffordshire_bullterrier, probability: 0.04089
class name: Newfoundland, probability: 0.02689
class name: Tibetan_mastiff, probability: 0.01735
../_images/214-vision-paddle-classification-with-output_12_1.jpg

Performance Hints: Latency and Throughput

Throughput and latency are some of the most widely used metrics that measure the overall performance of an application.

  • Latency measures the inference time (in ms) required to process a single input, for example, the first inference.

  • To calculate throughput, divide the number of inputs that were processed by the processing time.

High-level Performance Hints in OpenVINO are a new way to configure performance with portability in mind. Instead of mapping application needs to low-level performance settings and maintaining separate configuration logic for each possible device, Performance Hints let the device configure itself based on the stated use case.

For more information, see High-level Performance Hints.
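
Both hints used below, LATENCY and THROUGHPUT, are passed through the same PERFORMANCE_HINT config key at compile time. One observable effect is the number of parallel infer requests the device considers optimal under the chosen hint, which AsyncInferQueue uses by default when created without an explicit queue size. A brief sketch, assuming the ie and model objects from above (the compiled_for_tput name is only for illustration):

# The device reports how many parallel infer requests it considers optimal
# for the chosen hint; AsyncInferQueue created without a size uses this value.
compiled_for_tput = ie.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT": "THROUGHPUT"})
n_optimal = compiled_for_tput.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
print(f"Optimal number of infer requests under the THROUGHPUT hint: {n_optimal}")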

Run Inference with “LATENCY” Performance Hint

It is possible to define application-specific performance settings with a config key, letting the device tune itself for better LATENCY-oriented performance.

loop = 100
total_time = 0
# AUTO sets device config based on hints.
compiled_model = ie.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT": "LATENCY"})
# Use the AsyncInferQueue Python API to boost performance in async mode.
infer_queue = AsyncInferQueue(compiled_model)
infer_queue.set_callback(callback)
# Run inference 100 times to get the average FPS.
start_time = time.time()
for i in range(loop):
    infer_queue.start_async({input_layer.any_name: test_image}, (time.time(), i))
infer_queue.wait_all()
end_time = time.time()
# Calculate the average FPS.
fps = loop / (end_time - start_time)
print("throughput: {:.2f} fps".format(fps))
print("average latency: {:.2f} ms".format(total_time / loop))
first inference latency: 12.99 ms
class name: Labrador_retriever, probability: 0.59148
class name: flat-coated_retriever, probability: 0.11678
class name: Staffordshire_bullterrier, probability: 0.04089
class name: Newfoundland, probability: 0.02689
class name: Tibetan_mastiff, probability: 0.01735
throughput: 152.60 fps
average latency: 11.38 ms

Run Inference with “THROUGHPUT” Performance Hint

It is possible to define application-specific performance settings with a config key, letting the device tune itself for better THROUGHPUT-oriented performance.

total_time = 0
# AUTO sets device config based on hints.
compiled_model = ie.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT": "THROUGHPUT"})
infer_queue = AsyncInferQueue(compiled_model)
infer_queue.set_callback(callback)
start_time = time.time()
for i in range(loop):
    infer_queue.start_async({input_layer.any_name: test_image}, (time.time(), i))
infer_queue.wait_all()
end_time = time.time()
# Calculate the average FPS.
fps = loop / (end_time - start_time)
print("throughput: {:.2f} fps".format(fps))
print("average latency: {:.2f} ms".format(total_time / loop))
first inference latency: 16.69 ms
class name: Labrador_retriever, probability: 0.59148
class name: flat-coated_retriever, probability: 0.11678
class name: Staffordshire_bullterrier, probability: 0.04089
class name: Newfoundland, probability: 0.02689
class name: Tibetan_mastiff, probability: 0.01735
throughput: 403.16 fps
average latency: 14.74 ms

Measure Performance with benchmark_app

To generate more accurate performance measurements, use Benchmark Tool in OpenVINO.

You can trigger the “Performance hint” by using the -hint parameter, which instructs the OpenVINO device plugin to use the best network-specific settings for either latency or throughput.

NOTE: The performance results from benchmark_app exclude “compilation and load time” of a model.

# 'latency': device performance optimized for LATENCY.
! benchmark_app -m $mobilenetv3_model_path -data_shape [1,3,224,224] -hint "latency"
[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading OpenVINO
[ INFO ] OpenVINO:
         API version............. 2022.2.0-7713-af16ea1d79a-releases/2022/2
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.2
         Build................... 2022.2.0-7713-af16ea1d79a-releases/2022/2

[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 57.06 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: ?
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'inputs' precision u8, dimensions ([N,C,H,W]): ? 3 224 224
[ INFO ] Model output 'save_infer_model/scale_0.tmp_1' precision f32, dimensions ([...]): ? 1000
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 117.73 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES  , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS  , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS  , (1, 24)
[ INFO ]   FULL_DEVICE_NAME  , Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES  , ['WINOGRAD', 'FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR  ,
[ INFO ]   NUM_STREAMS  , 1
[ INFO ]   AFFINITY  , Affinity.CORE
[ INFO ]   INFERENCE_NUM_THREADS  , 0
[ INFO ]   PERF_COUNT  , False
[ INFO ]   INFERENCE_PRECISION_HINT  , <Type: 'float32'>
[ INFO ]   PERFORMANCE_HINT  , PerformanceMode.LATENCY
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS  , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 1 infer requests took 0.08 ms
[ WARNING ] No input files were given for input 'inputs'!. This input will be filled with random values!
[ INFO ] Fill input 'inputs' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 1 inference requests, inference only: False, limits: 60000 ms duration)
[ INFO ] Benchmarking in full mode (inputs filling are included in measurement loop).
[ INFO ] First inference took 18.01 ms
[Step 11/11] Dumping statistics report
Count:          32310 iterations
Duration:       60002.63 ms
Latency:
    AVG:        1.79 ms
    MIN:        1.66 ms
    MAX:        3.19 ms
Throughput: 538.48 FPS
# 'throughput' or 'tput': device performance optimized for THROUGHPUT.
! benchmark_app -m $mobilenetv3_model_path -data_shape [1,3,224,224] -hint "throughput"
[Step 1/11] Parsing and validating input arguments
[ WARNING ]  -nstreams default value is determined automatically for a device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 2/11] Loading OpenVINO
[ INFO ] OpenVINO:
         API version............. 2022.2.0-7713-af16ea1d79a-releases/2022/2
[ INFO ] Device info
         CPU
         openvino_intel_cpu_plugin version 2022.2
         Build................... 2022.2.0-7713-af16ea1d79a-releases/2022/2

[Step 3/11] Setting device configuration
[Step 4/11] Reading network files
[ INFO ] Read model took 28.19 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: ?
[Step 6/11] Configuring input of the model
[ INFO ] Model input 'inputs' precision u8, dimensions ([N,C,H,W]): ? 3 224 224
[ INFO ] Model output 'save_infer_model/scale_0.tmp_1' precision f32, dimensions ([...]): ? 1000
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 170.74 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] DEVICE: CPU
[ INFO ]   AVAILABLE_DEVICES  , ['']
[ INFO ]   RANGE_FOR_ASYNC_INFER_REQUESTS  , (1, 1, 1)
[ INFO ]   RANGE_FOR_STREAMS  , (1, 24)
[ INFO ]   FULL_DEVICE_NAME  , Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz
[ INFO ]   OPTIMIZATION_CAPABILITIES  , ['WINOGRAD', 'FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
[ INFO ]   CACHE_DIR  ,
[ INFO ]   NUM_STREAMS  , 1
[ INFO ]   AFFINITY  , Affinity.CORE
[ INFO ]   INFERENCE_NUM_THREADS  , 0
[ INFO ]   PERF_COUNT  , False
[ INFO ]   INFERENCE_PRECISION_HINT  , <Type: 'float32'>
[ INFO ]   PERFORMANCE_HINT  , PerformanceMode.THROUGHPUT
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS  , 0
[Step 9/11] Creating infer requests and preparing input data
[ INFO ] Create 6 infer requests took 1.71 ms
[ WARNING ] No input files were given for input 'inputs'!. This input will be filled with random values!
[ INFO ] Fill input 'inputs' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 6 inference requests, inference only: False, limits: 60000 ms duration)
[ INFO ] Benchmarking in full mode (inputs filling are included in measurement loop).
[ INFO ] First inference took 20.69 ms
[Step 11/11] Dumping statistics report
Count:          80392 iterations
Duration:       60004.12 ms
Latency:
    AVG:        4.27 ms
    MIN:        2.57 ms
    MAX:        37.93 ms
Throughput: 1339.77 FPS