Sync Benchmark Python Sample

This sample demonstrates how to estimate the performance of a model using the Synchronous Inference Request API. Synchronous inference makes sense only in latency-oriented scenarios. Only models with static input shapes are supported. Unlike demos, this sample has no other configurable command-line arguments. Feel free to modify the sample's source code to try out different options.
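
If the model you want to benchmark has dynamic input shapes, one option is to reshape it to a static shape and save the result before running the sample. The following is a minimal sketch, not part of the sample; the file names and the [1, 3, 224, 224] shape are placeholders, and a single-input model is assumed:

    # Sketch: make a dynamic-shape model static before benchmarking.
    # Assumes a single-input model; the shape and paths are placeholders.
    from openvino.runtime import Core, serialize

    core = Core()
    model = core.read_model('model.xml')
    model.reshape([1, 3, 224, 224])        # pick a concrete input shape
    serialize(model, 'model_static.xml')   # writes model_static.xml and model_static.bin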

Options                       Values
--------------------------    ----------------------------------------------------------------------------
Validated Models              alexnet, googlenet-v1, yolo-v3-tf, face-detection-0200
Model Format                  OpenVINO™ toolkit Intermediate Representation (*.xml + *.bin), ONNX (*.onnx)
Supported devices             All
Other language realization    C++

The following Python API is used in the application:

Feature                   API                                         Description
------------------------  ------------------------------------------  --------------------------------
OpenVINO Runtime Version  openvino.runtime.get_version                 Get OpenVINO API version.
Basic Infer Flow          openvino.runtime.Core,                       Common API to do inference:
                          openvino.runtime.Core.compile_model,         compile a model and configure
                          openvino.runtime.InferRequest.get_tensor     input tensors.
Synchronous Infer         openvino.runtime.InferRequest.infer          Do synchronous inference.
Model Operations          openvino.runtime.CompiledModel.inputs        Get inputs of a model.
Tensor Operations         openvino.runtime.Tensor.get_shape,           Get a tensor shape and its data.
                          openvino.runtime.Tensor.data

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Copyright (C) 2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import logging as log
import statistics
import sys
from time import perf_counter

import numpy as np
from openvino.runtime import Core, get_version
from openvino.runtime.utils.types import get_dtype


def fill_tensor_random(tensor):
    dtype = get_dtype(tensor.element_type)
    rand_min, rand_max = (0, 1) if dtype == bool else (np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)
    # np.random.uniform excludes high: add 1 to have it generated
    if np.dtype(dtype).kind in ['i', 'u', 'b']:
        rand_max += 1
    rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(0)))
    if 0 == tensor.get_size():
        raise RuntimeError("Models with dynamic shapes aren't supported. Input tensors must have specific shapes before inference")
    tensor.data[:] = rs.uniform(rand_min, rand_max, list(tensor.shape)).astype(dtype)


def main():
    log.basicConfig(format='[ %(levelname)s ] %(message)s', level=log.INFO, stream=sys.stdout)
    log.info('OpenVINO:')
    log.info(f"{'Build ':.<39} {get_version()}")
    if len(sys.argv) != 2:
        log.info(f'Usage: {sys.argv[0]} <path_to_model>')
        return 1
    # Optimize for latency. Most of the devices are configured for latency by default,
    # but there are exceptions like GNA
    latency = {'PERFORMANCE_HINT': 'LATENCY'}

    # Create Core and use it to compile a model.
    # Pick a device by replacing CPU, for example AUTO:GPU,CPU.
    # Using MULTI device is pointless in sync scenario
    # because only one instance of openvino.runtime.InferRequest is used
    core = Core()
    compiled_model = core.compile_model(sys.argv[1], 'CPU', latency)
    ireq = compiled_model.create_infer_request()
    # Fill input data for the ireq
    for model_input in compiled_model.inputs:
        fill_tensor_random(ireq.get_tensor(model_input))
    # Warm up
    ireq.infer()
    # Benchmark for seconds_to_run seconds and at least niter iterations
    seconds_to_run = 10
    niter = 10
    latencies = []
    start = perf_counter()
    time_point = start
    time_point_to_finish = start + seconds_to_run
    while time_point < time_point_to_finish or len(latencies) < niter:
        ireq.infer()
        iter_end = perf_counter()
        latencies.append((iter_end - time_point) * 1e3)
        time_point = iter_end
    end = time_point
    duration = end - start
    # Report results
    fps = len(latencies) / duration
    log.info(f'Count:          {len(latencies)} iterations')
    log.info(f'Duration:       {duration * 1e3:.2f} ms')
    log.info('Latency:')
    log.info(f'    Median:     {statistics.median(latencies):.2f} ms')
    log.info(f'    Average:    {sum(latencies) / len(latencies):.2f} ms')
    log.info(f'    Min:        {min(latencies):.2f} ms')
    log.info(f'    Max:        {max(latencies):.2f} ms')
    log.info(f'Throughput: {fps:.2f} FPS')


if __name__ == '__main__':
    sys.exit(main())

How It Works

The sample compiles a model for a given device, randomly generates input data, and runs synchronous inference repeatedly for at least 10 seconds and at least 10 iterations (both values are set in the source code). It then processes and reports the performance results: latency statistics and throughput.

You can find an explicit description of each sample step in the Integration Steps section of the “Integrate OpenVINO™ Runtime with Your Application” guide.
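
The target device is hard-coded to CPU. As the comment in the source code suggests, you can benchmark another device by editing the compile_model call, for example (the AUTO:GPU,CPU string below is just one possible choice):

    compiled_model = core.compile_model(sys.argv[1], 'AUTO:GPU,CPU', latency)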

Running

python sync_benchmark.py <path_to_model>

To run the sample, you need to specify a model:

  • You can use public or Intel’s pre-trained models from the Open Model Zoo. The models can be downloaded using the Model Downloader.

Note

Before running the sample with a trained model, make sure the model is converted to the intermediate representation (IR) format (*.xml + *.bin) using the model conversion API.
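
For example, assuming the openvino-dev Python package is installed (see the Example section below), the conversion can typically be done with the mo tool; the exact arguments depend on the source framework and model:

    mo --input_model <path_to_model_file>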

The sample accepts models in ONNX format (.onnx) that do not require preprocessing.
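
For example, such a model can be passed to the sample directly, without conversion:

    python sync_benchmark.py <path_to_model>.onnx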

Example

  1. Install the openvino-dev Python package to use Open Model Zoo Tools:

    python -m pip install openvino-dev[caffe]
    
  2. Download a pre-trained model using:

    omz_downloader --name googlenet-v1
    
  3. If a model is not in the IR or ONNX format, it must be converted. You can do this using the model converter:

    omz_converter --name googlenet-v1
    
  4. Perform benchmarking using the googlenet-v1 model on a CPU:

    python sync_benchmark.py googlenet-v1.xml
    

Sample Output

The application outputs performance results.

[ INFO ] OpenVINO:
[ INFO ] Build ................................. <version>
[ INFO ] Count:          2333 iterations
[ INFO ] Duration:       10003.59 ms
[ INFO ] Latency:
[ INFO ]     Median:     3.90 ms
[ INFO ]     Average:    4.29 ms
[ INFO ]     Min:        3.30 ms
[ INFO ]     Max:        10.11 ms
[ INFO ] Throughput: 233.22 FPS