Sync Benchmark C++ Sample

This sample demonstrates how to estimate the performance of a model using the Synchronous Inference Request API. Synchronous inference makes sense only in latency-oriented scenarios. Models with static input shapes are supported. Unlike demos, this sample has no other configurable command-line arguments. Feel free to modify the sample's source code to try out different options.



Validated Models

alexnet, googlenet-v1, yolo-v3-tf, face-detection-0200

Model Format

OpenVINO™ toolkit Intermediate Representation (*.xml + *.bin), ONNX (*.onnx)

Supported devices

All

Other language realization

Python

The following C++ API is used in the application:

OpenVINO Runtime Version

ov::get_openvino_version

Get OpenVINO API version.

Basic Infer Flow

ov::Core, ov::Core::compile_model, ov::CompiledModel::create_infer_request, ov::InferRequest::get_tensor

Common API to do inference: compile a model, create an infer request, configure input tensors.

Synchronous Infer

ov::InferRequest::infer

Do synchronous inference.

Model Operations

ov::CompiledModel::inputs

Get inputs of a model.

Tensor Operations

ov::Tensor::get_shape, ov::Tensor::data

Get a tensor shape and its data.
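
For reference, below is a minimal sketch of how an input tensor could be filled with random values using these tensor operations. The sample itself delegates this to a fill_tensor_random() helper from the shared samples code, so the real implementation differs; the sketch assumes an f32 input tensor.

    #include <cstddef>
    #include <random>
    #include "openvino/openvino.hpp"

    // Fill an f32 tensor with random values (simplified sketch, not the
    // sample's actual helper). The shape gives the element count, and
    // data<float>() exposes the raw buffer to write into.
    void fill_f32_tensor_random(ov::Tensor& tensor) {
        std::mt19937 gen(0);
        std::uniform_real_distribution<float> dist(0.0f, 255.0f);
        const ov::Shape shape = tensor.get_shape();  // ov::Tensor::get_shape
        float* data = tensor.data<float>();          // ov::Tensor::data
        for (size_t i = 0; i < ov::shape_size(shape); ++i) {
            data[i] = dist(gen);
        }
    }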

// Copyright (C) 2022 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#include <chrono>
#include <string>
#include <vector>

// clang-format off
#include "openvino/openvino.hpp"

#include "samples/args_helper.hpp"
#include "samples/common.hpp"
#include "samples/latency_metrics.hpp"
#include "samples/slog.hpp"
// clang-format on

using Ms = std::chrono::duration<double, std::ratio<1, 1000>>;

int main(int argc, char* argv[]) {
    try {
        slog::info << "OpenVINO:" << slog::endl;
        slog::info << ov::get_openvino_version();
        if (argc != 2) {
            slog::info << "Usage : " << argv[0] << " <path_to_model>" << slog::endl;
            return EXIT_FAILURE;
        }
        // Optimize for latency. Most of the devices are configured for latency by default,
        // but there are exceptions like GNA
        ov::AnyMap latency{{ov::hint::performance_mode.name(), ov::hint::PerformanceMode::LATENCY}};

        // Create ov::Core and use it to compile a model.
        // Pick a device by replacing CPU, for example AUTO:GPU,CPU.
        // Using MULTI device is pointless in sync scenario
        // because only one instance of ov::InferRequest is used
        ov::Core core;
        ov::CompiledModel compiled_model = core.compile_model(argv[1], "CPU", latency);
        ov::InferRequest ireq = compiled_model.create_infer_request();
        // Fill input data for the ireq
        for (const ov::Output<const ov::Node>& model_input : compiled_model.inputs()) {
            fill_tensor_random(ireq.get_tensor(model_input));
        }
        // Warm up
        ireq.infer();
        // Benchmark for seconds_to_run seconds and at least niter iterations
        std::chrono::seconds seconds_to_run{10};
        size_t niter = 10;
        std::vector<double> latencies;
        auto start = std::chrono::steady_clock::now();
        auto time_point = start;
        auto time_point_to_finish = start + seconds_to_run;
        while (time_point < time_point_to_finish || latencies.size() < niter) {
            ireq.infer();
            auto iter_end = std::chrono::steady_clock::now();
            latencies.push_back(std::chrono::duration_cast<Ms>(iter_end - time_point).count());
            time_point = iter_end;
        }
        auto end = time_point;
        double duration = std::chrono::duration_cast<Ms>(end - start).count();
        // Report results
        slog::info << "Count:      " << latencies.size() << " iterations" << slog::endl
                   << "Duration:   " << duration << " ms" << slog::endl
                   << "Latency:" << slog::endl;
        size_t percent = 50;
        LatencyMetrics{latencies, "", percent}.write_to_slog();
        slog::info << "Throughput: " << double_to_string(latencies.size() * 1000 / duration) << " FPS" << slog::endl;
    } catch (const std::exception& ex) {
        slog::err << ex.what() << slog::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

How It Works

The sample compiles a model for a given device, randomly generates input data, and performs synchronous inference repeatedly for a given number of seconds. It then processes and reports performance results.
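
As an illustration of how the reported figures are derived from the measured per-iteration latencies, here is a simplified sketch that computes a median latency and FPS throughput from a vector of latencies in milliseconds. It is a stand-in for the LatencyMetrics helper used by the sample, not its actual implementation.

    #include <algorithm>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Simplified metrics report: latencies holds per-iteration times in ms
    // (assumed non-empty), total_duration_ms is the wall-clock time of the
    // whole benchmark loop.
    void report_metrics(std::vector<double> latencies, double total_duration_ms) {
        std::sort(latencies.begin(), latencies.end());
        const double median_ms = latencies[latencies.size() / 2];
        const double throughput_fps = latencies.size() * 1000.0 / total_duration_ms;
        std::cout << "Count:      " << latencies.size() << " iterations\n"
                  << "Median:     " << median_ms << " ms\n"
                  << "Throughput: " << throughput_fps << " FPS\n";
    }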

You can see the explicit description of each sample step in the Integration Steps section of the “Integrate OpenVINO™ Runtime with Your Application” guide.


To build the sample, please use the instructions available in the Build the Sample Applications section of the OpenVINO™ Toolkit Samples guide.


To run the sample, you need to specify a model:

    sync_benchmark <path_to_model>


Before running the sample with a trained model, make sure the model is converted to the Intermediate Representation (IR) format (*.xml + *.bin) using the model conversion API.

The sample accepts models in ONNX format (.onnx) that do not require preprocessing.
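
For example, an ONNX model can be passed to the sample directly (model.onnx here is a placeholder file name):

    sync_benchmark model.onnx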


  1. Install the openvino-dev Python package to use Open Model Zoo Tools:

    python -m pip install openvino-dev[caffe]
  2. Download a pre-trained model using:

    omz_downloader --name googlenet-v1
  3. If a model is not in the IR or ONNX format, it must be converted. You can do this using the model converter:

    omz_converter --name googlenet-v1
  4. Perform benchmarking using the googlenet-v1 model on a CPU:

    sync_benchmark googlenet-v1.xml

Sample Output

The application outputs performance results.

[ INFO ] OpenVINO:
[ INFO ] Build ................................. <version>
[ INFO ] Count:      992 iterations
[ INFO ] Duration:   15009.8 ms
[ INFO ] Latency:
[ INFO ]        Median:     14.00 ms
[ INFO ]        Average:    15.13 ms
[ INFO ]        Min:        9.33 ms
[ INFO ]        Max:        53.60 ms
[ INFO ] Throughput: 66.09 FPS