Throughput Benchmark C++ Sample
This sample demonstrates how to estimate the performance of a model using the Asynchronous Inference Request API in throughput mode. Unlike demos, this sample has no configurable command-line arguments other than the model path. Feel free to modify the sample’s source code to try out different options.
The reported results may deviate from what benchmark_app reports. One example is the model input precision for computer vision tasks: benchmark_app sets uint8, while the sample uses the default model precision, which is usually float32.
Options | Values |
---|---|
Validated Models | |
Model Format | OpenVINO™ toolkit Intermediate Representation (*.xml + *.bin), ONNX (*.onnx) |
Supported devices | |
Other language realization | |
The following C++ API is used in the application:
Feature | API | Description |
---|---|---|
OpenVINO Runtime Version | | Get the OpenVINO API version. |
Basic Infer Flow | | Common API to do inference: compile a model, create an infer request, configure input tensors. |
Asynchronous Infer | | Do asynchronous inference with a callback. |
Model Operations | | Get the inputs of a model. |
Tensor Operations | | Get a tensor shape and its data. |
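The Asynchronous Infer row refers to starting a request without blocking and then being notified through a callback that receives a `std::exception_ptr`, as the listing's `set_callback` lambda does. The following standard C++ sketch mimics only that call pattern so that it can run without OpenVINO. `SimulatedInferRequest` is a hypothetical class, not the OpenVINO API. For simplicity it spawns one thread per call.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical stand-in for an asynchronous infer request. It is NOT ov::InferRequest;
// it only mimics the set_callback() / start_async() / wait() call pattern.
class SimulatedInferRequest {
public:
    explicit SimulatedInferRequest(std::function<void()> work) : work_(std::move(work)) {}
    SimulatedInferRequest(const SimulatedInferRequest&) = delete;
    SimulatedInferRequest& operator=(const SimulatedInferRequest&) = delete;
    ~SimulatedInferRequest() { wait(); }

    // Set the callback before start_async(). It receives a null exception_ptr on
    // success or the captured error on failure, and must not throw itself.
    void set_callback(std::function<void(std::exception_ptr)> callback) {
        callback_ = std::move(callback);
    }

    // Run the work on a separate thread, then invoke the callback from that thread.
    void start_async() {
        wait();  // one inference at a time per request
        worker_ = std::thread([this] {
            std::exception_ptr ex;
            try {
                work_();
            } catch (...) {
                ex = std::current_exception();
            }
            if (callback_) {
                callback_(ex);
            }
        });
    }

    // Block until the running inference and its callback have finished.
    void wait() {
        if (worker_.joinable()) {
            worker_.join();
        }
    }

private:
    std::function<void()> work_;
    std::function<void(std::exception_ptr)> callback_;
    std::thread worker_;
};
```

A callback can inspect the error the same way the sample does: when `ex` is non-null, call `std::rethrow_exception(ex)` inside a `try` block and handle the exception it raises.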
// Copyright (C) 2022 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <algorithm>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <cstdlib>
#include <deque>
#include <exception>
#include <mutex>
#include <string>
#include <vector>

// clang-format off
#include "openvino/openvino.hpp"

#include "samples/args_helper.hpp"
#include "samples/common.hpp"
#include "samples/latency_metrics.hpp"
#include "samples/slog.hpp"
// clang-format on

using Ms = std::chrono::duration<double, std::ratio<1, 1000>>;

int main(int argc, char* argv[]) {
    try {
        slog::info << "OpenVINO:" << slog::endl;
        slog::info << ov::get_openvino_version();
        if (argc != 2) {
            slog::info << "Usage : " << argv[0] << " <path_to_model>" << slog::endl;
            return EXIT_FAILURE;
        }
        // Optimize for throughput. Best throughput can be reached by
        // running multiple ov::InferRequest instances asynchronously
        ov::AnyMap tput{{ov::hint::performance_mode.name(), ov::hint::PerformanceMode::THROUGHPUT}};

        // Create ov::Core and use it to compile a model.
        // Pick a device by replacing CPU, for example MULTI:CPU(4),GPU(8).
        // It is possible to set CUMULATIVE_THROUGHPUT as ov::hint::PerformanceMode for AUTO device
        ov::Core core;
        ov::CompiledModel compiled_model = core.compile_model(argv[1], "CPU", tput);
        // Create optimal number of ov::InferRequest instances
        uint32_t nireq = compiled_model.get_property(ov::optimal_number_of_infer_requests);
        std::vector<ov::InferRequest> ireqs(nireq);
        std::generate(ireqs.begin(), ireqs.end(), [&] {
            return compiled_model.create_infer_request();
        });
        // Fill input data for ireqs
        for (ov::InferRequest& ireq : ireqs) {
            for (const ov::Output<const ov::Node>& model_input : compiled_model.inputs()) {
                fill_tensor_random(ireq.get_tensor(model_input));
            }
        }
        // Warm up
        for (ov::InferRequest& ireq : ireqs) {
            ireq.start_async();
        }
        for (ov::InferRequest& ireq : ireqs) {
            ireq.wait();
        }
        // Benchmark for at least seconds_to_run seconds and more than niter iterations
        std::chrono::seconds seconds_to_run{10};
        size_t niter = 10;
        std::vector<double> latencies;
        std::mutex mutex;
        std::condition_variable cv;
        std::exception_ptr callback_exception;
        struct TimedIreq {
            ov::InferRequest& ireq;  // ref
            std::chrono::steady_clock::time_point start;
            bool has_start_time;
        };
        std::deque<TimedIreq> finished_ireqs;
        for (ov::InferRequest& ireq : ireqs) {
            finished_ireqs.push_back({ireq, std::chrono::steady_clock::time_point{}, false});
        }
        auto start = std::chrono::steady_clock::now();
        auto time_point_to_finish = start + seconds_to_run;
        // Once there is a finished ireq, wake up the main thread.
        // Compute and save the latency for that ireq and prepare for the next inference by setting up a callback.
        // The callback pushes that ireq back to finished_ireqs when inference is completed.
        // Start asynchronous inference with the updated callback.
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex);
            while (!callback_exception && finished_ireqs.empty()) {
                cv.wait(lock);
            }
            if (callback_exception) {
                std::rethrow_exception(callback_exception);
            }
            if (!finished_ireqs.empty()) {
                auto time_point = std::chrono::steady_clock::now();
                if (time_point > time_point_to_finish && latencies.size() > niter) {
                    break;
                }
                TimedIreq timedIreq = finished_ireqs.front();
                finished_ireqs.pop_front();
                lock.unlock();
                ov::InferRequest& ireq = timedIreq.ireq;
                if (timedIreq.has_start_time) {
                    latencies.push_back(std::chrono::duration_cast<Ms>(time_point - timedIreq.start).count());
                }
                ireq.set_callback(
                    [&ireq, time_point, &mutex, &finished_ireqs, &callback_exception, &cv](std::exception_ptr ex) {
                        // Keep callback small. This improves performance for fast (tens of thousands FPS) models
                        std::unique_lock<std::mutex> lock(mutex);
                        try {
                            if (ex) {
                                std::rethrow_exception(ex);
                            }
                            finished_ireqs.push_back({ireq, time_point, true});
                        } catch (const std::exception&) {
                            // Store only the first error; the main thread rethrows it
                            if (!callback_exception) {
                                callback_exception = std::current_exception();
                            }
                        }
                        cv.notify_one();
                    });
                ireq.start_async();
            }
        }
        auto end = std::chrono::steady_clock::now();
        double duration = std::chrono::duration_cast<Ms>(end - start).count();
        // Wait for requests still in flight so that no callback touches mutex, cv,
        // or finished_ireqs after they go out of scope
        for (ov::InferRequest& ireq : ireqs) {
            ireq.wait();
        }
        // Report results
        slog::info << "Count: " << latencies.size() << " iterations" << slog::endl
                   << "Duration: " << duration << " ms" << slog::endl
                   << "Latency:" << slog::endl;
        size_t percent = 50;
        LatencyMetrics{latencies, "", percent}.write_to_slog();
        slog::info << "Throughput: " << double_to_string(1000 * latencies.size() / duration) << " FPS" << slog::endl;
    } catch (const std::exception& ex) {
        slog::err << ex.what() << slog::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
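The core of the listing is a dispatcher with three parts:

- a queue of finished requests guarded by a mutex,
- a condition variable that wakes the main thread,
- an immediate restart of each request once its latency is recorded.

The sketch below reproduces that pattern in standard C++ only, with a sleep standing in for inference. `run_benchmark`, `BenchResult`, and the fixed simulated latency are hypothetical. For brevity, the sketch stops after a fixed number of completed requests instead of also enforcing a time limit.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

using Ms = std::chrono::duration<double, std::milli>;
using Clock = std::chrono::steady_clock;

struct BenchResult {
    std::vector<double> latencies_ms;  // one entry per completed simulated request
    double duration_ms = 0.0;          // wall time of the measured loop
};

// Hypothetical simulation of the sample's dispatcher: nireq "requests" are kept in
// flight, and each simulated inference is a sleep of fake_latency on its own thread.
BenchResult run_benchmark(std::size_t nireq, std::size_t niter,
                          std::chrono::milliseconds fake_latency) {
    struct Finished {
        std::size_t id;           // which simulated request completed
        Clock::time_point start;  // when that request was started
    };
    std::mutex mutex;
    std::condition_variable cv;
    std::deque<Finished> finished;     // completed requests waiting to be restarted
    std::vector<std::thread> workers;  // only touched by the calling thread
    BenchResult result;

    // Simulated start_async(): sleep on another thread, then act as the callback
    // by queueing the request and waking the dispatcher.
    auto start_async = [&](std::size_t id) {
        Clock::time_point t0 = Clock::now();
        workers.emplace_back([&mutex, &cv, &finished, fake_latency, id, t0] {
            std::this_thread::sleep_for(fake_latency);
            {
                std::lock_guard<std::mutex> lock(mutex);
                finished.push_back({id, t0});
            }
            cv.notify_one();
        });
    };

    Clock::time_point bench_start = Clock::now();
    for (std::size_t id = 0; id < nireq; ++id) {
        start_async(id);
    }
    while (result.latencies_ms.size() < niter) {
        std::unique_lock<std::mutex> lock(mutex);
        cv.wait(lock, [&finished] { return !finished.empty(); });
        Finished done = finished.front();
        finished.pop_front();
        lock.unlock();  // do not hold the lock while restarting the request
        result.latencies_ms.push_back(Ms(Clock::now() - done.start).count());
        start_async(done.id);  // keep the simulated device busy
    }
    result.duration_ms = Ms(Clock::now() - bench_start).count();

    // Drain in-flight work before the locals captured by reference are destroyed.
    for (std::thread& t : workers) {
        t.join();
    }
    return result;
}
```

With 4 simulated requests at 5 ms each, throughput should land roughly near 4 / 5 ms = 800 FPS on an otherwise idle machine. The latency of each request still stays at about 5 ms or more. This is the effect the sample relies on when it keeps several requests in flight.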
How It Works
The sample compiles a model for the CPU device (hard-coded in the source) and fills its inputs with random data. It then runs asynchronous inference repeatedly for at least 10 seconds and more than 10 iterations (the seconds_to_run and niter values in the source), and finally processes and reports the performance results.
A detailed description of each sample step is available in the Integration Steps section of the “Integrate OpenVINO™ Runtime with Your Application” guide.
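In the listing, the benchmark loop ends only when both conditions hold: the time budget has elapsed and more than `niter` latencies have been recorded. The small sketch below isolates that check. The function name `should_stop` is hypothetical. Because the comparison is strict, the loop needs `niter + 1` recorded latencies. With the default values (10 seconds, `niter = 10`), a very slow model therefore keeps running past the 10-second mark until 11 latencies are recorded.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

using Clock = std::chrono::steady_clock;

// Mirrors the sample's exit test: stop only when the deadline has passed
// AND more than niter latencies have been recorded (hypothetical helper).
bool should_stop(Clock::time_point now, Clock::time_point deadline,
                 std::size_t completed, std::size_t niter) {
    return now > deadline && completed > niter;
}
```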
Building
To build the sample, follow the instructions in the Build the Sample Applications section of the OpenVINO™ Toolkit Samples guide.
Running
To run the sample, specify a model:
throughput_benchmark <path_to_model>
You can use public or Intel’s pre-trained models from the Open Model Zoo. The models can be downloaded using the Model Downloader.
Note
Before running the sample with a trained model, make sure the model is converted to the intermediate representation (IR) format (*.xml + *.bin) using the model conversion API.
The sample accepts models in ONNX format (.onnx) that do not require preprocessing.
Example
Install the openvino-dev Python package to use Open Model Zoo Tools:
python -m pip install openvino-dev[caffe]
Download a pre-trained model using:
omz_downloader --name googlenet-v1
If a model is not in the IR or ONNX format, it must be converted. You can do this using the model converter:
omz_converter --name googlenet-v1
Perform benchmarking using the googlenet-v1 model on a CPU:
throughput_benchmark googlenet-v1.xml
Sample Output
The application outputs performance results.
[ INFO ] OpenVINO:
[ INFO ] Build ................................. <version>
[ INFO ] Count: 1577 iterations
[ INFO ] Duration: 15024.2 ms
[ INFO ] Latency:
[ INFO ] Median: 38.02 ms
[ INFO ] Average: 38.08 ms
[ INFO ] Min: 25.23 ms
[ INFO ] Max: 49.16 ms
[ INFO ] Throughput: 104.96 FPS
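The Latency block summarizes the `latencies` vector. Throughput is `1000 * count / duration_ms`, as in the listing. The sketch below recomputes such a summary. `summarize` and `throughput_fps` are hypothetical helpers. For an even number of samples, the sketch takes the mean of the two middle values as the median. That convention is an assumption and may differ from the samples' `LatencyMetrics` class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct LatencySummary {
    double median, average, min, max;  // all in milliseconds
};

// Hypothetical helper: sort a copy of the latencies and read off the statistics.
LatencySummary summarize(std::vector<double> latencies_ms) {
    assert(!latencies_ms.empty());
    std::sort(latencies_ms.begin(), latencies_ms.end());
    const std::size_t n = latencies_ms.size();
    const double median = (n % 2 == 1)
        ? latencies_ms[n / 2]
        : (latencies_ms[n / 2 - 1] + latencies_ms[n / 2]) / 2.0;
    const double average = std::accumulate(latencies_ms.begin(), latencies_ms.end(), 0.0) / n;
    return {median, average, latencies_ms.front(), latencies_ms.back()};
}

// Same formula as the sample: completed iterations per second.
double throughput_fps(std::size_t count, double duration_ms) {
    return 1000.0 * count / duration_ms;
}
```

With the numbers from the sample output, `throughput_fps(1577, 15024.2)` returns about 104.96, which matches the reported FPS. Multiplying throughput by the average latency (104.96 FPS × 0.03808 s ≈ 4.0) gives a rough estimate of how many requests were in flight on average. That is why throughput can exceed the roughly 26 FPS a single request would reach at 38 ms per inference.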