Throughput Benchmark C++ Sample

This sample demonstrates how to estimate the performance of a model using the Asynchronous Inference Request API in throughput mode. Unlike demos, this sample has no configurable command-line arguments other than the model path. Feel free to modify the sample’s source code to try out different options.

The reported results may deviate from what benchmark_app reports. One example is model input precision for computer vision tasks: benchmark_app sets uint8, while the sample uses the default model precision, which is usually float32.

The following C++ API is used in the application:




OpenVINO Runtime Version


Get OpenVINO API version

Basic Infer Flow

ov::Core, ov::Core::compile_model, ov::CompiledModel::create_infer_request, ov::InferRequest::get_tensor

Common API to do inference: compile a model, create an infer request, configure input tensors.

Asynchronous Infer

ov::InferRequest::start_async, ov::InferRequest::set_callback

Do asynchronous inference with callback.

Model Operations


Get inputs of a model

Tensor Operations

ov::Tensor::get_shape, ov::Tensor::data

Get a tensor shape and its data.



Validated Models

alexnet, googlenet-v1, yolo-v3-tf, face-detection-0200

Model Format

OpenVINO™ toolkit Intermediate Representation (*.xml + *.bin), ONNX (*.onnx)

Supported devices


Other language realization


How It Works

The sample compiles a model for a given device, randomly generates input data, and performs asynchronous inference repeatedly for a given number of seconds. It then processes and reports performance results.
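The time-bounded measurement idea can be illustrated without OpenVINO. The following is a minimal sketch, not the sample's actual code: it is synchronous, whereas the sample keeps several asynchronous infer requests in flight. The helper name `run_for` and the `work` callable (a stand-in for one inference) are hypothetical.

```cpp
#include <chrono>
#include <vector>

// Hypothetical helper: call `work` repeatedly until `budget` has elapsed and
// return the latency of each call in milliseconds.
// `work` stands in for a single inference; this loop is synchronous, unlike
// the asynchronous requests used by the sample.
template <typename Work>
std::vector<double> run_for(std::chrono::milliseconds budget, Work work) {
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;

    std::vector<double> latencies_ms;
    const auto start = Clock::now();
    auto now = start;
    // Run at least once, then keep going while the time budget is not spent.
    do {
        const auto begin = now;
        work();
        now = Clock::now();
        latencies_ms.push_back(Ms(now - begin).count());
    } while (now - start < budget);
    return latencies_ms;
}
```

For example, `run_for(std::chrono::seconds(10), [] { /* one inference */ })` would collect one latency value per call for roughly ten seconds. A monotonic clock (`std::chrono::steady_clock`) is used so that system clock adjustments do not distort the measurement.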

You can find a detailed description of each sample step in the Integration Steps section of the “Integrate OpenVINO™ Runtime with Your Application” guide.


To build the sample, follow the instructions in the Build the Sample Applications section of the OpenVINO™ Toolkit Samples guide.


To run the sample, you need to specify a model:

throughput_benchmark <path_to_model>


  • Before running the sample with a trained model, make sure the model is converted to the intermediate representation (IR) format (*.xml + *.bin) using the Model Optimizer tool.

  • The sample accepts models in ONNX format (.onnx) that do not require preprocessing.


  1. Install the openvino-dev Python package to use Open Model Zoo Tools:

    python -m pip install openvino-dev[caffe]
  2. Download a pre-trained model using:

    omz_downloader --name googlenet-v1
  3. If a model is not in the IR or ONNX format, it must be converted. You can do this using the model converter:

    omz_converter --name googlenet-v1
  4. Perform benchmarking using the googlenet-v1 model on a CPU:

    throughput_benchmark googlenet-v1.xml

Sample Output

The application outputs performance results.

[ INFO ] OpenVINO:
[ INFO ] Build ................................. <version>
[ INFO ] Count:      1577 iterations
[ INFO ] Duration:   15024.2 ms
[ INFO ] Latency:
[ INFO ]        Median:     38.02 ms
[ INFO ]        Average:    38.08 ms
[ INFO ]        Min:        25.23 ms
[ INFO ]        Max:        49.16 ms
[ INFO ] Throughput: 104.96 FPS
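
The figures in this report can be related to each other with a short calculation. The sketch below is an illustration rather than the sample's own implementation: the struct `LatencyStats` and the functions `summarize_latencies` and `throughput_fps` are hypothetical names, and the median definition used here (mean of the two middle values for an even count) is an assumption. The throughput formula, iterations divided by duration, is consistent with the output above: 1577 iterations over 15024.2 ms is about 104.96 FPS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical container for the latency figures printed in the report.
struct LatencyStats {
    double median_ms;
    double average_ms;
    double min_ms;
    double max_ms;
};

// Summarize per-request latencies given in milliseconds.
// The vector is taken by value because it is sorted internally.
LatencyStats summarize_latencies(std::vector<double> latencies_ms) {
    assert(!latencies_ms.empty());
    std::sort(latencies_ms.begin(), latencies_ms.end());
    const std::size_t n = latencies_ms.size();
    // Median: the middle element, or the mean of the two middle elements
    // when the count is even (an assumed definition).
    const double median = (n % 2 == 1)
        ? latencies_ms[n / 2]
        : (latencies_ms[n / 2 - 1] + latencies_ms[n / 2]) / 2.0;
    const double sum = std::accumulate(latencies_ms.begin(), latencies_ms.end(), 0.0);
    return {median, sum / static_cast<double>(n), latencies_ms.front(), latencies_ms.back()};
}

// Throughput in inferences per second from a count and a duration in ms.
double throughput_fps(std::size_t iterations, double duration_ms) {
    assert(duration_ms > 0.0);
    return static_cast<double>(iterations) * 1000.0 / duration_ms;
}
```

Note that with several requests running in parallel, throughput is not simply the inverse of the average latency, which is why the report lists both.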