This topic demonstrates how to use the Benchmark Application to estimate deep learning inference performance on supported devices. Performance can be measured for two inference modes: synchronous and asynchronous.
NOTE: This topic describes usage of the C++ implementation of the Benchmark Application. For the Python* implementation, refer to [Benchmark Application (Python*)](./samples/python_samples/benchmark_app/README.md).
NOTE: To achieve benchmark results similar to the official published results, set CPU frequency to 2.9 GHz and GPU frequency to 1 GHz.
Upon start-up, the application reads command-line parameters and loads a network and images to the Inference Engine plugin. The number of infer requests and the execution approach depend on the mode defined with the -api command-line parameter.
For synchronous mode, the primary metric is latency. The application creates one infer request and executes the Infer method. The number of executions is defined by one of the two values:
- Number of iterations specified with the -niter command-line argument
- Predefined duration if -niter is skipped. The predefined duration value depends on the device.

During the execution, the application collects two types of metrics:
- Latency for each infer request executed with the Infer method
- Duration of all executions

The reported latency value is calculated as the mean of all collected latencies. The reported throughput value is derived from the reported latency and additionally depends on the batch size.
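The following is a minimal sketch of this synchronous measurement loop. It is illustrative rather than the application's actual source: it assumes the Inference Engine Core C++ API (which varies across releases), a model file named alexnet_fp32.xml, and a fixed iteration count standing in for -niter; input setup and error handling are omitted.

```cpp
#include <inference_engine.hpp>

#include <chrono>
#include <numeric>
#include <vector>

using namespace InferenceEngine;

int main() {
    Core ie;
    // Reads the IR produced by the Model Optimizer (*.xml + *.bin)
    CNNNetwork network = ie.ReadNetwork("alexnet_fp32.xml");
    ExecutableNetwork executableNetwork = ie.LoadNetwork(network, "CPU");

    // Synchronous mode: a single infer request, executed repeatedly
    InferRequest request = executableNetwork.CreateInferRequest();

    const size_t niter = 100;  // stands in for the -niter argument
    std::vector<double> latencies;
    latencies.reserve(niter);

    for (size_t i = 0; i < niter; ++i) {
        auto start = std::chrono::high_resolution_clock::now();
        request.Infer();  // blocking call; returns when inference completes
        auto end = std::chrono::high_resolution_clock::now();
        latencies.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }

    // Reported latency is the mean of all collected latencies;
    // throughput is derived from it and scaled by the batch size.
    double meanLatencyMs =
        std::accumulate(latencies.begin(), latencies.end(), 0.0) / niter;
    double fps = network.getBatchSize() * 1000.0 / meanLatencyMs;
    (void)fps;
    return 0;
}
```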
For asynchronous mode, the primary metric is throughput in frames per second (FPS). The application creates a certain number of infer requests and executes the StartAsync method. The number of infer requests is specified with the -nireq command-line parameter. The number of executions is defined by one of the two values:
- Number of iterations specified with the -niter command-line argument
- Predefined duration if -niter is skipped. The predefined duration value depends on the device.

The infer requests are executed asynchronously. The Wait method is used to wait for a previous execution to complete. The application measures all infer request executions and reports the throughput metric based on the batch size and the total execution duration.
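A comparable sketch of the asynchronous pipeline is shown below, again as an illustration rather than the application's actual code. It assumes the same Core API and model file, with fixed values standing in for -nireq and -niter:

```cpp
#include <inference_engine.hpp>

#include <chrono>
#include <vector>

using namespace InferenceEngine;

int main() {
    Core ie;
    CNNNetwork network = ie.ReadNetwork("alexnet_fp32.xml");
    ExecutableNetwork executableNetwork = ie.LoadNetwork(network, "CPU");

    const size_t nireq = 4;    // stands in for the -nireq argument
    const size_t niter = 100;  // stands in for the -niter argument

    std::vector<InferRequest> requests;
    for (size_t i = 0; i < nireq; ++i)
        requests.push_back(executableNetwork.CreateInferRequest());

    auto start = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < niter; ++i) {
        InferRequest& request = requests[i % nireq];
        // Wait until the previous execution of this request slot finishes
        // (returns immediately with INFER_NOT_STARTED on the first pass).
        request.Wait(IInferRequest::WaitMode::RESULT_READY);
        request.StartAsync();  // non-blocking; inference runs in the background
    }
    for (auto& request : requests)
        request.Wait(IInferRequest::WaitMode::RESULT_READY);
    auto end = std::chrono::high_resolution_clock::now();

    // Throughput is reported from the batch size and total execution duration.
    double totalSeconds = std::chrono::duration<double>(end - start).count();
    double fps = network.getBatchSize() * niter / totalSeconds;
    (void)fps;
    return 0;
}
```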
Running the application with the -h
option yields the following usage message:
Running the application with an empty list of options yields the usage message given above and an error message.
You can run the application on models with one four-dimensional input layer that accept images as input, for example, the public AlexNet and GoogLeNet models, which can be downloaded with the OpenVINO Model Downloader.
NOTE: To run the application, the model must first be converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.
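For instance, a Caffe* version of AlexNet could be converted with a command along these lines (the script location and paths are illustrative):

```sh
python3 mo.py --input_model <path_to_model>/alexnet.caffemodel --output_dir <output_dir>
```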
For example, to perform inference on CPU in the synchronous mode and get estimated performance metrics for the AlexNet model, run the following command:
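A representative invocation is shown below; the binary location and the image and model paths are placeholders:

```sh
./benchmark_app -i <path_to_image>/inputImage.bmp -m <path_to_model>/alexnet_fp32.xml -d CPU -api sync
```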
For the asynchronous mode:
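Again with placeholder paths:

```sh
./benchmark_app -i <path_to_image>/inputImage.bmp -m <path_to_model>/alexnet_fp32.xml -d CPU -api async
```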
Application output depends on the API used. For the synchronous API, the application outputs latency and throughput:
For the asynchronous API, the application outputs only throughput: