This topic demonstrates how to run the Benchmark Python* Tool, which performs inference using convolutional networks. Performance can be measured for two inference modes: synchronous (latency-oriented) and asynchronous (throughput-oriented).
NOTE: This topic describes usage of Python implementation of the Benchmark Tool. For the C++ implementation, refer to Benchmark C++ Tool.
TIP: You can also work with the Benchmark Tool inside the OpenVINO™ Deep Learning Workbench (DL Workbench). DL Workbench is a platform built upon OpenVINO™ that provides a web-based graphical environment enabling you to optimize, fine-tune, analyze, visualize, and compare the performance of deep learning models on various Intel® architecture configurations. In the DL Workbench, you can use most of the OpenVINO™ toolkit components.
Proceed to an easy installation from Docker to get started.
Upon start-up, the application reads command-line parameters and loads a network and images/binary files to the Inference Engine plugin, which is chosen depending on the specified device. The number of infer requests and the execution approach depend on the mode defined with the -api command-line parameter.
NOTE: By default, Inference Engine samples, tools, and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application, or reconvert your model using the Model Optimizer tool with the --reverse_input_channels argument specified. For more information about the argument, refer to the When to Reverse Input Channels section of Converting a Model Using General Conversion Parameters.
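As an illustration, the reconversion might look like the following sketch; the model path and output directory are placeholders, not values from this guide:

```sh
# Reconvert the model so that the generated IR expects RGB input
# (<model> and <ir_dir> are placeholders for your own paths).
python3 mo.py --input_model <model>.caffemodel --reverse_input_channels --output_dir <ir_dir>
```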
For synchronous mode, the primary metric is latency. The application creates one infer request and executes the Infer method. The number of executions is defined by one of the following:
- the number of iterations specified with the -niter command-line argument
- the time duration specified with the -t command-line argument
- a predefined duration, if neither -niter nor -t is specified; the predefined duration value depends on the device

During the execution, the application collects two types of metrics: the latency of each infer request executed with the Infer method and the total duration of all executions. The reported latency value is calculated as the mean of all collected latencies. The reported throughput value is derived from the reported latency and additionally depends on the batch size.
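For example, a latency-oriented run over a fixed number of iterations could look like this (the model and image paths are placeholders, and 100 iterations is an arbitrary choice):

```sh
# Synchronous mode: one infer request, 100 iterations, report mean latency.
python3 benchmark_app.py -m <model>.xml -i <image> -d CPU -api sync -niter 100
```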
For asynchronous mode, the primary metric is throughput in frames per second (FPS). The application creates a certain number of infer requests and executes the StartAsync method. The number of executions is defined by one of the following:
- the number of iterations specified with the -niter command-line argument
- the time duration specified with the -t command-line argument
- a predefined duration, if neither -niter nor -t is specified; the predefined duration value depends on the device

The infer requests are executed asynchronously, and a callback is used to wait for the previous execution to complete. The application measures all infer request executions and reports the throughput metric based on the batch size and the total execution duration.
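For example, a throughput-oriented run over a fixed time budget could look like this sketch (the paths are placeholders; 4 infer requests and 60 seconds are arbitrary choices, with -nireq assumed to control the number of infer requests):

```sh
# Asynchronous mode: several infer requests in flight for 60 seconds, report FPS.
python3 benchmark_app.py -m <model>.xml -i <image> -d CPU -api async -nireq 4 -t 60
```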
Before running the Benchmark tool, install the requirements:
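A minimal sketch, assuming the tool and its requirements.txt live under <INSTALL_DIR>/deployment_tools/tools/benchmark_tool/ in a default installation:

```sh
# Install the Python dependencies shipped next to the tool.
cd <INSTALL_DIR>/deployment_tools/tools/benchmark_tool
pip install -r requirements.txt
```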
Notice that benchmark_app usually produces optimal performance for any device out of the box.
So in most cases you don't need to tune the app options explicitly, and the plain device name is enough, for example, for CPU:
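For instance, a plain run on CPU could look like this (the model path is a placeholder):

```sh
# Default options: only the model and the target device are specified.
python3 benchmark_app.py -m <model>.xml -d CPU
```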
However, the defaults may still be non-optimal in some cases, especially for very small networks. For more details, read Introduction to Performance Topics.
Running the application with the -h or --help option yields the following usage message:
Running the application with the empty list of options yields the usage message given above and an error message.
The application supports topologies with one or more inputs. If a topology is not data-sensitive, you can skip the input parameter; in this case, inputs are filled with random values. If a model has only image input(s), provide a folder with images or a path to an image as input. If a model has some specific input(s) that are not images, prepare binary file(s) filled with data of the appropriate precision and provide a path to them as input. If a model has mixed input types, the input folder should contain all required files. Image inputs are filled with image files one by one; binary inputs are filled with binary files one by one.
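For example, passing a folder that contains all required image and binary files might look like this sketch (the paths are placeholders):

```sh
# Mixed inputs: the folder must contain files for every model input.
python3 benchmark_app.py -m <model>.xml -i <inputs_dir> -d CPU
```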
To run the tool, you can use public or Intel's pre-trained models. To download the models, use the OpenVINO Model Downloader or go to https://download.01.org/opencv/.
NOTE: Before running the tool with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.
This section provides step-by-step instructions on how to run the Benchmark Tool with the googlenet-v1
public model on CPU or FPGA devices. As an input, the car.png
file from the <INSTALL_DIR>/deployment_tools/demo/
directory is used.
NOTE: Internet access is required to execute the following steps successfully. If you have access to the Internet only through a proxy server, make sure that it is configured in your OS environment.
1. Download the model. Run the downloader.py script, specifying the model name and the directory to download the model to:
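A possible invocation, assuming the Model Downloader's --name and -o options and a placeholder <models_dir> directory:

```sh
# Download the public googlenet-v1 model into <models_dir>.
python3 downloader.py --name googlenet-v1 -o <models_dir>
```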
2. Convert the model to the Inference Engine IR format. Run the mo.py script, specifying the path to the model, the model format (which must be FP32 for CPU and FPGA), and the output directory for the generated IR files:
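One possible command, assuming the downloaded model is the Caffe variant stored under <models_dir>/public/googlenet-v1/ and <ir_dir> is the output directory for the IR files:

```sh
# Generate FP32 IR files (.xml + .bin) from the downloaded model.
python3 mo.py --input_model <models_dir>/public/googlenet-v1/googlenet-v1.caffemodel \
    --data_type FP32 --output_dir <ir_dir>
```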
3. Run the tool, specifying the <INSTALL_DIR>/deployment_tools/demo/car.png file as an input image, the IR of the googlenet-v1 model, and a device to perform inference on. The following commands demonstrate running the Benchmark Tool in the asynchronous mode on CPU and FPGA devices:
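For example (the IR path is a placeholder, and HETERO:FPGA,CPU is an assumed device string for an FPGA run with CPU fallback):

```sh
# CPU
python3 benchmark_app.py -m <ir_dir>/googlenet-v1.xml \
    -i <INSTALL_DIR>/deployment_tools/demo/car.png -d CPU -api async

# FPGA with CPU fallback for unsupported layers
python3 benchmark_app.py -m <ir_dir>/googlenet-v1.xml \
    -i <INSTALL_DIR>/deployment_tools/demo/car.png -d HETERO:FPGA,CPU -api async
```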
The application outputs the number of executed iterations, the total duration of execution, latency, and throughput. Additionally, if you set the -pc parameter, the application outputs performance counters. If you set -exec_graph_path, the application reports the serialized executable graph.
Below are fragments of sample output for CPU and FPGA devices: