Performance Information F.A.Q.

How often do performance benchmarks get updated?

New performance benchmarks are typically published with every major.minor release of the Intel® Distribution of OpenVINO™ toolkit.

Where can I find the models used in the performance benchmarks?

All models used are included in the GitHub repository of Open Model Zoo.

Will there be any new models added to the list used for benchmarking?

The models used in the performance benchmarks were chosen based on general adoption and usage in deployment scenarios. New models that support a diverse set of workloads and usage are added periodically.

How can I run the benchmarks on my own?

All of the performance benchmarks are generated using benchmark_app, an open-source tool included in the Intel® Distribution of OpenVINO™ toolkit. The tool is available for both C++ and Python applications.

For simple instructions on testing performance, see the Getting Performance Numbers Guide.
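As a rough illustration of what a benchmark_app run looks like, the sketch below assembles a command line in Python and prints it. The model path `model.xml` and the helper names `build_benchmark_cmd` and `run_benchmark` are hypothetical; the flags shown (`-m`, `-d`, `-hint`, `-t`) are common benchmark_app options, but check `benchmark_app -h` for the exact set supported by your release.

```python
import shlex
import subprocess


def build_benchmark_cmd(model_path, device="CPU", hint="latency", seconds=15):
    """Return the argument list for a benchmark_app run (hypothetical helper)."""
    if hint not in ("latency", "throughput"):
        raise ValueError("hint must be 'latency' or 'throughput'")
    return [
        "benchmark_app",
        "-m", str(model_path),  # path to the model file (placeholder here)
        "-d", device,           # target device, for example CPU or GPU
        "-hint", hint,          # performance hint: latency or throughput
        "-t", str(seconds),     # how long to run, in seconds
    ]


def run_benchmark(model_path, **kwargs):
    """Run benchmark_app and return its console output.

    Requires the OpenVINO toolkit to be installed and on PATH.
    """
    cmd = build_benchmark_cmd(model_path, **kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Only print the command here; call run_benchmark() on a machine with OpenVINO.
cmd = build_benchmark_cmd("model.xml", device="CPU", hint="throughput")
print(shlex.join(cmd))
```

Results obtained this way depend on the hardware, drivers, and toolkit version, so they will not necessarily match the published numbers.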

Where can I find a more detailed description of the workloads used for benchmarking?

The input size used in inference depends on the benchmarked network. The table below lists the input size for each network model:

| Model | Public Network | Task | Input Size |
|---|---|---|---|
| GPT-2 | OpenAI GPT-2 | Transformer | 1024 |
| bert-base-cased | BERT | question / answer | 128 |
| bert-large-uncased-whole-word-masking-squad-int8-0001 | BERT-large | question / answer | 384 |
| deeplabv3 | DeepLab v3 Tf | semantic segmentation | 513x513 |
| efficientdet-d0 | Efficientdet | classification | 512x512 |
| faster_rcnn_resnet50_coco | Faster RCNN Tf | object detection | 600x1024 |
| inception-v4 | Inception v4 (aka GoogleNet-V4) | classification | 299x299 |
| mobilenet-ssd | SSD (MobileNet)_COCO-2017_Caffe | object detection | 300x300 |
| mobilenet-v2 | Mobilenet V2 PyTorch | classification | 224x224 |
| resnet-50 | ResNet-50_v1_ILSVRC-2012 | classification | 224x224 |
| ssd-resnet34-1200-onnx | ssd-resnet34 onnx model | object detection | 1200x1200 |
| unet-camvid-onnx-0001 | U-Net | semantic segmentation | 368x480 |
| yolo-v3 | YOLO v3 | object detection | 416x416 |
| yolo-v3-tiny | YOLO v3 Tiny | object detection | 416x416 |
| yolov8n | Yolov8nano | object detection | 608x608 |

Where can I purchase the specific hardware used in the benchmarking?

Intel partners with vendors all over the world. For a list of hardware manufacturers, see the Intel® AI: In Production Partners & Solutions Catalog. For more details, see the Supported Devices documentation. Before purchasing any hardware, you can test and run models remotely using Intel® DevCloud for the Edge.

How can I optimize my models for better performance or accuracy?

A set of guidelines and recommendations for optimizing models is available in the optimization guide. Join the conversation in the Community Forum for further support.

Why are INT8 optimized models used for benchmarking on CPUs with no VNNI support?

The benefit of low-precision optimization using the OpenVINO™ toolkit model optimizer extends beyond processors supporting VNNI through Intel® DL Boost. The reduced bit width of INT8 compared to FP32 allows Intel® CPUs to process the data faster. Therefore, it offers better throughput on any converted model, regardless of the intrinsically supported low-precision optimizations within Intel® hardware. For a comparison of boost factors for different network models and a selection of Intel® CPU architectures, including AVX-2 with Intel® Core™ i7-8700T, and AVX-512 (VNNI) with Intel® Xeon® 5218T and Intel® Xeon® 8270, refer to the Model Accuracy for INT8 and FP32 Precision.
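The effect of the reduced bit width can be illustrated with simple arithmetic. The sketch below counts how many values fit in one vector register (256 bits for AVX2, 512 bits for AVX-512) and how many bytes one input tensor occupies at each precision. The 1x3x224x224 shape is an assumption based on a 224x224 model with three channels and batch size 1, and the helper names are hypothetical. The figures describe data density only, not measured or expected speedups.

```python
def values_per_register(register_bits, value_bits):
    """Number of values of the given width that fit in one vector register."""
    return register_bits // value_bits


def tensor_bytes(num_values, value_bits):
    """Memory occupied by a tensor with num_values elements of the given width."""
    return num_values * value_bits // 8


for name, register_bits in (("AVX2", 256), ("AVX-512", 512)):
    fp32 = values_per_register(register_bits, 32)
    int8 = values_per_register(register_bits, 8)
    print(f"{name}: {fp32} FP32 values or {int8} INT8 values per register")

# Assumed input shape: batch 1, 3 channels, 224x224 pixels.
elements = 1 * 3 * 224 * 224
print(f"FP32 input: {tensor_bytes(elements, 32)} bytes")
print(f"INT8 input: {tensor_bytes(elements, 8)} bytes")
```

In both cases INT8 packs four times as many values into the same register or memory footprint, which is one reason throughput can improve even without dedicated VNNI instructions.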

Where can I search for OpenVINO™ performance results based on HW-platforms?

The website format has changed to support the more common approach of searching for the performance results of a given neural network model on different HW-platforms, as opposed to reviewing the performance of a given HW-platform when working with different neural network models.

How is Latency measured?

Latency is measured by running the OpenVINO™ Runtime in synchronous mode. In this mode, each frame or image is processed through the entire set of stages (pre-processing, inference, post-processing) before the next frame or image is processed. This KPI is relevant for applications where inference on a single image is required, for example, the analysis of an ultrasound image in a medical application or the analysis of a seismic image in the oil & gas industry. Other use cases include real-time or near real-time applications, such as the response of an industrial robot to changes in its environment or obstacle avoidance for autonomous vehicles, where a quick response to the inference result is required.
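The synchronous pattern described above can be sketched as a plain timing loop. This is a minimal illustration, not the benchmark_app implementation: `preprocess`, `infer`, and `postprocess` are hypothetical stand-ins for the real stages, and reporting the median is an assumption made here for the example.

```python
import statistics
import time


def preprocess(frame):
    """Stand-in pre-processing: scale pixel values to the 0..1 range."""
    return [x / 255.0 for x in frame]


def infer(tensor):
    """Stand-in for model inference (the real call would go to the runtime)."""
    return sum(tensor)


def postprocess(raw):
    """Stand-in post-processing: round the raw output."""
    return round(raw, 3)


def measure_sync_latency(frames):
    """Process frames one at a time and record per-frame latency in ms."""
    results, latencies_ms = [], []
    for frame in frames:
        start = time.perf_counter()
        # Every stage finishes for this frame before the next frame starts.
        result = postprocess(infer(preprocess(frame)))
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        results.append(result)
    return results, latencies_ms


frames = [[0, 128, 255]] * 5
results, latencies_ms = measure_sync_latency(frames)
median_ms = statistics.median(latencies_ms)
print(f"median latency: {median_ms:.4f} ms over {len(latencies_ms)} frames")
```

Because no frame is started until the previous one has fully completed, the recorded time reflects how long a single request waits for its answer rather than how many requests the device can handle in parallel.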