Using AI Accelerators
Prepare test model
Download the ResNet50 model, convert it to OpenVINO IR in FP32 precision, and move it into version directory 1:
mkdir models
docker run -u $(id -u):$(id -g) -v ${PWD}/models:/models openvino/ubuntu20_dev:2024.6.0 omz_downloader --name resnet-50-tf --output_dir /models
docker run -u $(id -u):$(id -g) -v ${PWD}/models:/models:rw openvino/ubuntu20_dev:2024.6.0 omz_converter --name resnet-50-tf --download_dir /models --output_dir /models --precisions FP32
mv ${PWD}/models/public/resnet-50-tf/FP32 ${PWD}/models/public/resnet-50-tf/1
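The FP32 directory is renamed to 1 because OpenVINO Model Server expects each model version in a numerically named subdirectory of the model path. The sketch below checks that layout before you start a container. It assumes each version directory should hold at least one IR pair (an .xml and a .bin file), and the helper names check_model_repo and has_ext are hypothetical, not part of OpenVINO tooling.

```shell
# has_ext is a hypothetical helper: it succeeds if directory $1 contains
# at least one regular file whose name ends in .$2.
has_ext() {
  local f
  for f in "$1"/*."$2"; do
    [ -f "$f" ] && return 0
  done
  return 1
}

# check_model_repo is a hypothetical helper (not an OpenVINO tool).
# It succeeds if the base path contains at least one numerically named
# version directory holding both an .xml and a .bin file (an IR pair).
check_model_repo() {
  local base_path="$1" found=1 version_dir name
  for version_dir in "$base_path"/*/; do
    [ -d "$version_dir" ] || continue
    name=$(basename "$version_dir")
    # Skip directories whose names contain anything other than digits (e.g. FP32).
    case "$name" in
      *[!0-9]*) continue ;;
    esac
    if has_ext "$version_dir" xml && has_ext "$version_dir" bin; then
      echo "found model version: $name"
      found=0
    fi
  done
  return "$found"
}

# Usage:
#   check_model_repo "${PWD}/models/public/resnet-50-tf"
```

If the mv step was skipped, the model still sits in FP32, which is not a numeric name, so the check fails.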
Starting a Docker Container with Intel integrated GPU, Intel® Data Center GPU Flex Series and Intel® Arc™ GPU
The GPU plugin uses the Intel Compute Library for Deep Neural Networks (clDNN) to infer deep neural networks. For inference execution, it employs Intel® Processor Graphics including Intel® HD Graphics, Intel® Iris® Graphics, Intel® Iris® Xe Graphics, and Intel® Iris® Xe MAX graphics.
Before using a GPU as the OpenVINO Model Server target device, you need to:
- install the required drivers (refer to the OpenVINO installation guide),
- start the Docker container with the additional parameter --device /dev/dri to pass the device context,
- set the --target_device parameter to GPU,
- use the openvino/model_server:latest-gpu image, which contains the GPU dependencies.
Running inference on a GPU requires the account that runs the model server process (its security context) to have the correct permissions: it must belong to the render group, which you can identify with the command:
stat -c "group_name=%G group_id=%g" /dev/dri/render*
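To check whether the account you run as already belongs to the group that owns a render device, you can compare the device's group ID with the output of id -G. This is a sketch only: the helper name in_device_group is hypothetical, it relies on GNU stat -c as the commands in this section do, and it inspects group membership only, not driver installation.

```shell
# in_device_group is a hypothetical helper: it succeeds (0) if the current
# user belongs to the group that owns the given device node, fails (1) if not,
# and returns 2 if the device cannot be inspected.
in_device_group() {
  local device="$1" gid g
  gid=$(stat -c "%g" "$device") || return 2
  # id -G lists the numeric IDs of all groups of the current user.
  for g in $(id -G); do
    [ "$g" = "$gid" ] && return 0
  done
  return 1
}

# Usage (checks the first render node, like the head -n 1 in the commands below):
#   node=$(ls /dev/dri/render* | head -n 1)
#   in_device_group "$node" && echo "ok" || echo "needs --group-add=$(stat -c %g "$node")"
```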
The default account in the docker image is preconfigured. If you change the security context, use the following command to start the model server container:
docker run --rm -it --device=/dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
-v ${PWD}/models/public/resnet-50-tf:/opt/model -p 9001:9001 openvino/model_server:latest-gpu \
--model_path /opt/model --model_name resnet --port 9001 --target_device GPU
A GPU can also be used on Windows hosts with Windows Subsystem for Linux 2 (WSL2). In this scenario, extra Docker parameters are needed: use the device /dev/dxg instead of /dev/dri and mount the volume /usr/lib/wsl:
docker run --rm -it --device=/dev/dxg --volume /usr/lib/wsl:/usr/lib/wsl -u $(id -u):$(id -g) \
-v ${PWD}/models/public/resnet-50-tf:/opt/model -p 9001:9001 openvino/model_server:latest-gpu \
--model_path /opt/model --model_name resnet --port 9001 --target_device GPU
NOTE: The public docker image includes the OpenCL drivers for GPU in version 22.28 (RedHat) and 22.35 (Ubuntu).
If you need to build OpenVINO Model Server with a different driver version, refer to the building from sources guide.
Using Multi-Device Plugin
If you have multiple inference devices available (e.g. GPU, CPU, and NPU), you can increase inference throughput by enabling the Multi-Device Plugin. It distributes inference requests among multiple devices, balancing the load. For more detailed information, read OpenVINO’s Multi-Device plugin documentation.
To use this feature in OpenVINO Model Server, you can choose one of two ways:
Use a .json configuration file to set the --target_device parameter with the pattern MULTI:<DEVICE_1>,<DEVICE_2>. The order of the devices defines their priority, in this case making <DEVICE_1> the primary selection.
This example of a config.json file sets up the Multi-Device Plugin for a resnet model, using GPU and CPU as devices:
echo '{"model_config_list": [
{"config": {
"name": "resnet",
"base_path": "/opt/model",
"batch_size": "1",
"target_device": "MULTI:GPU,CPU"}
}]
}' > models/public/resnet-50-tf/config.json
To start OpenVINO Model Server with this config file (saved as models/public/resnet-50-tf/config.json and mounted in the container as /opt/model/config.json), set the grpc_workers parameter to match the nireq field if you define one in config.json, and use the run command, like so:
docker run -d --rm --device=/dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
-u $(id -u):$(id -g) -v ${PWD}/models/public/resnet-50-tf/:/opt/model:ro -p 9001:9001 \
openvino/model_server:latest-gpu --config_path /opt/model/config.json --port 9001
When using just a single model, you can start OpenVINO Model Server without the config.json file. To do so, use the run command together with additional parameters, like so:
docker run -d --rm --device=/dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
-u $(id -u):$(id -g) -v ${PWD}/models/public/resnet-50-tf/:/opt/model:ro -p 9001:9001 \
openvino/model_server:latest-gpu --model_path /opt/model --model_name resnet --port 9001 --target_device 'MULTI:GPU,CPU'
The deployed model will perform inference on both GPU and CPU. The total throughput will be roughly equal to the sum of GPU and CPU throughput.
Using Heterogeneous Plugin
The HETERO plugin makes it possible to distribute the inference load of one model among several computing devices. That way, different parts of the deep learning network can be executed by the devices best suited to their type of calculations. OpenVINO automatically divides the network to optimize the process.
The Heterogeneous plugin can be configured using the --target_device parameter with the pattern HETERO:<DEVICE_1>,<DEVICE_2>.
The order of the devices defines their priority, in this case making <DEVICE_1> the primary device and <DEVICE_2> the fallback.
Here is a config example using the heterogeneous plugin with GPU as the primary device and CPU as the fallback:
echo '{"model_config_list": [
{"config": {
"name": "resnet",
"base_path": "/opt/model",
"batch_size": "1",
"target_device": "HETERO:GPU,CPU"}
}]
}' > models/public/resnet-50-tf/config.json
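The MULTI and HETERO config files above differ only in target_device. The following is a sketch of a small helper that writes the file with > (overwriting rather than appending, so the result stays a single JSON document) and then validates it with Python's json.tool module. The helper name write_ovms_config is hypothetical, and it assumes python3 is installed on the host.

```shell
# write_ovms_config is a hypothetical helper: it writes a single-model
# config.json for the resnet model with the given target device, overwriting
# any previous file, then checks that the result parses as JSON.
# Assumes python3 is available on the host.
write_ovms_config() {
  local target_device="$1" config_file="$2"
  cat > "$config_file" <<EOF
{"model_config_list": [
  {"config": {
    "name": "resnet",
    "base_path": "/opt/model",
    "batch_size": "1",
    "target_device": "${target_device}"}
  }]
}
EOF
  # json.tool exits with a non-zero status if the file is not valid JSON.
  python3 -m json.tool "$config_file" > /dev/null
}

# Usage:
#   write_ovms_config "MULTI:GPU,CPU" models/public/resnet-50-tf/config.json
#   write_ovms_config "HETERO:GPU,CPU" models/public/resnet-50-tf/config.json
```

Overwriting matters because appending a second JSON object to an existing config.json produces a file that is no longer valid JSON.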
Using AUTO Plugin
Auto Device (AUTO for short) is a special “virtual” or “proxy” device in the OpenVINO toolkit; it doesn’t bind to a specific type of hardware device.
AUTO removes the need for the application to implement logic that selects a hardware device and then deduces the best optimization settings for that device.
AUTO always chooses the best device. If compiling the model fails on that device, AUTO tries the next best device until compilation succeeds on one of them.
Make sure you have passed the devices you want to use to the Docker container and granted access to them, for example with:
--device=/dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g)
Below is an example of the command with the AUTO plugin as the target device. It includes extra Docker parameters to enable the GPU (/dev/dri) in addition to the CPU.
docker run --rm -d --device=/dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
-u $(id -u):$(id -g) -v ${PWD}/models/public/resnet-50-tf:/opt/model -p 9001:9001 openvino/model_server:latest-gpu \
--model_path /opt/model --model_name resnet --port 9001 \
--target_device AUTO
The AUTO device plugin can also use the PERFORMANCE_HINT plugin config property, which lets you specify a performance mode for the plugin.
NOTE: NUM_STREAMS and PERFORMANCE_HINT should not be used together.
To enable performance hints for your application, use one of the following commands:
LATENCY
docker run --rm -d --device=/dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
-v ${PWD}/models/public/resnet-50-tf:/opt/model -p 9001:9001 openvino/model_server:latest-gpu \
--model_path /opt/model --model_name resnet --port 9001 \
--plugin_config "{\"PERFORMANCE_HINT\": \"LATENCY\"}" \
--target_device AUTO
THROUGHPUT
docker run --rm -d --device=/dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
-v ${PWD}/models/public/resnet-50-tf:/opt/model -p 9001:9001 openvino/model_server:latest-gpu \
--model_path /opt/model --model_name resnet --port 9001 \
--plugin_config "{\"PERFORMANCE_HINT\": \"THROUGHPUT\"}" \
--target_device AUTO
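The escaped JSON passed to --plugin_config is easy to mistype. A sketch that builds the value for one of the two hints shown above and rejects anything else could look like this. The helper name perf_hint_config is hypothetical, and limiting it to LATENCY and THROUGHPUT simply mirrors the examples in this section.

```shell
# perf_hint_config is a hypothetical helper: it prints the JSON value for
# --plugin_config with the requested PERFORMANCE_HINT, accepting only the
# two hints used in the examples above (LATENCY and THROUGHPUT).
perf_hint_config() {
  case "$1" in
    LATENCY|THROUGHPUT)
      printf '{"PERFORMANCE_HINT": "%s"}\n' "$1" ;;
    *)
      echo "unsupported hint: $1" >&2
      return 1 ;;
  esac
}

# Usage (other docker flags as in the commands above; keep the quotes
# around the command substitution so the JSON stays one argument):
#   ... openvino/model_server:latest-gpu \
#     --model_path /opt/model --model_name resnet --port 9001 \
#     --plugin_config "$(perf_hint_config LATENCY)" --target_device AUTO
```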
NOTE: Currently, the AUTO plugin cannot be used with the --shape auto parameter while the GPU device is enabled.
Using NVIDIA Plugin
OpenVINO Model Server can also be used with NVIDIA GPU cards through the NVIDIA plugin from the openvino_contrib GitHub repository. The Docker image of OpenVINO Model Server with NVIDIA support can be built from sources:
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server
make docker_build NVIDIA=1 OV_USE_BINARY=0
cd ..
See also the building from sources guide.
Example command to run container with NVIDIA support:
docker run -it --gpus all -p 9000:9000 -v ${PWD}/models/public/resnet-50-tf:/opt/model openvino/model_server:latest-cuda --model_path /opt/model --model_name resnet --port 9000 --target_device NVIDIA
For models with layers not supported by the NVIDIA plugin, you can use the virtual HETERO plugin, which can use multiple devices listed after the colon:
docker run -it --gpus all -p 9000:9000 -v ${PWD}/models/public/resnet-50-tf:/opt/model openvino/model_server:latest-cuda --model_path /opt/model --model_name resnet --port 9000 --target_device HETERO:NVIDIA,CPU
Check the supported configuration parameters and supported layers of the NVIDIA plugin.
Using NPU Device Plugin
OpenVINO Model Server can use an NPU device for inference.
The Docker image of OpenVINO Model Server with the dependencies required for NPU support can be built from sources:
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server
make release_image NPU=1
cd ..
Example command to run container with NPU:
docker run --device /dev/accel -p 9000:9000 --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
-v ${PWD}/models/public/resnet-50-tf:/opt/model openvino/model_server:latest --model_path /opt/model --model_name resnet --port 9000 --target_device NPU
See the NPU driver for Linux documentation for more information.