Model Server in Docker Containers
This is a step-by-step guide to deploying OpenVINO Model Server on Linux using a Docker container. Links are provided for different compatible hardware.
Before you start, make sure you have:
Docker Engine installed (How to Install Docker Engine)
Intel® Core™ processor (6-12th gen.) or Intel® Xeon® processor
(optional) AI accelerators supported by OpenVINO
Linux, macOS or Windows via WSL
NOTE: accelerators are only tested on bare-metal Linux hosts.
Starting with a container
Pull OpenVINO Model Server Image.
Start a Docker Container with OVMS and your chosen model from cloud storage.
Provide the input files (arrange an input dataset).
Prepare a client package.
Run the prediction using ovmsclient.
Here is an example of this process using a ResNet50 model for image classification:
Pull an image from Docker Hub or the Red Hat Ecosystem Catalog
docker pull openvino/model_server:latest
or, alternatively
docker pull registry.connect.redhat.com/intel/openvino-model-server:latest
Start the container
# start the container
docker run -p 9000:9000 openvino/model_server:latest \
--model_name resnet --model_path gs://ovms-public-eu/resnet50-binary \
--layout NHWC:NCHW --port 9000
# download input files, an image, and a label mapping file
wget https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2022/1/demos/common/static/images/zebra.jpeg
wget https://raw.githubusercontent.com/openvinotoolkit/model_server/releases/2022/1/demos/common/python/classes.py
# Install the Python-based ovmsclient package
pip3 install ovmsclient
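The container may need some time to download and load the model before it can answer requests. The following is a minimal sketch of a generic readiness loop that uses only the Python standard library. The helper name wait_until and its parameters are illustrative and are not part of OVMS or ovmsclient.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Call check() repeatedly until it returns True or the timeout expires.

    Returns True as soon as check() succeeds, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            # Any error (for example a refused connection) means "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With ovmsclient, check could wrap a status query for the resnet model. The exact call and the shape of its result should be confirmed against the ovmsclient documentation before relying on it.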
Run prediction
echo 'import numpy as np
from classes import imagenet_classes
from ovmsclient import make_grpc_client
# connect to the gRPC port published by the container
client = make_grpc_client("localhost:9000")
with open("zebra.jpeg", "rb") as f:
    img = f.read()
# send the encoded image to input "0" of the model named "resnet"
output = client.predict({"0": img}, "resnet")
result_index = np.argmax(output[0])
print(imagenet_classes[result_index])' > predict.py
python3 predict.py
zebra
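The script above prints only the single best class. As a sketch of how the same scores could be ranked to show several candidates, the hypothetical helper top_k below works on plain Python sequences and does not depend on numpy or ovmsclient.

```python
from typing import List, Tuple


def top_k(scores, labels, k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first.

    scores is a sequence of numbers; labels is any object indexable by the
    same integer positions (a list, or a dict keyed by class index).
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(labels[i], scores[i]) for i in ranked[:k]]


# Toy data standing in for model output and a label mapping
print(top_k([0.1, 0.7, 0.2], {0: "horse", 1: "zebra", 2: "donkey"}, k=2))
```

In predict.py this might be called as top_k(output[0].tolist(), imagenet_classes), assuming output[0] is a one-dimensional array of class scores.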
To learn how to set up OpenVINO Model Server, refer to the Quick Start guide.
Building an OpenVINO Model Server Docker Image
You can build your own Docker image by executing the make docker_build command in the git repository root folder. The build generates the following Docker images, plus a release package in the ./dist directory:
image tagged as openvino/model_server:latest - with CPU, NCS, and HDDL support
image tagged as openvino/model_server:latest-gpu - with CPU, NCS, HDDL, and iGPU support
image tagged as openvino/model_server:latest-nginx-mtls - with CPU, NCS, and HDDL support and a reference nginx setup of mTLS integration
release package (.tar.gz, with ovms binary and necessary libraries)
Note: The OVMS Docker image can be built on the ubi8-minimal base image or on the default ubuntu20 base image. OVMS built on the UBI base image does not support NCS and HDDL accelerators.
To select the base image, pass the BASE_OS parameter to make docker_build, for example make docker_build BASE_OS=ubuntu.
Running the inference operation on GPU requires the ovms process security context account to have correct permissions. It has to belong to the render group identified by the command:
stat -c "group_name=%G group_id=%g" /dev/dri/render*
The default account in the docker image is already preconfigured. In case you change the security context, use the following command to start the ovms container:
docker run --rm -it --device=/dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -u $(id -u):$(id -g) \
-v /opt/model:/opt/model -p 9001:9001 openvino/model_server:latest-gpu \
--model_path /opt/model --model_name my_model --port 9001 --target_device GPU
Note: The public docker image includes the OpenCL drivers for GPU in version 21.38.21026.
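The same command can be assembled programmatically, which may help when the launch is scripted. The sketch below mirrors the shell command above: it looks up the group id of the first render node and builds the docker run argument list. The function names are illustrative, and the default model directory, model name, and port are the example values used above.

```python
import glob
import os
from typing import List, Optional


def find_render_gid(pattern: str = "/dev/dri/render*") -> Optional[int]:
    """Return the group id owning the first render node, or None if none exists."""
    nodes = sorted(glob.glob(pattern))
    return os.stat(nodes[0]).st_gid if nodes else None


def build_gpu_run_command(render_gid: int, uid: int, gid: int,
                          model_dir: str = "/opt/model",
                          model_name: str = "my_model",
                          port: int = 9001) -> List[str]:
    """Build the docker run argument list for the GPU image."""
    return [
        "docker", "run", "--rm", "-it",
        "--device=/dev/dri",
        f"--group-add={render_gid}",   # grant access to the render group
        "-u", f"{uid}:{gid}",          # run as the given user and group
        "-v", f"{model_dir}:{model_dir}",
        "-p", f"{port}:{port}",
        "openvino/model_server:latest-gpu",
        "--model_path", model_dir,
        "--model_name", model_name,
        "--port", str(port),
        "--target_device", "GPU",
    ]
```

On a host with a GPU, the list returned by build_gpu_run_command(find_render_gid(), os.getuid(), os.getgid()) could be passed to subprocess.run.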
Model Server image with DG2 support (Ubuntu 20.04)
An image with DG2 GPU support has not been published. To build the image yourself, you need DG2 drivers installed on the host and the NEO Runtime packages available.
Put the NEO Runtime packages in the <model_server_dir>/release_files/drivers/dg2 directory and run make docker_build with the parameter INSTALL_DRIVER_VERSION=dg2.
Example:
make docker_build BASE_OS=ubuntu OVMS_CPP_DOCKER_IMAGE=ovms_dg2 INSTALL_DRIVER_VERSION=dg2
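Before starting a long build, it can be useful to confirm that the driver packages are where the build expects them. The following is a minimal sketch, assuming only the directory layout described above. The helper name list_dg2_packages is hypothetical, and no assumption is made about package file names or extensions.

```python
import os
from typing import List


def list_dg2_packages(model_server_dir: str) -> List[str]:
    """Return the sorted file names found in release_files/drivers/dg2.

    An empty list means the directory is missing or contains no files.
    """
    drivers_dir = os.path.join(model_server_dir, "release_files", "drivers", "dg2")
    if not os.path.isdir(drivers_dir):
        return []
    return sorted(
        name for name in os.listdir(drivers_dir)
        if os.path.isfile(os.path.join(drivers_dir, name))
    )
```

An empty result suggests the NEO Runtime packages still need to be copied before running make docker_build.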
Using Multi-Device Plugin
If you have multiple inference devices available (e.g. Myriad VPUs and CPU), you can increase inference throughput by enabling the Multi-Device Plugin. With the Multi-Device Plugin enabled, inference requests are load balanced between the devices. For more detailed information, read OpenVINO’s Multi-Device plugin documentation.
To use this feature in OpenVINO™ Model Server, the following steps are required:
Set target_device for the model in the configuration JSON file to MULTI:DEVICE_1,DEVICE_2 (e.g. MULTI:MYRIAD,CPU). The order of the devices defines their priority, so MYRIAD devices are used first in this example.
Below is an example config.json that sets up the Multi-Device Plugin for a resnet model, using Intel® Movidius™ Neural Compute Stick and CPU devices:
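Such a configuration can be sketched from Python as follows. Only the target_device value comes from the text above. The surrounding key names (model_config_list, config, name, base_path) are assumed to match the OVMS configuration file layout, and the base_path /opt/ml/resnet is a hypothetical placeholder.

```python
import json
from typing import List


def make_multi_device_config(name: str, base_path: str, devices: List[str]) -> dict:
    """Build a single-model configuration targeting the MULTI plugin.

    The order of devices sets their priority, first entry highest.
    """
    return {
        "model_config_list": [
            {
                "config": {
                    "name": name,
                    "base_path": base_path,  # hypothetical model location
                    "target_device": "MULTI:" + ",".join(devices),
                }
            }
        ]
    }


config = make_multi_device_config("resnet", "/opt/ml/resnet", ["MYRIAD", "CPU"])
# Print the JSON document that would be saved as config.json
print(json.dumps(config, indent=4))
```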
To build the image on the Ubuntu base image, run:
make docker_build BASE_OS=ubuntu
Additionally, you can use the INSTALL_DRIVER_VERSION argument to choose which GPU driver version is installed in the produced image.
Currently, the following versions are available:
21.38.21026 - Red Hat
21.48.21782 - Ubuntu
Example:
make docker_build INSTALL_DRIVER_VERSION=21.38.21026
If not provided, version 21.38.21026 is used for Red Hat and 21.48.21782 is used for Ubuntu.
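The defaulting rule above can be expressed as a small helper that composes the make invocation. This is a sketch only: the function name is illustrative, and the BASE_OS value redhat is an assumption, since the text above shows only BASE_OS=ubuntu explicitly.

```python
from typing import List, Optional

# Default GPU driver version per base OS, as listed above.
# The "redhat" key is an assumed BASE_OS value.
DEFAULT_DRIVER_VERSION = {
    "redhat": "21.38.21026",
    "ubuntu": "21.48.21782",
}


def build_make_command(base_os: str = "ubuntu",
                       driver_version: Optional[str] = None) -> List[str]:
    """Return the make docker_build argument list for the given base OS.

    Falls back to the documented default driver version when none is given.
    """
    if base_os not in DEFAULT_DRIVER_VERSION:
        raise ValueError(f"unsupported base OS: {base_os}")
    version = driver_version or DEFAULT_DRIVER_VERSION[base_os]
    return [
        "make", "docker_build",
        f"BASE_OS={base_os}",
        f"INSTALL_DRIVER_VERSION={version}",
    ]


print(" ".join(build_make_command("ubuntu")))
```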