Hello Model Server

This tutorial is also available as a Jupyter notebook that can be cloned directly from GitHub. See the installation guide for instructions to run this tutorial locally on Windows, Linux or macOS.


Introduction to OpenVINO™ Model Server (OVMS).

What is Model Serving?

A model server hosts models and makes them accessible to software components over standard network protocols. A client sends a request to the model server, which performs inference and sends a response back to the client. Model serving offers many advantages for efficient model deployment:

  • Remote inference enables using lightweight clients with only the necessary functions to perform API calls to edge or cloud deployments.

  • Applications are independent of the model framework, hardware device, and infrastructure.

  • Client applications in any programming language that supports REST or gRPC calls can be used to run inference remotely on the model server.

  • Clients require fewer updates since client libraries change very rarely.

  • Model topology and weights are not exposed directly to client applications, making it easier to control access to the model.

  • Ideal architecture for microservices-based applications and deployments in cloud environments – including Kubernetes and OpenShift clusters.

  • Efficient resource utilization with horizontal and vertical inference scaling.



Serving with OpenVINO Model Server

OpenVINO Model Server (OVMS) is a high-performance system for serving models. Implemented in C++ for scalability and optimized for deployment on Intel architectures, the model server uses the same architecture and API as TensorFlow Serving and KServe while applying OpenVINO for inference execution. Inference service is provided via gRPC or REST API, making deploying new algorithms and AI experiments easy.



To quickly start using OpenVINO™ Model Server, follow these steps:

Step 1: Prepare Docker

Install Docker Engine, including its post-installation steps, on your development system. To verify the installation, run the following command. If Docker is set up correctly, it pulls a test image and prints a confirmation message.

!docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:

For more examples and ideas, visit:

Step 2: Prepare a Model Repository

The models need to be placed and mounted in a particular directory structure and according to the following rules:

tree models/
├── model1
│   ├── 1
│   │   ├── ir_model.bin
│   │   └── ir_model.xml
│   └── 2
│       ├── ir_model.bin
│       └── ir_model.xml
├── model2
│   └── 1
│       ├── ir_model.bin
│       ├── ir_model.xml
│       └── mapping_config.json
├── model3
│   └── 1
│       └── model.onnx
├── model4
│   └── 1
│       ├── model.pdiparams
│       └── model.pdmodel
└── model5
    └── 1
        └── TF_frozen_model.pb

  • Each model should be stored in a dedicated directory, for example, model1 and model2.

  • Each model directory should include a sub-folder for each of its versions (1, 2, etc.). The versions and their folder names should be positive integer values.

  • Note that in execution, the versions are enabled according to a pre-defined version policy. If the client does not specify the version number in parameters, by default, the latest version is served.

  • Every version folder must include model files, that is, .bin and .xml for OpenVINO IR, .onnx for ONNX, .pdiparams and .pdmodel for PaddlePaddle, and .pb for TensorFlow. The file name can be arbitrary.
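
The rules above can be checked before the server is started. The following is a minimal sketch that uses only the Python standard library; the helper names (list_versions, latest_version, check_model_dir) are hypothetical and are not part of OpenVINO Model Server. It assumes the default behavior described above, where the highest version number is the one served.

```python
from pathlib import Path

# Complete file sets for each format named in the rules above.
REQUIRED_EXTENSIONS = [
    {".xml", ".bin"},            # OpenVINO IR
    {".onnx"},                   # ONNX
    {".pdmodel", ".pdiparams"},  # PaddlePaddle
    {".pb"},                     # TensorFlow
]


def list_versions(model_dir):
    """Return (version_number, path) pairs for positive-integer folders, sorted ascending."""
    versions = []
    for child in Path(model_dir).iterdir():
        name = child.name
        if child.is_dir() and name.isascii() and name.isdigit() and int(name) > 0:
            versions.append((int(name), child))
    return sorted(versions)


def latest_version(model_dir):
    """Return the highest version number, or None if there is no version folder."""
    versions = list_versions(model_dir)
    return versions[-1][0] if versions else None


def check_model_dir(model_dir):
    """Return a list of problems; an empty list means the layout looks valid."""
    problems = []
    versions = list_versions(model_dir)
    if not versions:
        problems.append(f"{model_dir}: no version folder named with a positive integer")
    for _, path in versions:
        found = {p.suffix.lower() for p in path.iterdir() if p.is_file()}
        if not any(required <= found for required in REQUIRED_EXTENSIONS):
            problems.append(f"{path}: no recognized model files")
    return problems
```

For the repository prepared in this step, check_model_dir("models/detection") is expected to return an empty list once the .xml and .bin files have been copied.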

import os
import shutil

dedicated_dir = "models"
model_name = "detection"
model_version = "1"

MODEL_DIR = f"{dedicated_dir}/{model_name}/{model_version}"
XML_PATH = "../004-hello-detection/model/horizontal-text-detection-0001.xml"
BIN_PATH = "../004-hello-detection/model/horizontal-text-detection-0001.bin"
os.makedirs(MODEL_DIR, exist_ok=True)
shutil.copy(XML_PATH, MODEL_DIR)
shutil.copy(BIN_PATH, MODEL_DIR)
print(f"Model Copied to \"./{MODEL_DIR}\".")
Model Copied to "./models/detection/1".

Step 3: Start the Model Server Container

Pull and start the container:

!docker run -d --rm --name="ovms" -v $(pwd)/models:/models -p 9000:9000 openvino/model_server:latest --model_path /models/detection/ --model_name detection --port 9000

Check whether the OVMS container is running normally:

!docker ps | grep ovms
7bf50596c18d   openvino/model_server:latest   "/ovms/bin/ovms --mo…"   Less than a second ago   Up Less than a second   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp   ovms
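
A script that starts the container may also want to wait until the published port accepts connections. The sketch below is one possible way to do that with the standard library; wait_for_port is a hypothetical helper, not an OVMS or Docker API. An accepted connection only shows that something is listening on the port (Docker's port forwarding may accept connections before the model has finished loading), so the model status request in Step 4 remains the more meaningful check.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # The connection is closed again as soon as it has been established.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet; wait briefly and try again.
            time.sleep(interval)
    return False
```

With the container from this step, the call would be wait_for_port("localhost", 9000).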

The required Model Server parameters are listed below. For additional configuration options, see the Model Server Parameters section.

  • --rm: removes the container when it exits

  • -d: runs the container in the background

  • -v: defines how to mount the model folder in the Docker container

  • -p: exposes the model serving port outside the Docker container

  • openvino/model_server:latest: the image name; the ovms binary is the Docker entry point. The image varies by tag and build process; see https://hub.docker.com/r/openvino/model_server/tags/ for a full tag list.

  • --model_path: the model location, which can be a Docker container path that is mounted during start-up, a Google Cloud Storage path gs://<bucket>/<model_path>, an AWS S3 path s3://<bucket>/<model_path>, or an Azure blob path az://<container>/<model_path>

  • --model_name: the name of the model in the model_path

  • --port: the gRPC server port

  • --rest_port: the REST server port (not used in the command above)
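
To make the role of each argument explicit, the docker run command from this step can be assembled in Python as an argument list. This is only a sketch: build_ovms_command is a hypothetical helper, the /home/user/models path is a placeholder, and the function only builds the command without running it.

```python
import shlex


def build_ovms_command(models_dir, model_name, host_port=9000, container_port=9000,
                       image="openvino/model_server:latest", container_name="ovms"):
    """Assemble the `docker run` argument list used in this step."""
    return [
        "docker", "run",
        "-d",                                    # run in the background
        "--rm",                                  # remove the container when it exits
        "--name", container_name,
        "-v", f"{models_dir}:/models",           # mount the model repository
        "-p", f"{host_port}:{container_port}",   # publish the serving port
        image,
        "--model_path", f"/models/{model_name}",
        "--model_name", model_name,
        "--port", str(container_port),           # gRPC port inside the container
    ]


# Placeholder path; in the notebook this would be the absolute path of ./models.
command = build_ovms_command("/home/user/models", "detection")
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True), which avoids shell quoting issues when the models path contains spaces.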

If the serving port 9000 is already in use, switch to another available port on your system. For example: -p 9020:9000
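
Whether a host port is already taken can be tested by trying to bind to it. The sketch below is an illustration with hypothetical helper names (is_port_free, pick_host_port). Another process can still claim the port between the check and the docker run call, so treat the result as a hint rather than a guarantee.

```python
import socket


def is_port_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def pick_host_port(preferred=9000, attempts=50, host="0.0.0.0"):
    """Return the first free port at or above `preferred`."""
    # Ports above 65535 do not exist, so the search stops there.
    for port in range(preferred, min(preferred + attempts, 65536)):
        if is_port_free(port, host):
            return port
    raise RuntimeError(f"no free port found starting at {preferred}")


host_port = pick_host_port(9000)
print(f"-p {host_port}:9000")
```

Only the host side of the mapping changes; the server inside the container keeps listening on 9000, and the client address in Step 4 has to use the selected host port.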

Step 4: Prepare the Example Client Components

OpenVINO Model Server exposes two sets of inference APIs: one compatible with TensorFlow Serving and another compatible with the KServe API. Both APIs work over gRPC and REST interfaces. Supporting two sets of APIs makes OpenVINO Model Server easier to plug into existing systems that already use one of these APIs for inference. This example demonstrates how to write a TensorFlow Serving API client for object detection.
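
For orientation, the two REST flavors differ mainly in their URL layout. The sketch below builds the endpoint URLs following the path conventions of the TensorFlow Serving REST API (/v1/models/...) and the KServe v2 protocol (/v2/models/...). The function names are hypothetical, and port 8000 is an assumption: the container in this tutorial publishes only the gRPC port, so REST access would additionally require --rest_port and a matching -p mapping.

```python
def tfs_rest_urls(host, rest_port, model_name, version=None):
    """Build TensorFlow Serving style REST URLs for a model."""
    base = f"http://{host}:{rest_port}/v1/models/{model_name}"
    if version is not None:
        base += f"/versions/{version}"
    return {
        "status": base,
        "metadata": f"{base}/metadata",
        "predict": f"{base}:predict",
    }


def kserve_rest_urls(host, rest_port, model_name, version=None):
    """Build KServe v2 style REST URLs for a model."""
    base = f"http://{host}:{rest_port}/v2/models/{model_name}"
    if version is not None:
        base += f"/versions/{version}"
    return {
        "ready": f"{base}/ready",
        "metadata": base,
        "infer": f"{base}/infer",
    }


# Port 8000 is a placeholder for whatever value is passed to --rest_port.
print(tfs_rest_urls("localhost", 8000, "detection")["predict"])
print(kserve_rest_urls("localhost", 8000, "detection", version=1)["infer"])
```

The rest of this tutorial uses gRPC through ovmsclient, so none of these URLs are called here.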


Install necessary packages.

!pip install -q ovmsclient
Collecting ovmsclient
  Downloading ovmsclient-2022.3-py3-none-any.whl (163 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 164.0/164.0 KB 2.1 MB/s eta 0:00:00a 0:00:01
Requirement already satisfied: numpy>=1.16.6 in /home/adrian/repos/openvino_notebooks_adrian/venv/lib/python3.9/site-packages (from ovmsclient) (1.23.4)
Requirement already satisfied: requests>=2.27.1 in /home/adrian/repos/openvino_notebooks_adrian/venv/lib/python3.9/site-packages (from ovmsclient) (2.27.1)
Collecting grpcio>=1.47.0
  Downloading grpcio-1.51.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.8/4.8 MB 5.6 MB/s eta 0:00:0000:0100:01
Requirement already satisfied: protobuf>=3.19.4 in /home/adrian/repos/openvino_notebooks_adrian/venv/lib/python3.9/site-packages (from ovmsclient) (3.19.6)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/adrian/repos/openvino_notebooks_adrian/venv/lib/python3.9/site-packages (from requests>=2.27.1->ovmsclient) (1.26.9)
Requirement already satisfied: idna<4,>=2.5 in /home/adrian/repos/openvino_notebooks_adrian/venv/lib/python3.9/site-packages (from requests>=2.27.1->ovmsclient) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in /home/adrian/repos/openvino_notebooks_adrian/venv/lib/python3.9/site-packages (from requests>=2.27.1->ovmsclient) (2021.10.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/adrian/repos/openvino_notebooks_adrian/venv/lib/python3.9/site-packages (from requests>=2.27.1->ovmsclient) (2.0.12)
Installing collected packages: grpcio, ovmsclient
  Attempting uninstall: grpcio
    Found existing installation: grpcio 1.34.1
    Uninstalling grpcio-1.34.1:
      Successfully uninstalled grpcio-1.34.1
Successfully installed grpcio-1.51.3 ovmsclient-2022.3
WARNING: You are using pip version 22.0.4; however, version 23.0.1 is available.
You should consider upgrading via the '/home/adrian/repos/openvino_notebooks_adrian/venv/bin/python -m pip install --upgrade pip' command.


import cv2
import numpy as np
import matplotlib.pyplot as plt
from ovmsclient import make_grpc_client

Request Model Status

address = "localhost:9000"

# Bind the gRPC address to the client object.
client = make_grpc_client(address)
model_status = client.get_model_status(model_name=model_name)
print(model_status)
{1: {'state': 'AVAILABLE', 'error_code': 0, 'error_message': 'OK'}}
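
The status response maps each version number to its state, as the output above shows. A small sketch of how a client might interpret it follows; the helper names are hypothetical, and the LOADING entry in the sample is illustrative data, not output from this server.

```python
def available_versions(model_status):
    """Return the version numbers whose state is AVAILABLE, sorted ascending."""
    return sorted(
        version for version, info in model_status.items()
        if info.get("state") == "AVAILABLE"
    )


def describe_status(model_status):
    """Return one readable line per version, including the error if one is set."""
    lines = []
    for version in sorted(model_status):
        info = model_status[version]
        line = f"version {version}: {info['state']}"
        if info.get("error_code"):
            line += f" (error {info['error_code']}: {info['error_message']})"
        lines.append(line)
    return lines


# Sample data: version 1 is copied from the output above, version 2 is made up.
sample_status = {
    1: {"state": "AVAILABLE", "error_code": 0, "error_message": "OK"},
    2: {"state": "LOADING", "error_code": 0, "error_message": "OK"},
}
print(available_versions(sample_status))
print("\n".join(describe_status(sample_status)))
```

In the notebook, available_versions(model_status) can be used to stop early when the list is empty instead of sending a prediction request that is bound to fail.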

Request Model Metadata

model_metadata = client.get_model_metadata(model_name=model_name)
print(model_metadata)
{'model_version': 1, 'inputs': {'image': {'shape': [1, 3, 704, 704], 'dtype': 'DT_FLOAT'}}, 'outputs': {'1469_1470.0': {'shape': [-1], 'dtype': 'DT_FLOAT'}, '1078_1079.0': {'shape': [1000], 'dtype': 'DT_FLOAT'}, '1330_1331.0': {'shape': [36], 'dtype': 'DT_FLOAT'}, 'labels': {'shape': [-1], 'dtype': 'DT_INT32'}, '1267_1268.0': {'shape': [121], 'dtype': 'DT_FLOAT'}, '1141_1142.0': {'shape': [1000], 'dtype': 'DT_FLOAT'}, '1204_1205.0': {'shape': [484], 'dtype': 'DT_FLOAT'}, 'boxes': {'shape': [-1, 5], 'dtype': 'DT_FLOAT'}}}
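
The metadata can be used to validate a request before it is sent. The sketch below compares prepared input shapes with the shapes reported by the server. It assumes that -1 marks a dimension of any size, as the boxes output above suggests; find_input_problems and shape_matches are hypothetical helpers, not part of ovmsclient.

```python
def shape_matches(expected, actual):
    """Compare shapes dimension by dimension; -1 in `expected` accepts any size."""
    if len(expected) != len(actual):
        return False
    return all(e == -1 or e == a for e, a in zip(expected, actual))


def find_input_problems(model_metadata, input_shapes):
    """Return messages for inputs that are missing, unexpected, or wrongly shaped."""
    problems = []
    expected_inputs = model_metadata["inputs"]
    for name, info in expected_inputs.items():
        if name not in input_shapes:
            problems.append(f"missing input '{name}'")
        elif not shape_matches(info["shape"], input_shapes[name]):
            problems.append(
                f"input '{name}': expected shape {info['shape']}, "
                f"got {list(input_shapes[name])}"
            )
    for name in input_shapes:
        if name not in expected_inputs:
            problems.append(f"unexpected input '{name}'")
    return problems
```

Once the input array has been prepared, a call such as find_input_problems(model_metadata, {"image": input_image.shape}) should return an empty list.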

Load input image

# Text detection models expect an image in BGR format.
image = cv2.imread("../data/image/intel_rnb.jpg")
fp_image = image.astype("float32")

# Resize the image to meet network expected input sizes.
input_shape = model_metadata['inputs']['image']['shape']
height, width = input_shape[2], input_shape[3]
# cv2.resize expects the target size as (width, height).
resized_image = cv2.resize(fp_image, (width, height))

# Reshape to the network input shape.
input_image = np.expand_dims(resized_image.transpose(2, 0, 1), 0)
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
<matplotlib.image.AxesImage at 0x7fee22d6ecd0>

Request Prediction on a Numpy Array

inputs = {"image": input_image}

# Run inference on model server and receive the result data
boxes = client.predict(inputs=inputs, model_name=model_name)['boxes']

# Remove zero-only boxes.
boxes = boxes[~np.all(boxes == 0, axis=1)]
print(boxes)
[[3.9992419e+02 8.1032524e+01 5.6187299e+02 1.3619952e+02 5.3706491e-01]
 [2.6189725e+02 6.8310547e+01 3.8541251e+02 1.2095630e+02 4.7559953e-01]
 [6.1644586e+02 2.8008759e+02 6.6627545e+02 3.1178854e+02 4.4982004e-01]
 [2.0762042e+02 6.2798470e+01 2.3444728e+02 1.0706525e+02 3.7216505e-01]
 [5.1742780e+02 5.5603595e+02 5.4927539e+02 5.8736023e+02 3.2588077e-01]
 [2.2261986e+01 4.5406548e+01 1.8868817e+02 1.0225631e+02 3.0407205e-01]]


# For each detection, the description is in the [x_min, y_min, x_max, y_max, conf] format:
# The image passed here is in BGR format with changed width and height. To display it in colors expected by matplotlib, use cvtColor function
def convert_result_to_image(bgr_image, resized_image, boxes, threshold=0.3, conf_labels=True):
    # Define colors for boxes and descriptions.
    colors = {"red": (255, 0, 0), "green": (0, 255, 0)}

    # Fetch the image shapes to calculate a ratio.
    (real_y, real_x), (resized_y, resized_x) = bgr_image.shape[:2], resized_image.shape[:2]
    ratio_x, ratio_y = real_x / resized_x, real_y / resized_y

    # Convert the base image from BGR to RGB format.
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)

    # Iterate through non-zero boxes.
    for box in boxes:
        # Pick a confidence factor from the last place in an array.
        conf = box[-1]
        if conf > threshold:
            # Scale each corner back to the original image size and convert it to int.
            # Even indices are x coordinates, odd indices are y coordinates.
            # If the bounding box is found at the top of the image,
            # position the upper box bar a little lower to make it visible on the image.
            (x_min, y_min, x_max, y_max) = [
                int(max(corner_position * ratio_y, 10)) if idx % 2
                else int(corner_position * ratio_x)
                for idx, corner_position in enumerate(box[:-1])
            ]

            # Draw a box based on the position, parameters in rectangle function are: image, start_point, end_point, color, thickness.
            rgb_image = cv2.rectangle(rgb_image, (x_min, y_min), (x_max, y_max), colors["green"], 3)

            # Add text to the image based on position and confidence.
            # Parameters in text function are: image, text, bottom-left_corner_textfield, font, font_scale, color, thickness, line_type.
            # The font, scale, color, and thickness values below are illustrative choices.
            if conf_labels:
                rgb_image = cv2.putText(
                    rgb_image,
                    f"{conf:.2f}",
                    (x_min, y_min - 10),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.8,
                    colors["red"],
                    1,
                    cv2.LINE_AA,
                )

    return rgb_image
plt.figure(figsize=(10, 6))
plt.imshow(convert_result_to_image(image, resized_image, boxes, conf_labels=False))
<matplotlib.image.AxesImage at 0x7fee219e4df0>
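
The coordinate handling inside convert_result_to_image can also be looked at in isolation. The following sketch reproduces the thresholding and rescaling steps in plain Python, without the drawing calls and without the clamp that keeps the top edge visible. rescale_detections is a hypothetical helper and the sample boxes are made-up values.

```python
def rescale_detections(boxes, ratio_x, ratio_y, threshold=0.3):
    """Keep boxes above the confidence threshold and map them to original image pixels."""
    detections = []
    for x_min, y_min, x_max, y_max, conf in boxes:
        if conf <= threshold:
            continue
        detections.append({
            "x_min": int(x_min * ratio_x),
            "y_min": int(y_min * ratio_y),
            "x_max": int(x_max * ratio_x),
            "y_max": int(y_max * ratio_y),
            "conf": float(conf),
        })
    return detections


# Made-up boxes in [x_min, y_min, x_max, y_max, conf] format on the resized image.
sample_boxes = [
    [10.0, 20.0, 30.0, 40.0, 0.5],
    [5.0, 5.0, 8.0, 8.0, 0.1],
]
# Ratios are original size divided by resized size; here the original image is
# assumed to be twice as wide and half as tall as the resized one.
print(rescale_detections(sample_boxes, ratio_x=2.0, ratio_y=0.5))
```

Returning plain dictionaries keeps the results easy to log or serialize when no image needs to be drawn.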

To stop and remove the model server container, you can use the following command:

!docker stop ovms