OpenShift Operator

The Operator installs and manages development tools and production AI deployments in an OpenShift cluster. It enables easy deployment and management of AI inference services by creating ModelServer resources.

The Operator also integrates with the JupyterHub Spawner in Red Hat OpenShift Data Science and Open Data Hub. See detailed instructions below.

Install the Operator

In the OpenShift web console, navigate to the OperatorHub menu. Search for “OpenVINO” and select “OpenVINO™ Toolkit Operator”, then click the Install button.
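
If you prefer the command line, the Operator can also be installed by creating an OLM Subscription. The sketch below is illustrative only; the package name, channel, and catalog source are assumptions and should be confirmed against the OperatorHub entry:

# Minimal OLM Subscription for the Operator (values marked "assumed" must be verified)
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openvino-operator
  namespace: openshift-operators
spec:
  channel: alpha                     # assumed channel name
  name: ovms-operator                # assumed package name
  source: certified-operators        # assumed catalog source
  sourceNamespace: openshift-marketplace
EOF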

Deploy the Operator

From the OpenShift Console

Once you have installed the Operator in OpenShift, you can manage it using the web console. Navigate to Installed Operators and click Create ModelServer or Create Notebook.

Create a ModelServer Resource

After selecting Create ModelServer, you will see the template for creating a deployment configuration. Several parameters are provided by default, including the following:

  • kind - set to ModelServer, the type of resource being deployed.

  • name - a unique name for the deployment, which can be modified. model-server-sample is provided as an example.

  • namespace - defaults to the current namespace in your cluster.

  • aws_access_key_id, aws_secret_access_key, aws_region - optional parameters; configure them only when using AWS S3 storage as your AI model repository.

  • batch_size - by default, the batch size is derived from the model. Leave blank unless you wish to modify the default.

  • file_system_poll_wait_seconds - time interval, in seconds, between checks for new versions of AI models. Setting it to 0 disables automatic version updates.

  • gcp_creds_secret_name - optional parameter; configure it only when using Google Cloud Storage as your AI model repository. The secret should be created from a GCP credentials JSON file.

  • grpc_port - required parameter; defines the service port for the gRPC interface. 8080 is the default, but it can be modified if needed.

  • https_proxy - optional parameter; used only when a proxy is required to download models from a remote repository.

  • image_name - required parameter; defines the OpenVINO Model Server container image to pull. By default, openvino-model-server:latest is used, but latest can be replaced with a specific release version such as 2021.3-gpu.

  • log_level - required parameter; defines the log level. By default, the level is set to INFO, with ERROR (errors only) and DEBUG (verbose) available as alternatives.

  • model_name - used only when starting a ModelServer with a single AI model. The example provided uses resnet, but this can be changed to describe your custom model. The parameters config_configmap_name and config_path are not used when this parameter is set.

  • model_path - used only when starting a ModelServer with a single AI model. For locally accessible storage, use models_path/model_name; for cloud storage, use s3://bucket/models/model or gs://bucket/models/model. For more information, see Cloud Storage Requirements. The parameters config_configmap_name and config_path are not used when this parameter is set.

  • model_version_policy - required parameter; defines which versions of an AI model to serve. By default, the latest version is served. For other options, see the Model Version Policy documentation.

  • models_volume_claim - optional parameter; define it only when using a persistent volume as your AI model repository. The Persistent Volume Claim (PVC) must be in the same namespace as this ModelServer resource.

  • plugin_config - defines the device plugin configuration for performance tuning. For automatic tuning, set it to {"CPU_THROUGHPUT_STREAMS":"CPU_THROUGHPUT_AUTO"}.

  • replicas - required parameter; defines the number of replicas for this ModelServer deployment.

  • resources (cpu and memory) - optional parameters; define compute resource limits for the node. Limit CPU cores and memory (e.g. 250Mi for 250MiB of memory).

  • rest_port - required parameter; defines the service port for the REST interface. 8081 is the default, but it can be modified if needed.

Adjust the parameters according to your needs. See the full list of parameters in the documentation for more details. A screenshot of the template is shown below:

[Screenshot: ModelServer resource template in the OpenShift web console]

From the OpenShift CLI

Alternatively, after installing the Operator, you can create and manage ModelServer deployments by applying ModelServer resources with the oc OpenShift command line tool.

Modify the sample resource and run the following command:

oc apply -f config/samples/intel_v1alpha1_ovms.yaml

The available parameters are the same as above.
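
As a reference, a minimal resource might look like the sketch below. The field names mirror the parameters described above and the apiVersion is inferred from the sample file name; treat both as assumptions and start from the shipped sample rather than this snippet:

# Minimal ModelServer resource (sketch only - verify field names against the sample)
oc apply -f - <<EOF
apiVersion: intel.com/v1alpha1       # assumed from intel_v1alpha1_ovms.yaml
kind: ModelServer
metadata:
  name: model-server-sample
spec:
  image_name: openvino-model-server:latest
  replicas: 1
  grpc_port: 8080
  rest_port: 8081
  log_level: INFO
  model_name: resnet
  model_path: gs://bucket/models/resnet   # or s3://bucket/models/model, or a path on a mounted PVC
EOF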

Note: Some deployment configurations have prerequisites, such as creating the relevant Kubernetes resources first: a secret with credentials, a persistent volume claim, or a ConfigMap with a Model Server configuration file.
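
For example, the secret and persistent volume claim referenced by gcp_creds_secret_name and models_volume_claim could be created ahead of time. The resource names and the secret key below are illustrative only; check the key name expected by the Operator:

# Secret created from a GCP service-account JSON key file
oc create secret generic gcpcreds --from-file=gcp-creds.json=./gcp-creds.json

# Persistent Volume Claim for a local model repository
oc apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF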

Using ModelServer in an OpenShift Cluster

The Operator deploys a ModelServer instance as a Kubernetes service with a predefined number of replicas. The Service name matches the ModelServer resource name; the suffix -ovms is added unless the phrase ovms is already included in the name.

oc get pods
NAME                           READY   STATUS    RESTARTS   AGE
ovms-sample-586f6f76df-dpps4   1/1     Running   0          8h

oc get services
NAME          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
ovms-sample   ClusterIP   172.25.199.210   <none>        8080/TCP,8081/TCP   8h

The ModelServer service in OpenShift exposes gRPC and REST API endpoints for processing AI inference requests.

The readiness of models for serving can be confirmed by the READY column in the oc get pods output. The endpoints can also be tested with a simple curl request to the REST API from any pod in the cluster:

curl http://<ovms_service_name>.<namespace>:8081/v1/config

or

curl http://<ovms_service_name>.<namespace>:8081/v1/models/<model_name>/metadata

In the example above, assuming a namespace called ovms, it would be:

curl http://ovms-sample.ovms:8081/v1/config
{
"resnet" :
{
 "model_version_status": [
  {
   "version": "1",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": "OK"
   }
  }
 ]
}
}

curl http://ovms-sample.ovms:8081/v1/models/resnet/metadata
{
 "modelSpec": {
  "name": "resnet",
  "signatureName": "",
  "version": "1"
 },
 "metadata": {
  "signature_def": {
   "@type": "type.googleapis.com/tensorflow.serving.SignatureDefMap",
   "signatureDef": {
    "serving_default": {
     "inputs": {
      "0": {
       "dtype": "DT_FLOAT",
       "tensorShape": {
        "dim": [
         {
          "size": "1",
          "name": ""
         },
         {
          "size": "3",
          "name": ""
         },
         {
          "size": "224",
          "name": ""
         },
         {
          "size": "224",
          "name": ""
         }
        ],
        "unknownRank": false
       },
       "name": "0"
      }
     },
     "outputs": {
      "1463": {
       "dtype": "DT_FLOAT",
       "tensorShape": {
        "dim": [
         {
          "size": "1",
          "name": ""
         },
         {
          "size": "1000",
          "name": ""
         }
        ],
        "unknownRank": false
       },
       "name": "1463"
      }
     },
     "methodName": ""
    }
   }
  }
 }
}
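
If no pod with curl is available, a temporary pod can be started just for the request and removed afterwards. The image below is only an assumption; any image that ships curl will do:

# One-off pod that issues the request and is deleted on exit
oc run curl-test --rm -it --restart=Never \
  --image=registry.access.redhat.com/ubi8/ubi -- \
  curl -s http://ovms-sample.ovms:8081/v1/config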

Using the AI Inference Endpoints to run predictions

There are a few different ways to use the AI inference endpoints created by the ModelServer resource, such as the following:

  • Deploy a client inside a pod in the cluster. A client inside the cluster can access the endpoints via the service name or the service cluster IP.

  • Configure the service type as NodePort - this exposes the service on the external IP address of a Kubernetes node.

  • In a managed cloud deployment, use the service type LoadBalancer - this exposes the service on an external IP address.

  • Configure an OpenShift Route resource, or an Ingress resource in open-source Kubernetes, linked with the ModelServer service. In OpenShift, this can be done from the web console or the CLI (see the sketch after this list).
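
For example, the REST port of the ovms-sample service could be published with an OpenShift Route as sketched below; the route name is arbitrary and the hostname is assigned by the cluster (gRPC traffic typically requires a passthrough route or a load balancer instead):

# Create a Route targeting the REST port of the service
oc expose service ovms-sample --port=8081 --name=ovms-sample-rest

# Show the externally reachable hostname assigned to the Route
oc get route ovms-sample-rest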

Check out the client code samples to see how your applications can generate gRPC or REST API calls to the AI inference endpoints.
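
As a quick alternative to the sample clients, the REST predict endpoint accepts a TensorFlow Serving style JSON body. The request.json file below is hypothetical and would contain the input tensor in the {"instances": [...]} format:

# Send a prediction request to the REST endpoint (request.json is a placeholder payload)
curl -s -X POST http://ovms-sample.ovms:8081/v1/models/resnet:predict \
  -H "Content-Type: application/json" \
  -d @request.json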

The output below shows the image_classification.py example client connecting to a ModelServer resource serving a ResNet image classification model. The command sets grpc_address to the service name, so it works from a pod inside the cluster. If the client is external to the OpenShift cluster, replace the address with the external DNS name or IP address and adjust the grpc_port parameter as needed.

$ python image_classification.py --grpc_port 8080 --grpc_address ovms-sample --input_name 0 --output_name 1463
Start processing:
    Model name: resnet
    Images list file: input_images.txt
images/airliner.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 25.56 ms; speed 39.13 fps
Detected: 404  Should be: 404
images/arctic-fox.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 20.95 ms; speed 47.72 fps
Detected: 279  Should be: 279
images/bee.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.90 ms; speed 45.67 fps
Detected: 309  Should be: 309
images/golden_retriever.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.84 ms; speed 45.78 fps
Detected: 207  Should be: 207
images/gorilla.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 20.26 ms; speed 49.36 fps
Detected: 366  Should be: 366
images/magnetic_compass.jpeg (1, 3, 224, 224) ; data range: 0.0 : 247.0
Processing time: 20.68 ms; speed 48.36 fps
Detected: 635  Should be: 635
images/peacock.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.57 ms; speed 46.37 fps
Detected: 84  Should be: 84
images/pelican.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 20.53 ms; speed 48.71 fps
Detected: 144  Should be: 144
images/snail.jpeg (1, 3, 224, 224) ; data range: 0.0 : 248.0
Processing time: 22.34 ms; speed 44.75 fps
Detected: 113  Should be: 113
images/zebra.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.27 ms; speed 47.00 fps
Detected: 340  Should be: 340
Overall accuracy= 100.0 %
Average latency= 21.1 ms

Integration with OpenShift Data Science and Open Data Hub

The Operator integrates with the JupyterHub Spawner in Red Hat OpenShift Data Science and Open Data Hub. Simply create a Notebook resource, which deploys an ImageStream containing the OpenVINO developer tools and ready-to-run Jupyter notebooks. To use the ImageStream, the OpenShift Data Science or Open Data Hub operator must already be installed.
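
A Notebook resource can also be created from the command line. The sketch below assumes the Notebook custom resource uses the same API group and version as the ModelServer sample; verify this against the samples shipped with the Operator:

# Minimal Notebook resource (sketch only)
oc apply -f - <<EOF
apiVersion: intel.com/v1alpha1   # assumed - same group/version as the ModelServer sample
kind: Notebook
metadata:
  name: notebook-sample
EOF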

The Create Notebook button in the web console builds the container image and creates an ImageStream. This makes the openvino-notebook image, maintained by Intel, available in the Jupyter Spawner drop-down menu, as shown in the screenshot below:

[Screenshot: selecting the openvino-notebook image in the JupyterHub Spawner]