OpenShift Operator¶
The Operator installs and manages development tools and production AI deployments in an OpenShift cluster. It enables easy deployment and management of AI inference services by creating a ModelServer resource.
The Operator also integrates with the JupyterHub Spawner in Red Hat OpenShift Data Science and Open Data Hub. See detailed instructions below.
Install the Operator¶
In the OpenShift web console, navigate to the OperatorHub menu. Search for “OpenVINO” and select “OpenVINO™ Toolkit Operator”. Then click the Install button.
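If you prefer to script the installation, the same Operator can typically be installed from the CLI with an OLM Subscription. The manifest below is only a sketch: the package name (ovms-operator), channel (alpha), and catalog source (certified-operators) are assumptions and should be verified against the entry shown in OperatorHub for your cluster.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ovms-operator              # assumed package name - confirm in OperatorHub
  namespace: openshift-operators
spec:
  channel: alpha                   # assumed channel - confirm in OperatorHub
  name: ovms-operator              # assumed package name
  source: certified-operators      # assumed catalog source
  sourceNamespace: openshift-marketplace
Apply it with oc apply -f subscription.yaml and wait until the Operator appears under Installed Operators.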
Deploy the Operator¶
From the OpenShift Console¶
Once you have installed the Operator in OpenShift, you can manage it using the web console. Navigate to Installed Operators and click Create ModelServer or Create Notebook.
Create a ModelServer Resource¶
After selecting Create ModelServer, you will see the template for creating a deployment configuration. A few parameters will be provided by default, including the following:
| Parameter | Description |
| --- | --- |
| `apiVersion` | set to `intel.com/v1alpha1` |
| `name` | unique name of the deployment, which can be modified |
| `namespace` | defaults to the current namespace in your cluster |
| `aws_access_key_id`, `aws_secret_access_key`, `aws_region`, `s3_compat_api_endpoint` | optional parameters; configure them only when using AWS S3 storage as your AI model repository |
| `batch_size` | by default, the batch size is derived from the model; leave blank unless you wish to modify the default |
| `file_system_poll_wait_seconds` | time interval in seconds between checks for new versions of AI models; setting it to `0` disables automatic version updates |
| `gcp_creds_secret_name` | optional parameter; configure it only when using Google Cloud Storage as your AI model repository. The secret should be created from a GCP credentials JSON file |
| `grpc_port` | required parameter; defines the service port for the gRPC interface |
| `https_proxy` | optional parameter; used only when a proxy is required to download models from a remote repository |
| `image_name` | required parameter; defines the container registry and image for the OpenVINO Model Server |
| `log_level` | required parameter; defines the log level. By default, the level is set to `INFO` |
| `model_name` | parameter used only when starting a single-model server; defines the model name exposed by the serving API |
| `model_path` | parameter used only when starting a single-model server; defines the path to the model repository |
| `model_version_policy` | required parameter; defines which versions of the AI model to serve. By default, the latest version is served. For other options, please see the Model Version Policy documentation |
| `models_volume_claim` | optional parameter; define it only when using a persistent volume as your AI model repository. The Persistent Volume Claim (PVC) must be in the same namespace as this resource |
| `plugin_config` | defines the device plugin configuration for performance tuning. For automatic tuning, set a performance hint such as `{"PERFORMANCE_HINT":"LATENCY"}` |
| `replicas` | required parameter; defines the number of replicas for this deployment |
| `resources` | optional parameter; defines compute resource limits for the node. Limit CPU cores and memory using standard Kubernetes notation (e.g. `cpu: 8`, `memory: 512Mi`) |
| `rest_port` | required parameter; defines the service port for the REST interface |
Adjust the parameters according to your needs. See the full list of parameters in the documentation for more details. See a screenshot of the template below:

From the OpenShift CLI¶
Alternatively, after installing the Operator, you can deploy and manage model servers by creating ModelServer resources with the oc OpenShift command-line tool.
Modify the sample resource and run the following command:
oc apply -f config/samples/intel_v1alpha1_ovms.yaml
The available parameters are the same as above.
Note: Some deployment configurations have prerequisites, such as creating related resources in Kubernetes first. For example, a secret with credentials, a persistent volume claim, or a ConfigMap with a Model Server configuration file.
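As an illustration, such prerequisite resources could be created with standard oc commands before applying the ModelServer resource. The names used here (gcpcreds, gcp_creds.json, ovms-config, config.json, models-pvc.yaml) are placeholders:
oc create secret generic gcpcreds --from-file=gcp_creds.json
oc create configmap ovms-config --from-file=config.json
oc apply -f models-pvc.yaml
The secret, ConfigMap, and PVC names must then be referenced in the corresponding ModelServer parameters.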
Using ModelServer in an OpenShift Cluster¶
The Operator deploys a ModelServer instance as a Kubernetes service with a predefined number of replicas. The Service name will match the ModelServer resource name. The suffix -ovms is added unless the phrase ovms is already included in the name.
oc get pods
NAME READY STATUS RESTARTS AGE
ovms-sample-586f6f76df-dpps4 1/1 Running 0 8h
oc get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ovms-sample ClusterIP 172.25.199.210 <none> 8080/TCP,8081/TCP 8h
The ModelServer service in OpenShift exposes gRPC and REST API endpoints for processing AI inference requests.
The readiness of the models for serving can be confirmed by the READY field in the oc get pods output. The endpoints can also be tested with a simple curl request to the REST API from any pod in the cluster:
curl http://<ovms_service_name>.<namespace>:8081/v1/config
or
curl http://<ovms_service_name>.<namespace>:8081/v1/models/<model_name>/metadata
In the example above, assuming a namespace called ovms, it would be:
curl http://ovms-sample.ovms:8081/v1/config
{
"resnet" :
{
"model_version_status": [
{
"version": "1",
"state": "AVAILABLE",
"status": {
"error_code": "OK",
"error_message": "OK"
}
}
]
}
curl http://ovms-sample.ovms:8081/v1/models/resnet/metadata
{
"modelSpec": {
"name": "resnet",
"signatureName": "",
"version": "1"
},
"metadata": {
"signature_def": {
"@type": "type.googleapis.com/tensorflow.serving.SignatureDefMap",
"signatureDef": {
"serving_default": {
"inputs": {
"0": {
"dtype": "DT_FLOAT",
"tensorShape": {
"dim": [
{
"size": "1",
"name": ""
},
{
"size": "3",
"name": ""
},
{
"size": "224",
"name": ""
},
{
"size": "224",
"name": ""
}
],
"unknownRank": false
},
"name": "0"
}
},
"outputs": {
"1463": {
"dtype": "DT_FLOAT",
"tensorShape": {
"dim": [
{
"size": "1",
"name": ""
},
{
"size": "1000",
"name": ""
}
],
"unknownRank": false
},
"name": "1463"
}
},
"methodName": ""
}
}
}
}
}
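If no application pod is available yet, a short-lived pod can be used to issue the same request. This is just a convenience sketch; it assumes the public curlimages/curl image is reachable from your cluster:
oc run ovms-client --rm -it --restart=Never --image=curlimages/curl --command -- curl http://ovms-sample.ovms:8081/v1/config
The pod is removed automatically after the command completes.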
Using the AI Inference Endpoints to run predictions¶
There are a few different ways to use the AI inference endpoints created by the ModelServer resource, such as the following:
- Deploy a client inside a pod in the cluster. A client inside the cluster can access the endpoints via the service name or the service cluster IP.
- Configure the service type as NodePort - this exposes the service on the external IP address of the Kubernetes node.
- In a managed cloud deployment, use the service type LoadBalancer - this exposes the service as an external IP address.
- Configure an OpenShift route resource, or an ingress resource in open-source Kubernetes, linked with the ModelServer service. In OpenShift, this can be done from the web console. See the CLI sketch after this list.
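As a sketch of the last two options, the REST endpoint of the sample service could be exposed from the CLI. The service name ovms-sample and port 8081 follow the example deployment above; adjust them to your resource:
oc patch service ovms-sample -p '{"spec": {"type": "NodePort"}}'
oc expose service ovms-sample --port=8081 --name=ovms-rest
oc get route ovms-rest
The first command switches the service type to NodePort, while oc expose creates an OpenShift route for the REST port and oc get route prints the external hostname.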
Check out the client code samples to see how your applications can generate gRPC or REST API calls to the AI inference endpoints.
The output below shows the image_classification.py example client connecting to a ModelServer resource serving a ResNet image classification model. The command sets grpc_address to the service name, so it will work from a pod inside the cluster. If the client is external to the OpenShift cluster, replace the address with the external DNS name or external IP and adjust the grpc_port parameter as needed.
$ python image_classification.py --grpc_port 8080 --grpc_address ovms-sample --input_name 0 --output_name 1463
Start processing:
Model name: resnet
Images list file: input_images.txt
images/airliner.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 25.56 ms; speed 39.13 fps
Detected: 404 Should be: 404
images/arctic-fox.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 20.95 ms; speed 47.72 fps
Detected: 279 Should be: 279
images/bee.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.90 ms; speed 45.67 fps
Detected: 309 Should be: 309
images/golden_retriever.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.84 ms; speed 45.78 fps
Detected: 207 Should be: 207
images/gorilla.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 20.26 ms; speed 49.36 fps
Detected: 366 Should be: 366
images/magnetic_compass.jpeg (1, 3, 224, 224) ; data range: 0.0 : 247.0
Processing time: 20.68 ms; speed 48.36 fps
Detected: 635 Should be: 635
images/peacock.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.57 ms; speed 46.37 fps
Detected: 84 Should be: 84
images/pelican.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 20.53 ms; speed 48.71 fps
Detected: 144 Should be: 144
images/snail.jpeg (1, 3, 224, 224) ; data range: 0.0 : 248.0
Processing time: 22.34 ms; speed 44.75 fps
Detected: 113 Should be: 113
images/zebra.jpeg (1, 3, 224, 224) ; data range: 0.0 : 255.0
Processing time: 21.27 ms; speed 47.00 fps
Detected: 340 Should be: 340
Overall accuracy= 100.0 %
Average latency= 21.1 ms
Integration with OpenShift Data Science and Open Data Hub¶
The Operator integrates with the JupyterHub Spawner in Red Hat OpenShift Data Science and Open Data Hub. Simply create a Notebook resource, which deploys an ImageStream containing the OpenVINO developer tools and ready-to-run Jupyter notebooks. To use the ImageStream, you must have already installed the Operator for OpenShift Data Science or Open Data Hub.
The Create Notebook button in the web console will build the container image and create an ImageStream. This enables selecting the openvino-notebook image from the Jupyter Spawner drop-down menu. The image is maintained by Intel.
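If you prefer the CLI over the Create Notebook button, the resource can also be applied from a manifest. The sketch below assumes the same intel.com/v1alpha1 API group as the ModelServer resource and the default Red Hat OpenShift Data Science namespace; check the sample bundled with the Operator for the exact fields and namespace expected by your installation.
apiVersion: intel.com/v1alpha1
kind: Notebook
metadata:
  name: openvino-notebook
  namespace: redhat-ods-applications   # assumed namespace - use the one of your OpenShift Data Science or Open Data Hub installation
Apply it with oc apply -f notebook.yaml.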
