Metrics

Introduction

This document describes how to use the metrics endpoint in the OpenVINO Model Server. Metrics can be used for:

  • Providing performance and utilization statistics for monitoring and benchmarking purposes

  • Auto-scaling model server instances in Kubernetes and OpenShift based on application-related metrics (see the example query after this list)
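
As a sketch of the auto-scaling use case, an external Prometheus server can scrape the model server and expose request rates that an autoscaler (for example, a Kubernetes HPA with a Prometheus adapter) consumes. The query below is illustrative only and assumes a Prometheus server at localhost:9090:

# Ask Prometheus for the per-second rate of successful requests over
# the last minute (the Prometheus address is an assumption):
curl -s 'http://localhost:9090/api/v1/query' \
     --data-urlencode 'query=rate(ovms_requests_success[1m])'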

Note

Currently, metrics are released as a preview feature.

Built-in metrics allow tracking performance without any extra logic on the client side and without network traffic monitoring tools such as load balancers or reverse proxies.

The model server also exposes metrics that are not related to network traffic, for example statistics of the inference execution queue and model runtime parameters. Metrics can also track usage based on model version, API type, or requested endpoint method.

OpenVINO Model Server metrics are compatible with the Prometheus standard and are exposed on the /metrics endpoint.
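
A minimal Prometheus scrape configuration pointing at the model server could look as follows. The target address localhost:8000 assumes the REST port used in the examples below and should be adjusted to your deployment; this snippet writes a fresh prometheus.yml:

# Write a minimal Prometheus configuration scraping the model server:
cat <<'EOF' > prometheus.yml
scrape_configs:
  - job_name: "ovms"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8000"]
EOF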

Available metrics families

Metrics from the default list are enabled with the metrics_enabled flag or the JSON configuration.

You can also enable additional metrics by listing all the metrics you want to enable in the metrics_list flag or the JSON configuration.

Default metrics

| Type      | Name                            | Labels                            | Description                                         |
|-----------|---------------------------------|-----------------------------------|-----------------------------------------------------|
| gauge     | ovms_streams                    | name,version                      | Number of OpenVINO execution streams.               |
| gauge     | ovms_current_requests           | name,version                      | Number of inference requests currently in process.  |
| counter   | ovms_requests_success           | api,interface,method,name,version | Number of successful requests to a model or a DAG.  |
| counter   | ovms_requests_fail              | api,interface,method,name,version | Number of failed requests to a model or a DAG.      |
| histogram | ovms_request_time_us            | interface,name,version            | Processing time of requests to a model or a DAG.    |
| histogram | ovms_inference_time_us          | name,version                      | Inference execution time in the OpenVINO backend.   |
| histogram | ovms_wait_for_infer_req_time_us | name,version                      | Request waiting time in the scheduling queue.       |

Optional metrics

| Type  | Name                      | Labels       | Description                                                               |
|-------|---------------------------|--------------|---------------------------------------------------------------------------|
| gauge | ovms_infer_req_queue_size | name,version | Inference request queue size (nireq).                                     |
| gauge | ovms_infer_req_active     | name,version | Number of inference requests currently consumed from the processing queue. |

Labels description

| Name      | Values                                                                          | Description                                                                           |
|-----------|---------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|
| api       | KServe, TensorFlowServing                                                       | Name of the serving API.                                                              |
| interface | REST, gRPC                                                                      | Name of the serving interface.                                                        |
| method    | ModelMetadata, ModelReady, ModelInfer, Predict, GetModelStatus, GetModelMetadata | Interface methods.                                                                    |
| version   | 1, 2, …, n                                                                      | Model version. Note that GetModelStatus and ModelReady do not have the version label. |
| name      | As defined in model server config                                               | Model name or DAG name.                                                               |
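
For illustration, a single ovms_requests_success sample with all of its labels could look like this in the /metrics output (the model name and value are made up):

ovms_requests_success{api="KServe",interface="gRPC",method="ModelInfer",name="resnet",version="1"} 4018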

Enable metrics

By default, the metrics feature is disabled.

The metrics endpoint uses the same port as the REST interface for running model queries, so REST must be enabled in the model server by setting the rest_port parameter.

To enable the default set of metrics, specify the metrics_enabled flag or the JSON setting:

CLI

docker run --rm -d -p 9000:9000 -p 8000:8000 openvino/model_server:latest \
       --model_name resnet --model_path gs://ovms-public-eu/resnet50  --port 9000 \
       --rest_port 8000 \
       --metrics_enabled

CONFIG CMD

docker run --rm -d -v ${PWD}/workspace:/workspace -p 8000:8000 -p 9000:9000 openvino/model_server:latest \
       --config_path /workspace/config.json \
       --rest_port 8000 \
       --port 9000

CONFIG JSON

{
 "model_config_list": [
     {
        "config": {
             "name": "resnet",
             "base_path": "/workspace/resnet-50-tf",
             "layout": "NHWC:NCHW",
             "shape": "(1,224,224,3)"
        }
     }
 ],
 "monitoring":
     {
         "metrics":
         {
             "enable" : true
         }
     }
}

Change the default list of metrics

You can enable anywhere from one to all of the available metrics at once.

To enable a specific set of metrics, specify the metrics_list flag or the JSON setting:

CLI

docker run --rm -d -p 9000:9000 -p 8000:8000 openvino/model_server:latest \
      --model_name resnet --model_path gs://ovms-public-eu/resnet50  --port 9000 \
      --rest_port 8000 \
      --metrics_enabled \
      --metrics_list ovms_requests_success,ovms_infer_req_queue_size

CONFIG CMD

docker run --rm -d -v ${PWD}/workspace:/workspace -p 9000:9000 -p 8000:8000 openvino/model_server:latest \
   --config_path /workspace/config.json \
   --rest_port 8000 \
   --port 9000

CONFIG JSON

{
 "model_config_list": [
     {
        "config": {
             "name": "resnet",
             "base_path": "gs://ovms-public-eu/resnet50"
        }
     }
 ],
 "monitoring":
     {
         "metrics":
         {
             "enable" : true,
             "metrics_list": ["ovms_requests_success", "ovms_infer_req_queue_size"]
         }
     }
 }

CONFIG JSON WITH ALL METRICS ENABLED

{
 "model_config_list": [
     {
        "config": {
             "name": "resnet",
             "base_path": "/workspace/resnet-50-tf"
        }
     }
 ],
 "monitoring":
     {
         "metrics":
         {
             "enable" : true,
             "metrics_list":
                 [ "ovms_requests_success",
                 "ovms_requests_fail",
                 "ovms_inference_time_us",
                 "ovms_wait_for_infer_req_time_us",
                 "ovms_request_time_us",
                 "ovms_current_requests",
                 "ovms_infer_req_active",
                 "ovms_streams",
                 "ovms_infer_req_queue_size"]
         }
     }
 }
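
To confirm which metric families are actually exposed after changing the list, you can extract the unique family names from the endpoint output (a quick shell check assuming the REST port 8000 used above; histogram families additionally appear with _bucket, _sum and _count suffixes):

curl -s http://localhost:8000/metrics | grep -o '^ovms_[a-z_]*' | sort -u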

Example response from metrics endpoint

To fetch data from the metrics endpoint, you can use the curl command:

curl http://localhost:8000/metrics

Example metrics output
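
An illustrative excerpt of the response is shown below. The families and HELP texts follow the tables above, while label values and counts depend on your deployment:

# HELP ovms_requests_success Number of successful requests to a model or a DAG.
# TYPE ovms_requests_success counter
ovms_requests_success{api="KServe",interface="gRPC",method="ModelInfer",name="resnet",version="1"} 10
# HELP ovms_streams Number of OpenVINO execution streams
# TYPE ovms_streams gauge
ovms_streams{name="resnet",version="1"} 4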

Metrics implementation for DAG pipelines

For DAG pipeline execution, the 3 metrics listed below are relevant. They track the execution of the whole pipeline, gathering information from all pipeline nodes.

DAG metrics

| Type      | Name                  | Description                                        |
|-----------|-----------------------|----------------------------------------------------|
| counter   | ovms_requests_success | Number of successful requests to a model or a DAG. |
| counter   | ovms_requests_fail    | Number of failed requests to a model or a DAG.     |
| histogram | ovms_request_time_us  | Processing time of requests to a model or a DAG.   |

The remaining metrics track the execution of the individual models in the pipeline separately. This means that each request to the DAG pipeline also updates the metrics for all individual models used as execution nodes.
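
As an illustration, assuming a hypothetical pipeline named my_pipeline with a resnet model as one of its nodes, a single pipeline request increments both the pipeline-level counter and the per-node histogram counts:

ovms_requests_success{api="KServe",interface="gRPC",method="ModelInfer",name="my_pipeline",version="1"} 1
ovms_inference_time_us_count{name="resnet",version="1"} 1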