OpenVINO Model Server C-API¶
This document describes the OpenVINO Model Server (OVMS) C API, which allows OpenVINO Model Server to be linked into C/C++ applications. With the exceptions listed at the end of this document, all capabilities of OpenVINO Model Server are included in the shared library.
With the OpenVINO Model Server 2023.1 release, the C-API is no longer in preview state and is now public. This version contains a few breaking changes:
*Get* was removed from the names of the following functions:
Server functionality is encapsulated in a shared library built from the OpenVINO Model Server source. To include OpenVINO Model Server, you need to link this library with your application and use the C API defined in the header file.
Calling the method that starts model serving in your application launches OpenVINO Model Server in a separate thread. You can then schedule inference both directly from the application using the C API and through the gRPC/HTTP endpoints.
The API is versioned according to SemVer 2.0. By calling
OVMS_ApiVersion you can get the major and
minor version numbers.
major - incremented when new, backward incompatible changes are introduced to the API itself (API call removal, name change, parameter change)
minor - incremented when API is modified but backward compatible (new API call added)
There is no patch version number. Underlying functionality changes not related to API itself are tracked via OpenVINO Model Server version. OpenVINO Model Server and OpenVINO versions can be tracked via logs or
ServerMetadata request (via KServe API).
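The versioning rules above can be turned into a simple compatibility check. The following is a minimal sketch, assuming the application records the major and minor numbers it was built against and compares them with the numbers reported at run time. The helper name api_is_compatible is hypothetical and is not part of the OVMS C API; how the two numbers are obtained from OVMS_ApiVersion is left out on purpose.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of the OVMS C API.
 * Decides whether an application built against API version
 * (built_major, built_minor) can use a library that reports
 * (lib_major, lib_minor), following the major/minor rules above. */
bool api_is_compatible(uint32_t built_major, uint32_t built_minor,
                       uint32_t lib_major, uint32_t lib_minor) {
    /* A different major version signals backward incompatible changes. */
    if (lib_major != built_major) {
        return false;
    }
    /* Same major: the library must provide at least the API calls
     * that existed in the minor version the application was built with. */
    return lib_minor >= built_minor;
}
```

For example, an application built against 1.0 would accept a library reporting 1.1, while an application built against 1.2 would reject it.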
Server configuration and start¶
To start OpenVINO Model Server, create an OVMS_Server object using OVMS_ServerNew, with a set of OVMS_ModelsSettings that describe how the server should be configured. Once the server is started using OVMS_ServerStartFromConfigurationFile, you can schedule inferences using OVMS_Inference. To stop the server, you must call OVMS_ServerDelete. While the server is alive, you can schedule in-process inferences and also use the gRPC API to schedule inferences from a remote machine. Optionally, you can also enable the HTTP service. You can also query metadata using OVMS_ServerMetadata. An example of how to use OpenVINO Model Server with a C/C++ application is here.
Most OpenVINO Model Server C API functions return a pointer to an
OVMS_Status object indicating success or failure. Success is indicated by nullptr (NULL). Failure is indicated by returning an
OVMS_Status object. The status code can be extracted using the
OVMS_StatusCode function, and the details of the error can be retrieved using
The ownership of
OVMS_Status is passed to the caller of the function. You must delete the object using
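The status convention above (NULL means success, a non-NULL status is owned by the caller and must be released) is easy to get wrong in error paths. Below is a minimal sketch of a helper that centralizes the check and the release. All names prefixed with Demo or demo_ are stand-ins invented for this illustration; they only mimic the convention and are not the OVMS API.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in status type for illustration only; NOT the OVMS API. */
typedef struct DemoStatus {
    uint32_t code;
    const char *details;
} DemoStatus;

/* Counts statuses that were created but not yet deleted. */
static int demo_live_statuses = 0;

DemoStatus *demo_status_new(uint32_t code, const char *details) {
    DemoStatus *s = malloc(sizeof *s);
    if (s == NULL) {
        abort();
    }
    s->code = code;
    s->details = details;
    demo_live_statuses++;
    return s;
}

void demo_status_delete(DemoStatus *s) {
    free(s);
    demo_live_statuses--;
}

/* Hypothetical API call: returns NULL on success, a status on failure. */
DemoStatus *demo_load(const char *name) {
    if (name == NULL || name[0] == '\0') {
        return demo_status_new(3, "servable name is empty");
    }
    return NULL;
}

/* Returns 0 on success, otherwise the error code.
 * The caller owns the status, so it is always released here. */
uint32_t check_and_release(DemoStatus *status) {
    if (status == NULL) {
        return 0;
    }
    uint32_t code = status->code;
    fprintf(stderr, "error %u: %s\n", (unsigned)code, status->details);
    demo_status_delete(status);
    return code;
}
```

Wrapping every call as check_and_release(demo_load(...)) keeps the ownership rule in one place, so no status object is leaked on a failure path.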
To execute inference using the C API, follow the steps described below.
Prepare inference request¶
Create an inference request using
OVMS_InferenceRequestNew specifying which servable name and optionally version to use. Then specify input tensors with
OVMS_InferenceRequestAddInput and set the tensor data using
Execute inference with OpenVINO Model Server using the synchronous
OVMS_Inference call. During inference execution you must not modify the
OVMS_InferenceRequest or the memory buffers bound to it.
Process inference response¶
If the inference was successful, you receive an
OVMS_InferenceResponse object. After processing the response, you must free the response memory by calling
To process the response, first check for an inference error. If no error occurred, iterate over the response outputs and parameters using
OVMS_InferenceResponseParameterCount. Then extract the details describing each output and parameter using
OVMS_InferenceResponseParameter. An example of how to use OpenVINO Model Server with a C/C++ application is here. While the example app uses only a single thread to schedule inference requests, you can execute multiple inferences simultaneously from different threads.
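The response processing described above follows a count-then-index pattern: ask how many items there are, then fetch each item by its index. The sketch below shows that pattern on stand-in types. DemoResponse, DemoOutput, and the demo_ functions are hypothetical and exist only for this illustration; they are not the OVMS API and do not reflect its actual signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types for illustration only; NOT the OVMS API. */
typedef struct DemoOutput {
    const char *name;
    const int64_t *shape;
    size_t dim_count;
} DemoOutput;

typedef struct DemoResponse {
    const DemoOutput *outputs;
    uint32_t output_count;
} DemoResponse;

/* Hypothetical accessor: number of outputs in the response. */
uint32_t demo_response_output_count(const DemoResponse *r) {
    return r->output_count;
}

/* Hypothetical accessor: output with the given index. */
const DemoOutput *demo_response_output(const DemoResponse *r, uint32_t id) {
    assert(id < r->output_count);
    return &r->outputs[id];
}

/* Number of elements in one output: the product of its dimensions. */
int64_t demo_output_elements(const DemoOutput *o) {
    int64_t n = 1;
    for (size_t i = 0; i < o->dim_count; i++) {
        n *= o->shape[i];
    }
    return n;
}

/* Count first, then visit every output by index. */
int64_t demo_total_elements(const DemoResponse *r) {
    int64_t total = 0;
    uint32_t count = demo_response_output_count(r);
    for (uint32_t id = 0; id < count; id++) {
        total += demo_output_elements(demo_response_output(r, id));
    }
    return total;
}
```

The same loop shape applies to response parameters: query the count once, then read each entry by index before releasing the response.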
Note: After inference execution is finished you can reuse the same
OVMS_InferenceRequest by using
OVMS_InferenceRequestInputRemoveData and then setting different tensor data with
Server liveness and readiness¶
To check if OpenVINO Model Server is alive and will respond to requests you can use
OVMS_ServerLive. Note that live status does not guarantee model readiness. Use the OVMS_ServerReady call to check whether the initial configuration loading has finished, including loading all correctly configured models.
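Because a live server is not necessarily ready, an application typically polls readiness before sending the first request. The following is a minimal sketch of such a polling loop, written against a generic probe callback. DemoServer, demo_server_live, and demo_server_ready are stand-ins invented for this illustration and are not the OVMS API; a real probe would wrap the liveness or readiness call of the C API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Generic probe: returns true once the checked condition holds. */
typedef bool (*probe_fn)(void *ctx);

/* Calls probe up to max_attempts times.
 * Returns the 1-based attempt on which probe first returned true,
 * or 0 if it never did. */
int poll_until_true(probe_fn probe, void *ctx, int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (probe(ctx)) {
            return attempt;
        }
        /* A real application would sleep here between attempts. */
    }
    return 0;
}

/* Stand-in server for illustration only; NOT the OVMS API.
 * It becomes ready after a fixed number of failed readiness polls. */
typedef struct DemoServer {
    int polls_until_ready;
} DemoServer;

/* The stand-in server is live as soon as it exists. */
bool demo_server_live(void *ctx) {
    (void)ctx;
    return true;
}

/* Readiness lags behind liveness while models are still loading. */
bool demo_server_ready(void *ctx) {
    DemoServer *s = ctx;
    if (s->polls_until_ready > 0) {
        s->polls_until_ready--;
        return false;
    }
    return true;
}
```

Bounding the number of attempts keeps the application from waiting forever when the configuration never finishes loading.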
To check if servable is ready for inference and metadata requests use
OVMS_GetServableState specifying name and optionally version.
Use the OVMS_GetServableMetadata call to get information about servable inputs and outputs. If the request was successful, you receive an
OVMS_ServableMetadata object. To get information about every input/output, first check the number of inputs/outputs with
OVMS_ServableMetadataOutputCount, and then use
OVMS_ServableMetadataOutput calls to extract details about each input/output. After retrieving the required data, you must release the response object with
To check server metadata, use the
OVMS_ServerMetadata call. It creates a new object of type
OVMS_Metadata that you need to release later with
OVMS_StringFree. It contains information about the version of OpenVINO and the version of Model Server. To serialize
OVMS_ServerMetadata to a JSON string, you can use the
OVMS_SerializeMetadataToString function. This allocates a char array that also needs to be released later with
Launching the server in single model mode is not supported. You must use a configuration file.
There is no direct support for JPEG/PNG encoded input format through the C API.
There is no metrics endpoint exposed through the C API.
Inference scheduled through the C API does not have metrics.
You cannot turn the gRPC endpoint off; the REST API endpoint is optional.
There is no API for asynchronous inference.
There is no support for stateful models.
There is no support for MediaPipe graphs.