KServe compatible gRPC API

Introduction

This document describes the OpenVINO™ Model Server gRPC API, which is compatible with KServe and documented in the KServe repository. Using the gRPC interface is recommended for optimal performance because it deserializes input data faster, achieving lower latency, especially with larger input messages such as images.
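
For illustration, the snippets in this document use the Triton client library (tritonclient), one of the KServe-compatible gRPC clients; this choice is an assumption, not the only option, and the address localhost:9000 is an example endpoint:

```python
# Minimal connection sketch; install with: pip install tritonclient[grpc]
# localhost:9000 is an assumed example of the OVMS gRPC endpoint.
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:9000")
```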

The API includes the following endpoints:

- Server Live
- Server Ready
- Server Metadata
- Model Ready
- Model Metadata
- Inference
- Streaming Inference (extension)

NOTE: Examples of using each of the above endpoints can be found in the KServe samples.

Server Live API

Gets information about server liveness. The server is alive when a communication channel can be established successfully.

Check the KServe documentation for more details.
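
A minimal liveness-check sketch using the tritonclient library (an assumed client; the endpoint address is an example):

```python
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:9000")
# True when the gRPC channel is established and the server reports liveness.
print(client.is_server_live())
```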

Server Ready API

Gets information about server readiness. The server is ready when the initial configuration has been loaded. The server enters the ready state only once and remains in it for the rest of its lifetime, regardless of the outcome of the initial loading phase. Even if some models fail to load, the server still becomes ready once the loading procedure finishes.

Check the KServe documentation for more details.
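
A minimal readiness-check sketch under the same assumptions:

```python
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:9000")
# True once the initial configuration has been loaded.
print(client.is_server_ready())
```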

Server Metadata API

Gets information about the server itself.

Check the KServe documentation for more details.
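
A sketch of fetching server metadata with the tritonclient library (an assumed client; the endpoint address is an example):

```python
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:9000")
metadata = client.get_server_metadata()
# The response carries the server name, version, and supported extensions.
print(metadata.name, metadata.version, metadata.extensions)
```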

Model Ready API

Gets information about the readiness of a specific model. A model is ready when it is fully capable of running inference.

Check the KServe documentation for more details.
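
A sketch of a model readiness check; the model name "my_model" is a placeholder:

```python
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:9000")
# "my_model" is a placeholder; use the name under which your model is served.
print(client.is_model_ready(model_name="my_model"))
```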

Model Metadata API

Gets information about a specific model.

Check the KServe documentation for more details.
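
A sketch of fetching model metadata; the model name is a placeholder:

```python
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:9000")
metadata = client.get_model_metadata(model_name="my_model")  # placeholder name
# Each entry describes one tensor: its name, datatype, and shape.
for tensor in metadata.inputs:
    print("input:", tensor.name, tensor.datatype, tensor.shape)
for tensor in metadata.outputs:
    print("output:", tensor.name, tensor.datatype, tensor.shape)
```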

Inference API

Runs inference with the requested model, DAG, or MediaPipe graph.

Check the KServe documentation for more details.

NOTE: The Inference endpoint accepts tensor buffers placed either in ModelInferRequest's InferTensorContents or in raw_input_contents. The BF16 data type is not supported, and FP16 is not supported in InferTensorContents. When sending image files or strings, the BYTES data type should be used, and the data should be placed in the bytes_contents field of InferTensorContents or in raw_input_contents.
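
A sketch of a basic inference call with the tritonclient library, which transmits tensor data via raw_input_contents; the model name, tensor names, and shape are placeholder assumptions:

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:9000")

# Placeholder tensor for an assumed input "input" of shape (1, 3, 224, 224).
data = np.zeros((1, 3, 224, 224), dtype=np.float32)
infer_input = grpcclient.InferInput("input", data.shape, "FP32")
infer_input.set_data_from_numpy(data)

results = client.infer(model_name="my_model", inputs=[infer_input])
print(results.as_numpy("output"))  # "output" is a placeholder output name
```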

Additionally, using the BYTES data type it is possible to send binary-encoded images to a model or pipeline that has 4 shape dimensions (or 5 in the case of demultiplexing); OVMS preprocesses such images using OpenCV and converts them to an OpenVINO-friendly format. For more information, check how binary data is handled in OpenVINO Model Server.
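
A sketch of sending a binary-encoded image with the BYTES data type; the file path and the model and tensor names are assumptions:

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:9000")

with open("image.jpg", "rb") as f:  # assumed example file
    image_bytes = f.read()

# A batch of one encoded image; the server decodes and preprocesses it.
data = np.array([image_bytes], dtype=np.object_)
infer_input = grpcclient.InferInput("input", [1], "BYTES")
infer_input.set_data_from_numpy(data)

results = client.infer(model_name="my_model", inputs=[infer_input])
print(results.as_numpy("output"))  # placeholder output name
```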

Streaming Inference API (extension)

Runs streaming inference with a MediaPipe graph.

Check the documentation for more details.
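
A sketch of streaming inference over the bidirectional gRPC stream using the tritonclient library; the graph name, tensor names, and shape are assumptions:

```python
import queue

import numpy as np
import tritonclient.grpc as grpcclient

responses = queue.Queue()

def callback(result, error):
    # Invoked for every response (or error) arriving on the stream.
    responses.put((result, error))

client = grpcclient.InferenceServerClient(url="localhost:9000")
client.start_stream(callback=callback)

# Send one placeholder frame into an assumed graph "my_graph".
frame = np.zeros((1, 3, 224, 224), dtype=np.float32)
infer_input = grpcclient.InferInput("input", frame.shape, "FP32")
infer_input.set_data_from_numpy(frame)
client.async_stream_infer(model_name="my_graph", inputs=[infer_input])

result, error = responses.get()
if error is None:
    print(result.as_numpy("output"))  # placeholder output name
client.stop_stream()
```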

See Also