This document describes the OpenVINO Model Server gRPC API. The API is defined in the protocol buffer files in tensorflow_serving_api. The gRPC interface is recommended for optimal performance because its implementation of input data deserialization is faster. gRPC achieves lower latency, especially with larger input messages such as images.
This document covers the following APIs: Model Status API, Model Metadata API, and Predict API.
Implementations of the Predict, GetModelMetadata, and GetModelStatus function calls are currently available.
These are the most generic function calls and should address most of the usage scenarios.
Model Status API
Gets information about the status of served models, including the model version.
The Get Model Status proto defines three messages used when calling the status endpoint: GetModelStatusRequest, ModelVersionStatus, and GetModelStatusResponse. Together they report all exposed versions and the state of each version in its lifecycle.
Read more about Get Model Status API usage.
Model Metadata API
Gets information about the served models. The GetModelMetadata function accepts model spec information as input and returns Signature Definition content in a format similar to TensorFlow Serving.
The Get Model Metadata proto has three message definitions: SignatureDefMap, GetModelMetadataRequest, and GetModelMetadataResponse.
Read more about Get Model Metadata API usage.
Predict API
Endpoint for running inference with loaded models or DAGs.
The Predict proto has two message definitions: PredictRequest and PredictResponse.
PredictRequest specifies the model spec and a map of input data serialized via TensorProto to a string format.
PredictResponse includes a map of outputs serialized by TensorProto and information about the model spec that was used.
Read more about Predict API usage.