KServe compatible RESTful API¶
Introduction¶
In addition to gRPC APIs, OpenVINO Model Server also supports RESTful APIs that follow the KServe REST API specification. The REST API is recommended when the primary goal is to reduce the number of client-side Python dependencies and simplify application code.
This document covers the following APIs:
Server Live API
Server Ready API
Server Metadata API
Model Ready API
Model Metadata API
Inference API
Server Live API¶
Description
Get information about server liveness.
URL
GET http://${REST_URL}:${REST_PORT}/v2/health/live
Response format
The information about server liveness is provided in the response status code. If the server is alive, the status code is 200; otherwise it is 4xx. The response does not have any content in the body.
Usage Example
$ curl -i http://localhost:5000/v2/health/live
HTTP/1.1 200 OK
Content-Type: application/json
Date: Tue, 09 Aug 2022 09:20:24 GMT
Content-Length: 2
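Below is a minimal Python sketch of the same liveness check, assuming the requests package is installed and the server listens on localhost:5000 as in the example above.
import requests

# Query the liveness endpoint; a 200 status code means the server is alive.
response = requests.get("http://localhost:5000/v2/health/live")
print("Server is alive" if response.status_code == 200 else "Server is not alive")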
See also code samples for getting server liveness with KServe API on HTTP Server Live endpoint.
Server Ready API¶
Description
Get information about server readiness.
URL
GET http://${REST_URL}:${REST_PORT}/v2/health/ready
Response format
The information about server readiness is provided in the response status code. If the server is ready, the status code is 200; otherwise it is 4xx. The response does not have any content in the body.
Usage Example
$ curl -i http://localhost:5000/v2/health/ready
HTTP/1.1 200 OK
Content-Type: application/json
Date: Tue, 09 Aug 2022 09:22:14 GMT
Content-Length: 2
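Since readiness is reported only through the status code, a client can poll this endpoint until the server is ready. A minimal sketch, assuming the requests package and a server on localhost:5000:
import time
import requests

# Poll the readiness endpoint until the server responds with 200 or the retries run out.
for _ in range(30):
    try:
        if requests.get("http://localhost:5000/v2/health/ready").status_code == 200:
            print("Server is ready")
            break
    except requests.ConnectionError:
        pass  # server may not be accepting connections yet
    time.sleep(1)
else:
    print("Server did not become ready in time")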
See also code samples for getting server readiness with KServe API on HTTP Server Ready endpoint.
Server Metadata API¶
Description
Get information about the server.
URL
GET http://${REST_URL}:${REST_PORT}/v2
Response format
If successful:
{
"name" : $string,
"version" : $string,
"extensions" : [ $string, ... ]
}
Else:
{
"error": $string
}
Usage Example
$ curl http://localhost:5000/v2
{"name":"OpenVINO Model Server","version":"2022.2.0.fd742507"}
For a detailed description of the response contents, see the KServe API docs.
See also code samples for getting server metadata with KServe API on HTTP Server Metadata endpoint.
Model Ready API¶
Description
Get information about model readiness.
URL
GET http://${REST_URL}:${REST_PORT}/v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/ready
Response format
The information about model readiness is provided in the response status code. If the model is ready for inference, the status code is 200; otherwise it is 4xx. The response does not have any content in the body.
Usage Example
$ curl -i http://localhost:5000/v2/models/resnet/ready
HTTP/1.1 200 OK
Content-Type: application/json
Date: Tue, 09 Aug 2022 09:25:31 GMT
Content-Length: 2
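A minimal Python sketch checking readiness of a specific model version (assumes the requests package, a server on localhost:5000, and the resnet model from the example above; drop the /versions/1 part of the path to check the latest version):
import requests

# 200 means the model version is loaded and ready to serve inference requests.
status = requests.get("http://localhost:5000/v2/models/resnet/versions/1/ready").status_code
print("Model ready" if status == 200 else "Model not ready")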
See also code samples for getting model readiness with KServe API on HTTP Model Ready endpoint.
Model Metadata API¶
Description
Get information about the model.
URL
GET http://${REST_URL}:${REST_PORT}/v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]
Note: Including ${MODEL_VERSION} is optional. If omitted, the metadata for the latest version of the model is returned in the response.
Response format
If successful:
{
"name" : $string,
"versions" : [ $string, ... ] #optional,
"platform" : $string,
"inputs" : [ $metadata_tensor, ... ],
"outputs" : [ $metadata_tensor, ... ]
}
where:
$metadata_tensor =
{
"name" : $string,
"datatype" : $string,
"shape" : [ $number, ... ]
}
Else:
{
"error": $string
}
Usage example
$ curl http://localhost:8000/v2/models/resnet
{"name":"resnet","versions":["1"],"platform":"OpenVINO","inputs":[{"name":"0","datatype":"FP32","shape":[1,224,224,3]}],"outputs":[{"name":"1463","datatype":"FP32","shape":[1,1000]}]}
For a detailed description of the response contents, see the KServe API docs.
See also code samples for getting model metadata with KServe API on HTTP Model Metadata endpoint.
Inference API¶
Description
Endpoint for running an inference with loaded models or DAGs.
URL
POST http://${REST_URL}:${REST_PORT}/v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/infer
Request Body Format
{
"id" : $string #optional,
"parameters" : $parameters #optional,
"inputs" : [ $request_input, ... ],
"outputs" : [ $request_output, ... ] #optional
}
where:
$request_input =
{
"name" : $string,
"shape" : [ $number, ... ],
"datatype" : $string,
"parameters" : $parameters #optional,
"data" : $tensor_data
}
$request_output =
{
"name" : $string,
"parameters" : $parameters #optional,
}
Response Format
If successful:
{
"model_name" : $string,
"model_version" : $string #optional,
"id" : $string,
"parameters" : $parameters #optional,
"outputs" : [ $response_output, ... ]
}
where:
$response_output =
{
"name" : $string,
"shape" : [ $number, ... ],
"datatype" : $string,
"parameters" : $parameters #optional,
"data" : $tensor_data
}
Else:
{
"error": <error message string>
}
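Usage Example
A minimal Python sketch of an inference request with the tensor data placed directly in the JSON body (assumes the requests and numpy packages are installed and the resnet model from the metadata example above is served on localhost:8000; the input name, shape and datatype follow that metadata):
import numpy as np
import requests

# Build a KServe-style inference request with tensor data embedded in the JSON body.
image = np.zeros((1, 224, 224, 3), dtype=np.float32)  # placeholder input data
payload = {
    "inputs": [
        {
            "name": "0",
            "shape": list(image.shape),
            "datatype": "FP32",
            "data": image.flatten().tolist(),
        }
    ]
}
response = requests.post("http://localhost:8000/v2/models/resnet/infer", json=payload)
result = response.json()
# The first output holds 1000 class scores; pick the index with the highest score.
print("Predicted class:", int(np.argmax(result["outputs"][0]["data"])))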
For a detailed description of the request and response contents, see the KServe API docs.
Note: A more efficient way of running inference via REST is to send data in binary format outside of the JSON object, using the binary data extension.
See also code samples for running inference with KServe API on HTTP Inference endpoint.