KServe-compatible RESTful API
Introduction
In addition to the gRPC API, OpenVINO Model Server also supports a RESTful API that follows the KServe REST API specification. The REST API is recommended when the primary goal is to reduce the number of client-side Python dependencies and keep the application code simpler.
This document covers the Server Live, Server Ready, Server Metadata, Model Ready, Model Metadata, and Inference APIs.
Server Live API
Description
Get information about server liveness.
URL
GET http://${REST_URL}:${REST_PORT}/v2/health/live
Response format
The information about server liveness is provided in the response status code. If the server is alive, the status code is 200; otherwise it is 4xx. The response does not have any content in the body.
Usage Example
$ curl -i http://localhost:5000/v2/health/live
HTTP/1.1 200 OK
Content-Type: application/json
Date: Tue, 09 Aug 2022 09:20:24 GMT
Content-Length: 2
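Since the result is carried entirely by the status code, the endpoint is convenient for scripting and container health checks. A minimal sketch, assuming the server listens on port 5000:
# Hypothetical sketch: curl -f exits with a non-zero code on HTTP errors (4xx/5xx),
# so the exit status alone indicates liveness.
if curl -sf -o /dev/null http://localhost:5000/v2/health/live; then
  echo "server is alive"
else
  echo "server is not alive"
fi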
See also the code samples for getting server liveness with the KServe API on the HTTP Server Live endpoint.
Server Ready API
Description
Get information about server readiness.
URL
GET http://${REST_URL}:${REST_PORT}/v2/health/ready
Response format
The information about server readiness is provided in the response status code. If the server is ready, the status code is 200; otherwise it is 4xx. The response does not have any content in the body.
Usage Example
$ curl -i http://localhost:5000/v2/health/ready
HTTP/1.1 200 OK
Content-Type: application/json
Date: Tue, 09 Aug 2022 09:22:14 GMT
Content-Length: 2
See also the code samples for getting server readiness with the KServe API on the HTTP Server Ready endpoint.
Server Metadata API
Description
Get information about the server.
URL
GET http://${REST_URL}:${REST_PORT}/v2
Response format
If successful:
{
"name" : $string,
"version" : $string,
"extensions" : [ $string, ... ]
}
Else:
{
"error": $string
}
Usage Example
$ curl http://localhost:5000/v2
{"name":"OpenVINO Model Server","version":"2022.2.0.fd742507"}
For a detailed description of the response contents, see the KServe API docs.
See also the code samples for getting server metadata with the KServe API on the HTTP Server Metadata endpoint.
Model Ready API
Description
Get information about model readiness.
URL
GET http://${REST_URL}:${REST_PORT}/v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/ready
Response format
The information about model readiness is provided in the response status code. If the model is ready for inference, the status code is 200; otherwise it is 4xx. The response does not have any content in the body.
Usage Example
$ curl -i http://localhost:5000/v2/models/resnet/ready
HTTP/1.1 200 OK
Content-Type: application/json
Date: Tue, 09 Aug 2022 09:25:31 GMT
Content-Length: 2
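Readiness of a particular model version can be checked by adding the optional versions path segment. A sketch assuming version 1 of the resnet model is served on port 5000:
$ curl -sf -o /dev/null http://localhost:5000/v2/models/resnet/versions/1/ready && echo "resnet version 1 is ready"
resnet version 1 is ready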
See also the code samples for getting model readiness with the KServe API on the HTTP Model Ready endpoint.
Model Metadata API
Description
Get information about the model.
URL
GET http://${REST_URL}:${REST_PORT}/v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]
Note: Including ${MODEL_VERSION} is optional. If omitted, the model metadata for the latest version is returned in the response.
Response format
If successful:
{
"name" : $string,
"versions" : [ $string, ... ] #optional,
"platform" : $string,
"inputs" : [ $metadata_tensor, ... ],
"outputs" : [ $metadata_tensor, ... ]
}
where:
$metadata_tensor =
{
"name" : $string,
"datatype" : $string,
"shape" : [ $number, ... ]
}
Else:
{
"error": $string
}
Usage example
$ curl http://localhost:8000/v2/models/resnet
{"name":"resnet","versions":["1"],"platform":"OpenVINO","inputs":[{"name":"0","datatype":"FP32","shape":[1,224,224,3]}],"outputs":[{"name":"1463","datatype":"FP32","shape":[1,1000]}]}
For a detailed description of the response contents, see the KServe API docs.
See also the code samples for getting model metadata with the KServe API on the HTTP Model Metadata endpoint.
Inference API
Description
Endpoint for running an inference with loaded models or DAGs.
URL
POST http://${REST_URL}:${REST_PORT}/v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/infer
Request Body Format
{
"id" : $string #optional,
"parameters" : $parameters #optional,
"inputs" : [ $request_input, ... ],
"outputs" : [ $request_output, ... ] #optional
}
where:
$request_input =
{
"name" : $string,
"shape" : [ $number, ... ],
"datatype" : $string,
"parameters" : $parameters #optional,
"data" : $tensor_data
}
$request_output =
{
"name" : $string,
"parameters" : $parameters #optional,
}
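For example, a complete JSON-only inference request can be sent with curl as sketched below; the model name, input name, shape, data, and port are illustrative assumptions:
$ curl -X POST http://localhost:5000/v2/models/my_model/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs":[{"name":"model_input","shape":[1,3],"datatype":"FP32","data":[0.1,0.2,0.3]}]}'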
Besides numerical values, it is possible to pass binary inputs using the Binary Data extension:
As JPEG / PNG encoded images - in this case the binary-encoded data is loaded by OVMS using OpenCV, which converts it to an OpenVINO-friendly data format for inference. The datatype BYTES is reserved for encoded inputs.
Content-Type: application/octet-stream
Inference-Header-Content-Length: <xx>
Content-Length: <xx+9472>
{
"model_name" : "my_model",
"inputs" : [
{
"name" : "model_input",
"shape" : [ 1 ],
"datatype" : "BYTES"
}
]
}
<9472 bytes of data for model_input tensor>
As raw data - in this case it will not be preprocessed by OVMS. To send raw data using the Binary Data extension, use a datatype other than BYTES.
Content-Type: application/octet-stream
Inference-Header-Content-Length: <xx>
Content-Length: <xx+(3 x 1080000)>
{
"model_name" : "my_model",
"inputs" : [
{
"name" : "model_input",
"shape" : [ 3, 300, 300, 3 ],
"datatype" : "FP32"
}
]
}
<3240000 bytes of the whole data batch for model_input tensor>
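A request like the one above could be assembled in a shell as sketched below; the file names, model name, and port are assumptions, and curl computes the Content-Length header automatically:
# Hypothetical sketch: write the JSON inference header, append the raw tensor
# bytes, and POST both as one octet-stream body.
printf '%s' '{"inputs":[{"name":"model_input","shape":[3,300,300,3],"datatype":"FP32"}]}' > header.json
HEADER_SIZE=$(wc -c < header.json | tr -d ' ')
cat header.json input_data.bin > payload.bin   # input_data.bin: 3240000 bytes of raw FP32 data
curl -X POST http://localhost:5000/v2/models/my_model/infer \
  -H "Content-Type: application/octet-stream" \
  -H "Inference-Header-Content-Length: ${HEADER_SIZE}" \
  --data-binary @payload.bin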
Check how binary data is handled in OpenVINO Model Server for more information.
Response Format
If successful:
{
"model_name" : $string,
"model_version" : $string #optional,
"id" : $string,
"parameters" : $parameters #optional,
"outputs" : [ $response_output, ... ]
}
where:
$response_output =
{
"name" : $string,
"shape" : [ $number, ... ],
"datatype" : $string,
"parameters" : $parameters #optional,
"data" : $tensor_data
}
Else:
{
"error": <error message string>
}
Response outputs can be sent in binary format using the Binary Data extension. To force an output to be sent in binary format, use the "binary_data" : true parameter in the request JSON. For example:
{
"model_name" : "mymodel",
"inputs" : [...],
"outputs" : [
{
"name" : "output0",
"parameters" : {
"binary_data" : true
}
}
]
}
Assuming the output datatype is FP32 and the shape is [ 2, 2 ], the response to this request would be:
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Inference-Header-Content-Length: <yy>
Content-Length: <yy+16>
{
"outputs" : [
{
"name" : "output0",
"shape" : [ 2, 2 ],
"datatype" : "FP32",
"parameters" : {
"binary_data_size" : 16
}
}
]
}
<16 bytes of data for output0 tensor>
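On the client side, the body of such a response has to be split at the boundary given by the Inference-Header-Content-Length response header. A minimal sketch, with the request body, names, and port being assumptions:
# Hypothetical sketch: save response headers and body, then split the body into
# the JSON part and the raw FP32 bytes of output0.
curl -s -X POST http://localhost:5000/v2/models/mymodel/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs":[{"name":"input0","shape":[1,2],"datatype":"FP32","data":[1.0,2.0]}],"outputs":[{"name":"output0","parameters":{"binary_data":true}}]}' \
  -D headers.txt -o response.bin
JSON_LEN=$(grep -i '^inference-header-content-length:' headers.txt | cut -d: -f2 | tr -d ' \r')
head -c "${JSON_LEN}" response.bin                     # JSON part of the response
tail -c +$((JSON_LEN + 1)) response.bin > output0.bin  # raw bytes of output0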
For a detailed description of the request and response contents, see the KServe API docs.
Note: A more efficient way of running inference via REST is to send data in binary format outside of the JSON object, using the Binary Data extension.
See also the code samples for running inference with the KServe API on the HTTP Inference endpoint.