C++ Client for Predictions
To build the example C++ clients:
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server/example_client/cpp
make
This builds a container image called ovms_cpp_clients with all required dependencies.
There are three clients:
classification_client_sync - simple client using the synchronous gRPC API, testing the accuracy of classification models
classification_client_async_benchmark - client using the asynchronous gRPC API, testing accuracy and performance with real image data
synthetic_client_async_benchmark - client using the asynchronous gRPC API, testing performance with synthetic data, with no OpenCV dependency
The example client image also contains the test images required for accuracy measurements. Custom images can be used as well.
Prepare classification model
Download the resnet50-binary model that OVMS will serve:
curl -L --create-dirs \
https://storage.openvinotoolkit.org/repositories/open_model_zoo/2021.4/models_bin/3/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.bin -o resnet50-binary/1/model.bin \
https://storage.openvinotoolkit.org/repositories/open_model_zoo/2021.4/models_bin/3/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.xml -o resnet50-binary/1/model.xml
Client requesting prediction synchronously
The client sends requests synchronously and displays the latency of each request. You can specify the number of iterations and the layout: nchw, nhwc or binary. Each request contains an image in the selected format. The client also checks the server responses for accuracy.
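The nchw and nhwc layouts differ only in the memory order of the channel, height, and width axes. The sketch below illustrates that reordering for a single image; `nhwc_to_nchw` is a hypothetical helper written for this explanation, not a function from the example clients.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorder one image from NHWC (interleaved channels) to NCHW (planar channels).
// Hypothetical helper for illustration; not taken from the example clients.
std::vector<float> nhwc_to_nchw(const std::vector<float>& src,
                                std::size_t height, std::size_t width,
                                std::size_t channels) {
    assert(src.size() == height * width * channels);
    std::vector<float> dst(src.size());
    for (std::size_t h = 0; h < height; ++h) {
        for (std::size_t w = 0; w < width; ++w) {
            for (std::size_t c = 0; c < channels; ++c) {
                // NHWC offset: (h * W + w) * C + c
                // NCHW offset: (c * H + h) * W + w
                dst[(c * height + h) * width + w] =
                    src[(h * width + w) * channels + c];
            }
        }
    }
    return dst;
}
```

For a 1x2 image with three channels, the interleaved pixels {1,2,3} and {4,5,6} become the planes {1,4}, {2,5}, and {3,6}.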
Prepare the server
docker run -d -u $(id -u):$(id -g) -v $(pwd)/resnet50-binary:/model -p 9001:9001 openvino/model_server:latest \
--model_path /model --model_name resnet --port 9001 --layout NHWC
Start the client:
docker run --rm --network host -e "no_proxy=localhost" ovms_cpp_clients ./classification_client_sync --grpc_port=9001 --iterations=10 --layout="binary"
call predict ok
call predict time: 24ms
outputs size is 1
call predict ok
call predict time: 23ms
outputs size is 1
call predict ok
call predict time: 23ms
outputs size is 1
...
Overall accuracy: 90%
Total time divided by number of requests: 25ms
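The two summary figures can be reproduced from per-request data. The following is a minimal sketch, assuming accuracy means top-1 accuracy (the class with the highest score is compared with the expected label); the function names are hypothetical and not taken from the client source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Index of the highest score, i.e. the top-1 predicted class.
std::size_t top1_class(const std::vector<float>& scores) {
    assert(!scores.empty());
    return static_cast<std::size_t>(std::distance(
        scores.begin(), std::max_element(scores.begin(), scores.end())));
}

// Percentage of predictions that match the expected labels.
double accuracy_percent(const std::vector<std::size_t>& predicted,
                        const std::vector<std::size_t>& expected) {
    assert(predicted.size() == expected.size() && !expected.empty());
    std::size_t hits = 0;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        if (predicted[i] == expected[i]) ++hits;
    }
    return 100.0 * static_cast<double>(hits) /
           static_cast<double>(expected.size());
}

// Total time divided by the number of requests.
double mean_latency_ms(const std::vector<double>& latencies_ms) {
    assert(!latencies_ms.empty());
    const double total =
        std::accumulate(latencies_ms.begin(), latencies_ms.end(), 0.0);
    return total / static_cast<double>(latencies_ms.size());
}
```

With 10 iterations, 9 correct predictions give the 90% shown in the sample output.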
Clients requesting predictions asynchronously
These clients send requests asynchronously to mimic a scenario with many parallel clients. They are configured with the following parameters:
name | description | default | available with synthetic data |
---|---|---|---|
grpc_address | URL of the gRPC service | localhost | yes |
grpc_port | port of the gRPC service | 9000 | yes |
model_name | model name to request | resnet | yes |
input_name | input tensor name with image | 0 | no, deduced automatically |
output_name | output tensor name with classification result | 1463 | no |
iterations | number of requests to be sent by each producer thread | 10 | yes |
batch_size | batch size of each iteration | 1 | no, deduced automatically |
images_list | path to a file with a list of labeled images | input_images.txt | no |
layout | binary, nhwc or nchw | nchw | no, deduced automatically |
producers | number of threads asynchronously scheduling predictions | 1 | yes |
consumers | number of threads receiving responses | 8 | yes |
max_parallel_requests | maximum number of parallel inference requests; 0=no limit | 100 | yes |
benchmark_mode | 1 removes pre/post-processing and logging; 0 enables accuracy measurement | 0 | no |
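The producers, consumers, and max_parallel_requests parameters describe a pattern in which producer threads wait whenever too many requests are in flight, and consumer threads free a slot when a response arrives. The sketch below shows one common way to implement such a cap with a mutex and a condition variable. `InFlightLimiter` is a hypothetical class for illustration; the actual clients may implement the limit differently.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Caps the number of requests that are in flight at once.
// Hypothetical class for illustration; a limit of 0 means "no limit".
class InFlightLimiter {
public:
    explicit InFlightLimiter(std::size_t max_parallel) : max_(max_parallel) {}

    // Called by a producer before sending; blocks while the cap is reached.
    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return max_ == 0 || in_flight_ < max_; });
        ++in_flight_;
    }

    // Called by a consumer after a response arrives; wakes one waiting producer.
    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(in_flight_ > 0);
            --in_flight_;
        }
        cv_.notify_one();
    }

    // Current number of requests that were acquired but not yet released.
    std::size_t in_flight() {
        std::lock_guard<std::mutex> lock(mutex_);
        return in_flight_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t max_;
    std::size_t in_flight_ = 0;
};
```

A producer would call `acquire()` before each asynchronous call and a consumer would call `release()` after handling each response, so at most max_parallel_requests calls are outstanding at any time.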
Async client with real image data
Prepare the server
docker run -d -u $(id -u):$(id -g) -v $(pwd)/resnet50-binary:/model -p 9001:9001 openvino/model_server:latest \
--model_path /model --model_name resnet --port 9001 --layout NCHW
Start the client:
docker run --rm --network host -e "no_proxy=localhost" ovms_cpp_clients ./classification_client_async_benchmark --grpc_port=9001 --layout="nchw" --iterations=2000 --batch_size=1 --max_parallel_requests=100 --consumers=8 --producers=1 --benchmark_mode=1
Address: localhost:9001
Model name: resnet
Images list path: input_images.txt
Running the workload...
========================
Summary
========================
Benchmark mode: True
Accuracy: N/A
Total time: 1976ms
Total iterations: 2000
Layout: nchw
Batch size: 1
Producer threads: 1
Consumer threads: 8
Max parallel requests: 100
Avg FPS: 1012.15
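The Avg FPS figure is consistent with dividing the number of processed images by the elapsed time: 2000 iterations in 1976 ms is about 1012.15 frames per second. A minimal sketch of that arithmetic follows; `average_fps` is a hypothetical name, and scaling by the batch size is an assumption, since the run above uses a batch size of 1.

```cpp
#include <cassert>
#include <cstdint>

// Average frames per second: images processed divided by elapsed seconds.
// Multiplying by batch_size is an assumption; the sample run uses batch size 1.
double average_fps(std::uint64_t total_iterations, std::uint64_t batch_size,
                   std::uint64_t total_time_ms) {
    assert(total_time_ms > 0);
    const double frames = static_cast<double>(total_iterations * batch_size);
    return frames * 1000.0 / static_cast<double>(total_time_ms);
}
```

For example, `average_fps(2000, 1, 1976)` evaluates to roughly 1012.15.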
Async client with synthetic data
This simplified client tests the performance of any model or pipeline. It queries the GetModelMetadata endpoint and uses the returned information to prepare synthetic inputs with matching shape and precision. It does not depend on OpenCV.
NOTE: The endpoint must not use dynamic shapes.
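Preparing a synthetic input requires a fully static shape, because the buffer size is the product of all dimensions. The sketch below shows the idea for a float input such as the (1,3,224,224) DT_FLOAT tensor in the sample output of this section. The function names are hypothetical, and rejecting non-positive dimensions assumes that a dynamic dimension is reported as a value such as -1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Number of elements in a tensor with a fully static shape.
// Throws on a non-positive dimension (assumed marker of a dynamic dimension).
std::size_t element_count(const std::vector<std::int64_t>& shape) {
    std::size_t count = 1;
    for (std::int64_t dim : shape) {
        if (dim <= 0) {
            throw std::invalid_argument("dynamic or empty dimension");
        }
        count *= static_cast<std::size_t>(dim);
    }
    return count;
}

// Allocate a float buffer of the right size, filled with a constant value.
std::vector<float> make_synthetic_input(const std::vector<std::int64_t>& shape,
                                        float fill = 0.0f) {
    return std::vector<float>(element_count(shape), fill);
}
```

For the shape (1,3,224,224) this allocates 150528 floats; a shape containing -1 is rejected, which mirrors the restriction on dynamic shapes.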
Prepare the server
docker run -d -u $(id -u):$(id -g) -v $(pwd)/resnet50-binary:/model -p 9001:9001 openvino/model_server:latest \
--model_path /model --model_name resnet --port 9001 --layout NCHW
Start the client:
docker run --rm --network host -e "no_proxy=localhost" ovms_cpp_clients ./synthetic_client_async_benchmark --grpc_port=9001 --iterations=2000 --max_parallel_requests=100 --consumers=8 --producers=1
Address: localhost:11337
Model name: resnet
Synthetic inputs:
0: (1,3,224,224); DT_FLOAT
Running the workload...
========================
Summary
========================
Total time: 1933ms
Total iterations: 2000
Producer threads: 1
Consumer threads: 8
Max parallel requests: 100
Avg FPS: 1034.66