Benchmark Client (Python)¶
Introduction¶
The benchmark client introduced in this directory is written in Python 3. Benchmark client uses TFServing API and KServe API to communicate with model servers. It is recommended to use the benchmark client as a docker container. Prior to transmission, the client downloads metadata from the server, which contains a list of available models, their versions as well as accepted input and output shapes. Then it generates tensors containing random data with shapes matched to the models served by the service. Both the length of the dataset and the workload duration can be specified independently. The synthetic data created is then served in a loop iterating over the dataset until the workload length is satisfied. As the main role of the client is performance measurement all aspects unrelated to throughput and/or latency are ignored. This means the client does not validate the received responses nor does it estimate accuracy as these activities would negatively affect the measured performance metrics on the client side.
In addition to the standard data format, the client also supports stateful models (recognizing dependencies between consecutive inference requests) as well as binary input for selected file formats (PNG and JPEG).
Furthermore the client supports multiple precisions: FP16
, FP32
, FP64
, INT8
, INT16
, INT32
, INT64
, UINT8
, UINT16
, UINT32
, UINT64
. Both channel types, insecure and certificate secured, are supported. Secrets/certificates have to be mounted on a separated volume as well as their path has to be specified by command line. The secure connection can be used, for example, to benchmark the Nginx OVMS plugin, which can be build from public source with the built-in Nginx reverse proxy load balancer.
A single docker container can run many parallel clients in separate processes. Measured metrics (especially throughput, latency,
and counters) are collected from all client processes and then combined upon which they can be printed in JSON format/syntax for
the entire parallel workload. If the docker container is run in the daemon mode the final logs can be shown using the docker logs
command. Results can also be exported to a Mongo database. In order to do this the appropriate identification metadata has to
be specified in the command line.
Since 2.7 update, these Benchmark Client measurement options were introduced: language models testing with in-built string data input and support for testing MediaPipe
graphs in the OVMS. For each of them, there is a need to specify the input data method. Data method -d string
creates sample text data.
Benchmarking of OVMS integrated with MediaPipe is possible for KServe API via gRPC protocol. In this case there is also a necessity to feed client with a pre-prepared numpy
file consisting of a numpy array.
There is also an option to test dynamic models using shape
parameter.
Last two use cases are furthermore described in this document.
Build Benchmark Client Docker Image¶
To build the docker image and tag it as benchmark_client
run:
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server/demos/benchmark/python
docker build . -t benchmark_client
OVMS Deployment¶
First of all, download a model and create an appropriate directory tree. For example, for resnet50 binary model from Intel’s Open Model Zoo:
mkdir -p workspace/resnet50-binary-0001/1
cd workspace/resnet50-binary-0001/1
wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.xml
wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.bin
cd ../../..
Model directory looks like that:
workspace
└── resnet50-binary-0001
└── 1
├── resnet50-binary-0001.bin
└── resnet50-binary-0001.xml
Let’s start OVMS before building and running the benchmark client as follows (more deployment options described in docs):
docker run -u $(id -u) -p 9000:9000 -p 8000:8000 -d -v ${PWD}/workspace:/workspace openvino/model_server --model_path \
/workspace/resnet50-binary-0001 --model_name resnet50-binary-0001 --port 9000 --rest_port 8000
Selected Commands¶
To check available options use -h
, --help
switches:
docker run benchmark_client --help
usage: main.py [-h] [-i ID] [-c CONCURRENCY] [-a SERVER_ADDRESS]
[-p GRPC_PORT] [-r REST_PORT] [-l] [-b [BS ...]]
[-s [SHAPE ...]] [-d [DATA ...]] [-j] [-m MODEL_NAME]
[-k DATASET_LENGTH] [-v MODEL_VERSION] [-n STEPS_NUMBER]
[-t DURATION] [-u WARMUP] [-w WINDOW] [-e ERROR_LIMIT]
[-x ERROR_EXPOSITION] [--max_throughput MAX_THROUGHPUT]
[--max_value MAX_VALUE] [--min_value MIN_VALUE] [--xrand XRAND]
[--dump_png] [--step_timeout STEP_TIMEOUT]
[--metadata_timeout METADATA_TIMEOUT] [-Y DB_ENDPOINT]
[-y [DB_METADATA ...]] [--print_all] [--print_time]
[--report_warmup] [--certs_dir CERTS_DIR] [-q STATEFUL_LENGTH]
[--stateful_id STATEFUL_ID] [--stateful_hop STATEFUL_HOP]
[--sync_interval SYNC_INTERVAL]
[--quantile_list [QUANTILE_LIST ...]]
[--hist_factor HIST_FACTOR] [--hist_base HIST_BASE]
[--internal_version] [--unbuffered] [--api {TFS,KFS,REST}]
This is benchmarking client which uses TFS/KFS API to communicate with
OVMS/TFS/KFS-based-services.
The version can be checked by using --internal_version
switch as follows:
docker run benchmark_client --internal_version
2.7
The client is able to download the metadata of the served models. If you are
unsure which models and versions are served and what status they have, you can
list this information by specifying the --list_models
switch (also a short
form -l
is available):
docker run --network host benchmark_client -a localhost -r 8000 --list_models
Client 2.7
NO_PROXY=localhost no_proxy=localhost python3 /ovms_benchmark_client/main.py -a localhost -r 8000 --list_models
XI worker: try to send request to endpoint: http://localhost:8000/v1/config
XI worker: received status code is 200.
XI worker: found models and their status:
XI worker: model: resnet50-binary-0001, version: 1 - AVAILABLE
Sample benchmarks¶
Names, model shape, as well as information about data types of both inputs and
outputs can also be downloaded for all available models using the same listing
switches and adding -m <model-name>
and -v <model-version>
to the command
line. The option -i
is used only to add a prefix to the standard output with a name
of an application instance. For example:
docker run --network host benchmark_client -a localhost -r 8000 -l -m resnet50-binary-0001 -p 9000 -i id
Client 2.7
NO_PROXY=localhost no_proxy=localhost python3 /ovms_benchmark_client/main.py -a localhost -r 8000 -l -m resnet50-binary-0001 -p 9000 -i id
XW id: Finished execution. If you want to run inference remove --list_models.
XI id: try to send request to endpoint: http://localhost:8000/v1/config
XI id: received status code is 200.
XI id: found models and their status:
XI id: model: resnet50-binary-0001, version: 1 - AVAILABLE
XI id: request for metadata of model resnet50-binary-0001...
XI id: Metadata for model resnet50-binary-0001 is downloaded...
XI id: set version of model resnet50-binary-0001: 1
XI id: inputs:
XI id: 0:
XI id: name: 0
XI id: dtype: DT_FLOAT
XI id: tensorShape: {'dim': [{'size': '1'}, {'size': '3'}, {'size': '224'}, {'size': '224'}]}
XI id: outputs:
XI id: 1463:
XI id: name: 1463
XI id: dtype: DT_FLOAT
XI id: tensorShape: {'dim': [{'size': '1'}, {'size': '1000'}]}
Be sure the model name specified is identical to the model name shown when using
the --list_models
parameter. A model version is not required but it can be added
when multiple versions are available for a specific model name.
The introduced benchmark client supports generation of requests with multiple and
different batch sizes in a single workload. The switches -b
, --bs
can be used
to specify this parameter.
The workload can be generated only if its length is specified by iteration number
-n
, --steps_number
or duration length -t
, --duration
. To see report also on warmup time window use --report_warmup
switch. Example for 8 requests
will be generated as follows (remember to add --print_all
to show metrics in stdout):
docker run --network host benchmark_client -a localhost -r 8000 -m resnet50-binary-0001 -p 9000 -n 8 --report_warmup --print_all
Client 2.7
NO_PROXY=localhost no_proxy=localhost python3 /ovms_benchmark_client/main.py -a localhost -r 8000 -m resnet50-binary-0001 -p 9000 -n 8 --report_warmup --print_all
XI worker: request for metadata of model resnet50-binary-0001...
XI worker: Metadata for model resnet50-binary-0001 is downloaded...
XI worker: set version of model resnet50-binary-0001: 1
XI worker: inputs:
XI worker: 0:
XI worker: name: 0
XI worker: dtype: DT_FLOAT
XI worker: tensorShape: {'dim': [{'size': '1'}, {'size': '3'}, {'size': '224'}, {'size': '224'}]}
XI worker: outputs:
XI worker: 1463:
XI worker: name: 1463
XI worker: dtype: DT_FLOAT
XI worker: tensorShape: {'dim': [{'size': '1'}, {'size': '1000'}]}
XI worker: new random range: 0.0, 255.0
XI worker: batchsize sequence: [1]
XI worker: dataset length (0): 1
XI worker: --> dim: 1
XI worker: --> dim: 3
XI worker: --> dim: 224
XI worker: --> dim: 224
XI worker: Generated data shape: (1, 3, 224, 224)
XI worker: start workload...
XI worker: stop warmup: 374943.047975389
XI worker: stop window: inf
XI worker: Workload started!
XI worker: Warmup normally stopped: 374943.075028319
XI worker: Window normally start: 374943.07504102
XI worker: Window stopped: 374943.354446821
XI worker: total_duration: 0.3065074360347353
XI worker: total_batches: 8
XI worker: total_frames: 8
XI worker: start_timestamp: 374943.047975189
XI worker: stop_timestamp: 374943.354482625
XI worker: pass_batches: 8
XI worker: fail_batches: 0
XI worker: pass_frames: 8
XI worker: fail_frames: 0
XI worker: first_latency: 0.027021727000828832
XI worker: pass_max_latency: 0.0449919409584254
XI worker: fail_max_latency: 0.0
XI worker: brutto_batch_rate: 26.100508697261724
XI worker: brutto_frame_rate: 26.100508697261724
XI worker: netto_batch_rate: 26.127558971388062
XI worker: netto_frame_rate: 26.127558971388062
XI worker: frame_passrate: 1.0
XI worker: batch_passrate: 1.0
XI worker: mean_latency: 0.0382737630061456
XI worker: mean_latency2: 0.0015027467953827884
XI worker: stdev_latency: 0.006153524252994284
XI worker: cv_latency: 0.16077656780197483
XI worker: pass_mean_latency: 0.0382737630061456
XI worker: pass_mean_latency2: 0.0015027467953827884
XI worker: pass_stdev_latency: 0.006153524252994284
XI worker: pass_cv_latency: 0.16077656780197483
XI worker: fail_mean_latency: 0.0
XI worker: fail_mean_latency2: 0.0
XI worker: fail_stdev_latency: 0.0
XI worker: fail_cv_latency: 0.0
XI worker: window_total_duration: 0.27940580097492784
XI worker: window_total_batches: 8
XI worker: window_total_frames: 8
XI worker: window_start_timestamp: 374943.07504102
XI worker: window_stop_timestamp: 374943.354446821
XI worker: window_pass_batches: 8
XI worker: window_fail_batches: 0
XI worker: window_pass_frames: 8
XI worker: window_fail_frames: 0
XI worker: window_first_latency: 0.027021727000828832
XI worker: window_pass_max_latency: 0.0449919409584254
XI worker: window_fail_max_latency: 0.0
XI worker: window_brutto_batch_rate: 28.632190069374655
XI worker: window_brutto_frame_rate: 28.632190069374655
XI worker: window_netto_batch_rate: 26.127558971388062
XI worker: window_netto_frame_rate: 26.127558971388062
XI worker: window_frame_passrate: 1.0
XI worker: window_batch_passrate: 1.0
XI worker: window_mean_latency: 0.0382737630061456
XI worker: window_mean_latency2: 0.0015027467953827884
XI worker: window_stdev_latency: 0.006153524252994284
XI worker: window_cv_latency: 0.16077656780197483
XI worker: window_pass_mean_latency: 0.0382737630061456
XI worker: window_pass_mean_latency2: 0.0015027467953827884
XI worker: window_pass_stdev_latency: 0.006153524252994284
XI worker: window_pass_cv_latency: 0.16077656780197483
XI worker: window_fail_mean_latency: 0.0
XI worker: window_fail_mean_latency2: 0.0
XI worker: window_fail_stdev_latency: 0.0
XI worker: window_fail_cv_latency: 0.0
XI worker: window_hist_latency_4: 2
XI worker: window_hist_latency_9: 1
XI worker: window_hist_latency_8: 5
XI worker: warmup_total_duration: 0.02705443004379049
XI worker: warmup_total_batches: 0
XI worker: warmup_total_frames: 0
XI worker: warmup_start_timestamp: 374943.047973889
XI worker: warmup_stop_timestamp: 374943.075028319
XI worker: warmup_pass_batches: 0
XI worker: warmup_fail_batches: 0
XI worker: warmup_pass_frames: 0
XI worker: warmup_fail_frames: 0
XI worker: warmup_first_latency: inf
XI worker: warmup_pass_max_latency: 0.0
XI worker: warmup_fail_max_latency: 0.0
XI worker: warmup_brutto_batch_rate: 0.0
XI worker: warmup_brutto_frame_rate: 0.0
XI worker: warmup_netto_batch_rate: 0.0
XI worker: warmup_netto_frame_rate: 0.0
XI worker: warmup_frame_passrate: 0.0
XI worker: warmup_batch_passrate: 0.0
XI worker: warmup_mean_latency: 0.0
XI worker: warmup_mean_latency2: 0.0
XI worker: warmup_stdev_latency: 0.0
XI worker: warmup_cv_latency: 0.0
XI worker: warmup_pass_mean_latency: 0.0
XI worker: warmup_pass_mean_latency2: 0.0
XI worker: warmup_pass_stdev_latency: 0.0
XI worker: warmup_pass_cv_latency: 0.0
XI worker: warmup_fail_mean_latency: 0.0
XI worker: warmup_fail_mean_latency2: 0.0
XI worker: warmup_fail_stdev_latency: 0.0
XI worker: warmup_fail_cv_latency: 0.0
Dynamic models benchmarking¶
In order to test dynamic models, -s
(shape) parameter needs to be specified. You can use dynamic model, although some static models also can have dynamic shape specified. First download the model to workspace.
mkdir -p workspace/face-detection-retail-0005/1
cd workspace/face-detection-retail-0005/1
wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2023.0/models_bin/1/face-detection-retail-0005/FP32/face-detection-retail-0005.bin
wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2023.0/models_bin/1/face-detection-retail-0005/FP32/face-detection-retail-0005.xml
cd ../../..
Next start OVMS having dynamic input shape specified.
docker run -u $(id -u) -p 9000:9000 -p 8000:8000 -d -v ${PWD}/workspace:/workspace openvino/model_server --model_path /workspace/face-detection-retail-0005 --model_name face-detection-retail-0005 --shape "(-1,3,-1,-1)" --port 9000 --rest_port 8000
To generate request with specified shape, it is necessary to set input shape explicitly. It is done by specifying -s
or --shape
parameter, followed by desired numbers.
docker run --network host benchmark_client -a localhost -r 8000 -m face-detection-retail-0005 -p 9000 -n 8 -s 1 3 300 300 --print_all
Client 2.7
NO_PROXY=localhost no_proxy=localhost python3 /ovms_benchmark_client/main.py -a localhost -r 8000 -m face-detection-retail-0005 -p 9000 -n 8 -s 1 3 300 300 --print_all
XI worker: request for metadata of model face-detection-retail-0005...
XI worker: Metadata for model face-detection-retail-0005 is downloaded...
XI worker: set version of model face-detection-retail-0005: 1
XI worker: inputs:
XI worker: input.1:
XI worker: name: input.1
XI worker: dtype: DT_FLOAT
XI worker: tensorShape: {'dim': [{'size': '-1'}, {'size': '3'}, {'size': '-1'}, {'size': '-1'}]}
XI worker: outputs:
XI worker: 527:
XI worker: name: 527
XI worker: dtype: DT_FLOAT
XI worker: tensorShape: {'dim': [{'size': '1'}, {'size': '1'}, {'size': '-1'}, {'size': '7'}]}
XI worker: new random range: 0.0, 255.0
XI worker: batchsize sequence: [1]
XI worker: dataset length (input.1): 1
XI worker: --> dim: 1
XI worker: --> dim: 3
XI worker: --> dim: 300
XI worker: --> dim: 300
XI worker: Generated data shape: (1, 3, 300, 300)
XI worker: start workload...
...
MediaPipe benchmarking¶
Start OVMS container with config.json
including mediapipe servable. OVMS should be built with MediaPipe enabled.
cp -r ${PWD}/sample_data ${PWD}/workspace/sample_data
docker run -u $(id -u) -p 9000:9000 -p 8000:8000 -d -v ${PWD}/workspace:/workspace openvino/model_server --port 9000 --rest_port 8000 --config_path /workspace/sample_data/config.json
Requests for benchmarking are prepared basing on array from a numpy file. This data file is fed to Benchmark Client by specifying switch -d <data-file>.npy
. Note that we can use numpy data in the same manner also for single models and pipelines if KServe API is set. You can create sample data with Python3
, specifying array shape and precision. Generated .npy file should be saved to workspace/sample_data directory for this example.
python -c 'import numpy as np ; \
arr = np.ones((1,3,224,224),dtype=np.float32); \
np.save("workspace/sample_data/resnet50-binary-0001.npy", arr)'
Having MediaPipe graph file and servable specified in config.json, we call it by its name instead of the model name: -m <mediapipe-servable-name>
. It is necessary to set --api KFS
since the Mediapipe graphs are exposed only via KServe API.
docker run -v ${PWD}/workspace:/workspace --network host benchmark_client -a localhost -r 8000 -m resnet50-binary-0001_mediapipe -p 9000 -n 8 --api KFS -d /workspace/sample_data/resnet50-binary-0001.npy --report_warmup --print_all
Many other client options together with benchmarking examples are presented in an additional PDF document.