Model Ensemble Pipeline Demo

This guide shows how to implement a model ensemble using the DAG Scheduler.

  • Suppose you are developing an image classification application. Many different models can be used for this task. The goal is to combine the inference results of two different models and apply argmax to pick the most probable classification label.

  • For this task, select two models: googlenet-v2 and resnet-50. Additionally, create a custom argmax model that combines their outputs and selects the top result. The aim is to perform the whole task on the server side, with no intermediate results passed over the network. The server takes care of feeding the outputs of one model into the inputs of the next. The googlenet and resnet predictions should run in parallel.

  • The diagram for this pipeline looks like this:

diagram
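The data flow described above can be sketched in plain Python. This is only an illustration of the idea, not the server's implementation: fake_googlenet and fake_resnet are hypothetical stand-ins that return fixed probability vectors, and the class count of 1001 is taken from the (1, 1001) shapes mentioned later in this guide.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_CLASSES = 1001  # taken from the (1, 1001) output shapes used in this guide


def fake_googlenet(image):
    """Hypothetical stand-in for the googlenet model: fixed probabilities."""
    probs = [0.0] * NUM_CLASSES
    probs[404] = 0.7
    probs[279] = 0.3
    return probs


def fake_resnet(image):
    """Hypothetical stand-in for the resnet model: fixed probabilities."""
    probs = [0.0] * NUM_CLASSES
    probs[404] = 0.6
    probs[340] = 0.4
    return probs


def argmax_of_sum(p1, p2):
    """Sum two probability vectors element-wise and return the top index."""
    summed = [a + b for a, b in zip(p1, p2)]
    return max(range(len(summed)), key=summed.__getitem__)


def run_ensemble(image):
    """Run both stand-in models in parallel, then combine their outputs."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        googlenet_future = pool.submit(fake_googlenet, image)
        resnet_future = pool.submit(fake_resnet, image)
        return argmax_of_sum(googlenet_future.result(), resnet_future.result())


if __name__ == "__main__":
    print(run_ensemble(image=None))
```

In the real pipeline the two branches and the combining step all run inside the Model Server, so the client only sends the image and receives the final label.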

Step 1: Prepare the repository

Clone the repository and enter the model_ensemble directory:

git clone https://github.com/openvinotoolkit/model_server.git
cd model_server/demos/model_ensemble/python

Repository preparation is automated with a Makefile: run make in this directory.

The steps in Makefile are:

  1. Download the models from the Open Model Zoo.

  2. Use the Python script located in this repository. It uses TensorFlow to create models in the SavedModel format, so the tensorflow pip package is required.

  3. Prepare the argmax model with two inputs of shape (1, 1001), matching the output shapes of the googlenet and resnet models. The generated model sums its inputs and calculates the index of the highest value. The model output indicates the most likely predicted class from the ImageNet* dataset.

  4. Convert the models to IR format and prepare the models repository.

...
models
├── argmax
│   └── 1
│       ├── saved_model.bin
│       ├── saved_model.mapping
│       └── saved_model.xml
├── config.json
├── googlenet-v2-tf
│   └── 1
│       ├── googlenet-v2-tf.bin
│       ├── googlenet-v2-tf.mapping
│       └── googlenet-v2-tf.xml
└── resnet-50-tf
    └── 1
        ├── resnet-50-tf.bin
        ├── resnet-50-tf.mapping
        └── resnet-50-tf.xml

6 directories, 10 files
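Before starting the server, it can help to confirm that the prepared repository matches the tree above. The following is a minimal sketch, assuming the models directory sits in the current working directory. The helper name missing_files is hypothetical, and treating the .mapping files as optional is an assumption of this sketch.

```python
from pathlib import Path

# Expected layout, copied from the tree above: model directory -> IR file stem.
EXPECTED = {
    "argmax": "saved_model",
    "googlenet-v2-tf": "googlenet-v2-tf",
    "resnet-50-tf": "resnet-50-tf",
}


def missing_files(models_dir, version="1"):
    """Return the expected files that are absent under models_dir."""
    root = Path(models_dir)
    expected = [root / "config.json"]
    for model, stem in EXPECTED.items():
        # Assumption: only the .xml/.bin pair is checked; .mapping is skipped.
        for suffix in (".xml", ".bin"):
            expected.append(root / model / version / f"{stem}{suffix}")
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    for path in missing_files("models"):
        print("missing:", path)
```

An empty result means every file the sketch looks for is in place.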

Step 2: Define required models and pipeline

Pipelines must be defined in the configuration file before they can be used. The same configuration file defines both served models and served pipelines.

Use the config.json provided with this demo. Its content is as follows:

~$ cat config.json
{
    "model_config_list": [
        {
            "config": {
                "name": "googlenet",
                "base_path": "/models/googlenet-v2-tf"
            }
        },
        {
            "config": {
                "name": "resnet",
                "base_path": "/models/resnet-50-tf"
            }
        },
        {
            "config": {
                "name": "argmax",
                "base_path": "/models/argmax"
            }
        }
    ],
    "pipeline_config_list": [
        {
            "name": "image_classification_pipeline",
            "inputs": ["image"],
            "nodes": [
                {
                    "name": "googlenet_node",
                    "model_name": "googlenet",
                    "type": "DL model",
                    "inputs": [
                        {"input": {"node_name": "request",
                                   "data_item": "image"}}
                    ],
                    "outputs": [
                        {"data_item": "InceptionV2/Predictions/Softmax",
                         "alias": "probability"}
                    ]
                },
                {
                    "name": "resnet_node",
                    "model_name": "resnet",
                    "type": "DL model",
                    "inputs": [
                        {"map/TensorArrayStack/TensorArrayGatherV3": {"node_name": "request",
                                                                      "data_item": "image"}}
                    ],
                    "outputs": [
                        {"data_item": "softmax_tensor",
                         "alias": "probability"}
                    ]
                },
                {
                    "name": "argmax_node",
                    "model_name": "argmax",
                    "type": "DL model",
                    "inputs": [
                        {"input1": {"node_name": "googlenet_node",
                                    "data_item": "probability"}},
                        {"input2": {"node_name": "resnet_node",
                                    "data_item": "probability"}}
                    ],
                    "outputs": [
                        {"data_item": "argmax/Squeeze",
                         "alias": "most_probable_label"}
                    ]
                }
            ],
            "outputs": [
                {"label": {"node_name": "argmax_node",
                           "data_item": "most_probable_label"}}
            ]
        }
    ]
}

In the model_config_list section, three models are defined as usual. They can be referenced by name in the pipeline definition, and each of them can also be requested separately for single-model inference. The same gRPC and REST inference API is used to request both models and pipelines. OpenVINO Model Server first searches for a model with the requested name. If none is found, it looks for a pipeline with that name.
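The node dependencies in the pipeline definition determine which nodes can run side by side. The sketch below groups the nodes into levels based on their inputs. It is an illustration of the dependency structure only, not the scheduler used by the server. PIPELINE is a trimmed copy of the configuration above, and execution_levels is a hypothetical helper.

```python
# Trimmed copy of the pipeline definition above: only the fields that
# describe dependencies between nodes are kept.
PIPELINE = {
    "name": "image_classification_pipeline",
    "nodes": [
        {"name": "googlenet_node",
         "inputs": [{"input": {"node_name": "request", "data_item": "image"}}]},
        {"name": "resnet_node",
         "inputs": [{"map/TensorArrayStack/TensorArrayGatherV3":
                     {"node_name": "request", "data_item": "image"}}]},
        {"name": "argmax_node",
         "inputs": [{"input1": {"node_name": "googlenet_node",
                                "data_item": "probability"}},
                    {"input2": {"node_name": "resnet_node",
                                "data_item": "probability"}}]},
    ],
}


def execution_levels(pipeline):
    """Group node names into levels; nodes within a level are independent."""
    deps = {}
    for node in pipeline["nodes"]:
        sources = set()
        for mapping in node["inputs"]:
            for source in mapping.values():
                # "request" is the pipeline entry point, not a node.
                if source["node_name"] != "request":
                    sources.add(source["node_name"])
        deps[node["name"]] = sources

    levels, done = [], set()
    while len(done) < len(deps):
        ready = sorted(n for n, d in deps.items() if n not in done and d <= done)
        if not ready:
            raise ValueError("cycle or reference to an undefined node")
        levels.append(ready)
        done.update(ready)
    return levels


if __name__ == "__main__":
    print(execution_levels(PIPELINE))
    # prints [['googlenet_node', 'resnet_node'], ['argmax_node']]
```

Both classification nodes depend only on the request, so they land in the first level, while argmax_node waits for both of them.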

Step 3: Start the Model Server

  1. Run the following command to start the Model Server:

    $ docker run --rm -v $(pwd)/models/:/models:ro -p 9100:9100 -p 8100:8100 openvino/model_server:latest --config_path /models/config.json --port 9100 --rest_port 8100
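Once the container is running, readiness can be checked over REST. The sketch below assumes the TensorFlow Serving style model status endpoint (GET /v1/models/<name>) on REST port 8100 from the command above. Whether this endpoint also reports status for a pipeline name depends on the server version, so treat that as an assumption. The helper names are hypothetical.

```python
import json
import urllib.request


def status_url(name, host="localhost", rest_port=8100):
    """Build the model status URL; 8100 matches --rest_port in the command above."""
    return f"http://{host}:{rest_port}/v1/models/{name}"


def is_available(status):
    """True if any version in a model status response is in the AVAILABLE state."""
    return any(version.get("state") == "AVAILABLE"
               for version in status.get("model_version_status", []))


def fetch_status(name, **kwargs):
    """Query the running server; raises OSError if it is not reachable."""
    with urllib.request.urlopen(status_url(name, **kwargs), timeout=5) as resp:
        return json.load(resp)
```

With the server up, is_available(fetch_status("resnet")) should return True once the model has finished loading.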

Step 4: Requesting the service

Input images can be sent to the service by requesting the resource name image_classification_pipeline. An example client is available for this:

  1. Check the accuracy of the pipeline by running the client in another terminal:

    ~$ cd model_server/client/python/tensorflow-serving-api/samples
    ~/model_server/client/python/tensorflow-serving-api/samples$ virtualenv .venv
    ~/model_server/client/python/tensorflow-serving-api/samples$ . .venv/bin/activate && pip3 install -r requirements.txt
    (.venv) ~/model_server/client/python/tensorflow-serving-api/samples$ python3 grpc_predict_resnet.py --pipeline_name image_classification_pipeline --images_numpy_path ../../imgs.npy \
        --labels_numpy_path ../../lbs.npy --grpc_port 9100 --input_name image --output_name label --transpose_input True --transpose_method nchw2nhwc --iterations 10
    Image data range: 0.0 : 255.0
    Start processing:
            Model name: image_classification_pipeline
            Iterations: 10
            Images numpy path: ../../imgs.npy
            Numpy file shape: (10, 224, 224, 3)
    
    Iteration 1; Processing time: 33.51 ms; speed 29.85 fps
    imagenet top results in a single batch:
    response shape (1,)
             0 airliner 404 ; Correct match.
    Iteration 2; Processing time: 42.52 ms; speed 23.52 fps
    imagenet top results in a single batch:
    response shape (1,)
             0 Arctic fox, white fox, Alopex lagopus 279 ; Correct match.
    Iteration 3; Processing time: 34.42 ms; speed 29.05 fps
    imagenet top results in a single batch:
    response shape (1,)
             0 bee 309 ; Correct match.
    Iteration 4; Processing time: 32.34 ms; speed 30.92 fps
    imagenet top results in a single batch:
    response shape (1,)
             0 golden retriever 207 ; Correct match.
    Iteration 5; Processing time: 35.92 ms; speed 27.84 fps
    imagenet top results in a single batch:
    response shape (1,)
             0 gorilla, Gorilla gorilla 366 ; Correct match.
    Iteration 6; Processing time: 33.63 ms; speed 29.74 fps
    imagenet top results in a single batch:
    response shape (1,)
             0 magnetic compass 635 ; Correct match.
    Iteration 7; Processing time: 37.22 ms; speed 26.86 fps
    imagenet top results in a single batch:
    response shape (1,)
             0 peacock 84 ; Correct match.
    Iteration 8; Processing time: 35.84 ms; speed 27.90 fps
    imagenet top results in a single batch:
    response shape (1,)
             0 pelican 144 ; Correct match.
    Iteration 9; Processing time: 33.69 ms; speed 29.68 fps
    imagenet top results in a single batch:
    response shape (1,)
             0 snail 113 ; Correct match.
    Iteration 10; Processing time: 46.54 ms; speed 21.49 fps
    imagenet top results in a single batch:
    response shape (1,)
             0 zebra 340 ; Correct match.
    
    processing time for all iterations
    average time: 36.00 ms; average speed: 27.78 fps
    median time: 34.50 ms; median speed: 28.99 fps
    max time: 46.00 ms; min speed: 21.74 fps
    min time: 32.00 ms; max speed: 31.25 fps
    time percentile 90: 42.40 ms; speed percentile 90: 23.58 fps
    time percentile 50: 34.50 ms; speed percentile 50: 28.99 fps
    time standard deviation: 4.31
    time variance: 18.60
    Classification accuracy: 100.00
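The sample client above uses gRPC. As an alternative, a single image can be sent over REST with only the Python standard library. This is a minimal sketch rather than the sample client's method. It assumes the TensorFlow Serving style predict endpoint with the columnar "inputs" request format, REST port 8100 from the docker command, and the input and output names image and label from the pipeline definition. The helper names are hypothetical.

```python
import json
import urllib.request


def build_predict_request(image_nhwc, input_name="image"):
    """Wrap one image (nested lists, H x W x C) into a batch-of-one request body."""
    return json.dumps({"inputs": {input_name: [image_nhwc]}}).encode("utf-8")


def extract_labels(response_body, output_name="label"):
    """Read predicted class indices from a columnar-format response."""
    outputs = json.loads(response_body)["outputs"]
    # Assumption: with a single output the name may be omitted and a plain
    # list is returned; with several outputs an object keyed by name is used.
    if isinstance(outputs, dict):
        outputs = outputs[output_name]
    return [int(value) for value in outputs]


def predict(image_nhwc, host="localhost", rest_port=8100,
            pipeline="image_classification_pipeline"):
    """Send one image to the pipeline; requires the server to be running."""
    url = f"http://{host}:{rest_port}/v1/models/{pipeline}:predict"
    request = urllib.request.Request(
        url,
        data=build_predict_request(image_nhwc),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return extract_labels(resp.read())
```

The image passed to predict should already be in the 224 x 224 x 3 layout the pipeline expects, for example one entry of imgs.npy after the same transposition the sample client applies.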

Step 5: Analyze pipeline execution in server logs

The debug logs and their timestamps show that the GoogleNet and ResNet inferences started in parallel. As soon as both of its inputs were ready, the argmax node started its job.

[2022-02-28 11:30:20.159][485][serving][debug][prediction_service.cpp:69] Processing gRPC request for model: image_classification_pipeline; version: 0
[2022-02-28 11:30:20.159][485][serving][debug][prediction_service.cpp:80] Requested model: image_classification_pipeline does not exist. Searching for pipeline with that name...
[2022-02-28 11:30:20.159][485][serving][debug][modelmanager.cpp:1305] Requesting pipeline: image_classification_pipeline;
[2022-02-28 11:30:20.160][485][dag_executor][debug][pipeline.cpp:83] Started execution of pipeline: image_classification_pipeline
[2022-02-28 11:30:20.160][485][serving][debug][modelmanager.cpp:1280] Requesting model: resnet; version: 0.
[2022-02-28 11:30:20.160][485][serving][debug][modelmanager.cpp:1280] Requesting model: googlenet; version: 0.
[2022-02-28 11:30:20.194][485][serving][debug][modelmanager.cpp:1280] Requesting model: argmax; version: 0.
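The timing can also be read from the logs programmatically. The sketch below parses the "Requesting model" lines shown above and reports how long after the two classification models the argmax model was requested. The regular expression is written against this log excerpt only and is an assumption about the format, and the helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Lines copied from the log excerpt above.
LOG = """\
[2022-02-28 11:30:20.160][485][serving][debug][modelmanager.cpp:1280] Requesting model: resnet; version: 0.
[2022-02-28 11:30:20.160][485][serving][debug][modelmanager.cpp:1280] Requesting model: googlenet; version: 0.
[2022-02-28 11:30:20.194][485][serving][debug][modelmanager.cpp:1280] Requesting model: argmax; version: 0.
"""

LINE_RE = re.compile(
    r"^\[(?P<ts>[\d\- :.]+)\].*Requesting model: (?P<model>[\w\-]+);"
)


def model_request_times(log_lines):
    """Map each requested model name to the timestamp of its request line."""
    times = {}
    for line in log_lines:
        match = LINE_RE.match(line)
        if match:
            times[match["model"]] = datetime.strptime(
                match["ts"], "%Y-%m-%d %H:%M:%S.%f"
            )
    return times


def argmax_delay_ms(times):
    """Whole milliseconds between the last upstream request and argmax."""
    upstream_start = max(times["googlenet"], times["resnet"])
    return (times["argmax"] - upstream_start) // timedelta(milliseconds=1)


if __name__ == "__main__":
    times = model_request_times(LOG.splitlines())
    print(times["googlenet"] == times["resnet"])  # prints True
    print(argmax_delay_ms(times))                 # prints 34
```

Identical timestamps for googlenet and resnet are consistent with a parallel start, and the 34 ms gap before argmax roughly matches the per-request processing times reported by the client.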

Step 6: Requesting pipeline metadata

The same gRPC/REST example client used for requesting model metadata also works for pipelines. The only difference is that the pipeline name is specified instead of the model name.

(.venv) ~/model_server/client/python/tensorflow-serving-api/samples$ python3 grpc_get_model_metadata.py --grpc_port 9100 --model_name image_classification_pipeline
Getting model metadata for model: image_classification_pipeline
Inputs metadata:
        Input name: image; shape: [1, 224, 224, 3]; dtype: DT_FLOAT
Outputs metadata:
        Output name: label; shape: [1]; dtype: DT_INT64
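The reported metadata can be used to sanity-check input data before sending requests. The sketch below parses the text printed by the client above and compares the per-image shape of a dataset with the pipeline input shape. It relies on the exact output format shown here, which is an assumption, and both helper names are hypothetical.

```python
import ast
import re

# Text copied from the client output above.
METADATA_TEXT = """\
Inputs metadata:
        Input name: image; shape: [1, 224, 224, 3]; dtype: DT_FLOAT
Outputs metadata:
        Output name: label; shape: [1]; dtype: DT_INT64
"""

FIELD_RE = re.compile(
    r"(Input|Output) name: (\S+); shape: (\[[^\]]*\]); dtype: (\w+)"
)


def parse_metadata(text):
    """Return {"Input": {name: (shape, dtype)}, "Output": {...}}."""
    result = {"Input": {}, "Output": {}}
    for kind, name, shape, dtype in FIELD_RE.findall(text):
        result[kind][name] = (ast.literal_eval(shape), dtype)
    return result


def per_image_shape_matches(dataset_shape, input_shape):
    """Compare shapes while ignoring the leading batch dimension."""
    return list(dataset_shape[1:]) == list(input_shape[1:])


if __name__ == "__main__":
    metadata = parse_metadata(METADATA_TEXT)
    input_shape, _ = metadata["Input"]["image"]
    # (10, 224, 224, 3) is the numpy file shape reported by the client in Step 4.
    print(per_image_shape_matches((10, 224, 224, 3), input_shape))  # prints True
```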