Live Inference and Benchmark CT-scan Data with OpenVINO

This tutorial is also available as a Jupyter notebook that can be cloned directly from GitHub. See the installation guide for instructions to run this tutorial locally on Windows, Linux or macOS. To run without installing anything, click the launch binder button.


Kidney Segmentation with PyTorch Lightning and OpenVINO™ - Part 4

This tutorial is part of a series on how to train, optimize, quantize and show live inference on a medical segmentation model. The goal is to accelerate inference on a kidney segmentation model. The UNet model is trained from scratch; the data is from Kits19.

This tutorial shows how to

  • Visually compare inference results of an FP16 and INT8 OpenVINO IR model

  • Benchmark performance of the original model and the quantized model

  • Show live inference with OpenVINO’s async API and MULTI plugin

To learn how this model was quantized, see the Convert and Quantize a UNet Model and Show Live Inference tutorial. The current tutorial partly overlaps with it, and demonstrates how to visualize the results and show benchmark information when you already have a quantized model.

All notebooks in this series:


This notebook needs a quantized OpenVINO IR model. We provide a pretrained model trained for 20 epochs on the full Kits19 frames dataset, which reaches an F1 score of 0.9 on the validation set. The training code will be made available soon. The notebook also needs images from the Kits19 dataset, converted to 2D images. For demonstration purposes, this tutorial downloads one converted CT scan to use for inference.

To install the requirements for running this notebook, please follow the instructions in the README.


import glob
import os
import random
import sys
import time
import zipfile
from pathlib import Path
from typing import List

import cv2
import matplotlib.pyplot as plt
import numpy as np
from async_inference import CTAsyncPipeline, SegModel
from IPython.display import Image, display
from omz_python.models import model as omz_model
from openvino.inference_engine import IECore

from notebook_utils import benchmark_model, download_file


To use the pretrained models, set IR_PATH to "pretrained_model/unet44.xml" and COMPRESSED_MODEL_PATH to "pretrained_model/quantized_unet44.xml". To use a model that you trained or optimized yourself, adjust the model paths.

# Directory that contains the CT scan data. This directory should contain subdirectories
# case_00XXX where XXX is between 000 and 299
BASEDIR = "kits19_frames_1"
# The directory that contains the IR model files. Should contain unet44.xml and bin
# and quantized_unet44.xml and bin.
IR_PATH = "pretrained_model/unet44.xml"
COMPRESSED_MODEL_PATH = "pretrained_model/quantized_unet44.xml"
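
An OpenVINO IR model consists of an .xml file with a .bin file next to it, as the comments above note. The following is a minimal sketch, not part of the original notebook, of how the configured paths could be checked before loading. The helper name missing_ir_files is hypothetical.

```python
from pathlib import Path
from typing import List


def missing_ir_files(xml_path: str) -> List[str]:
    """Return the IR files (.xml and matching .bin) that do not exist on disk."""
    xml_file = Path(xml_path)
    candidates = [xml_file, xml_file.with_suffix(".bin")]
    return [str(path) for path in candidates if not path.exists()]


# Check both model paths used in this tutorial
for model_path in ("pretrained_model/unet44.xml", "pretrained_model/quantized_unet44.xml"):
    missing = missing_ir_files(model_path)
    print(model_path, "OK" if not missing else f"missing: {missing}")
```

If either file is reported as missing, adjust IR_PATH or COMPRESSED_MODEL_PATH before continuing.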

Download and Prepare Data

Download one validation video for live inference. We reuse the KitsDataset class that was also used in the training and quantization notebook that will be released later.

Data is expected in BASEDIR, defined in the cell above. BASEDIR should contain directories named case_00000 to case_00299. If data for the case selected in the next cell does not exist yet, that cell downloads and extracts it.

# The CT scan case number. For example: 16 for data from the case_00016 directory
# Currently only 16 is supported
case = 16

if not Path(f"{BASEDIR}/case_{case:05d}").exists():
    filename = download_file(
        ...  # URL of the zipped data for this case (elided)
    )
    with zipfile.ZipFile(filename, "r") as zip_ref:
        zip_ref.extractall(path=BASEDIR)
    os.remove(filename)  # remove zipfile
    print(f"Downloaded and extracted data for case_{case:05d}")
else:
    print(f"Data for case_{case:05d} exists")

Data for case_00016 exists

class KitsDataset(object):
    def __init__(self, basedir: str, dataset_type: str, transforms=None):
        """
        Dataset class for prepared Kits19 data, for binary segmentation (background/kidney)

        :param basedir: Directory that contains the prepared CT scans, in subdirectories
                        case_00000 until case_00210
        :param dataset_type: either "train" or "val"
        :param transforms: Compose object with augmentations
        """
        allmasks = sorted(glob.glob(f"{basedir}/case_*/segmentation_frames/*png"))

        if len(allmasks) == 0:
            raise ValueError(
                f"basedir: '{basedir}' does not contain data for type '{dataset_type}'"
            )
        self.valpatients = [11, 15, 16, 49, 50, 79, 81, 89, 106, 108, 112, 126, 129, 133,
                            141, 166, 169, 170, 192, 202, 204]  # fmt: skip
        valcases = [f"case_{i:05d}" for i in self.valpatients]
        if dataset_type == "train":
            masks = [mask for mask in allmasks if Path(mask).parents[1].name not in valcases]
        elif dataset_type == "val":
            masks = [mask for mask in allmasks if Path(mask).parents[1].name in valcases]
        else:
            raise ValueError("Please choose train or val dataset split")

        if dataset_type == "train":
            # Assumption: the body of this branch is missing in the source;
            # shuffling the training masks is the likely intent.
            random.shuffle(masks)
        self.basedir = basedir
        self.dataset_type = dataset_type
        self.dataset = masks
        self.transforms = transforms
        print(
            f"Created {dataset_type} dataset with {len(self.dataset)} items. Base directory for data: {basedir}"
        )

    def __getitem__(self, index):
        """
        Get an item from the dataset at the specified index.

        :return: (annotation, input_image, metadata) where annotation is (index, segmentation_mask)
                 and metadata a dictionary with case and slice number
        """
        mask_path = self.dataset[index]
        # Open the image with OpenCV with `cv2.IMREAD_UNCHANGED` to prevent automatic
        # conversion of 1-channel black and white images to 3-channel BGR images.
        mask = cv2.imread(mask_path, cv2.IMREAD_UNCHANGED)

        image_path = str(Path(mask_path.replace("segmentation", "imaging")).with_suffix(".jpg"))
        img = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)

        if img.shape[:2] != (512, 512):
            img = cv2.resize(img, (512, 512))
            mask = cv2.resize(mask, (512, 512))

        annotation = (index, mask.astype(np.uint8))
        input_image = np.expand_dims(img, axis=0).astype(np.float32)
        return (
            annotation,
            input_image,
            {"case": Path(mask_path).parents[1].name, "slice": Path(mask_path).stem},
        )

    def __len__(self):
        return len(self.dataset)
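
The class above pairs every segmentation mask with its CT image purely by path manipulation, and derives the case name from the directory two levels up. The sketch below walks through that mapping on a single made-up path. The file name slice_0042.png is hypothetical, and PurePosixPath is used instead of Path only so that the printed result is the same on every operating system.

```python
from pathlib import PurePosixPath

# Hypothetical mask path; the file name "slice_0042.png" is made up for illustration
mask_path = "kits19_frames_1/case_00016/segmentation_frames/slice_0042.png"

# Same transformation as in __getitem__: swap "segmentation" for "imaging"
# and replace the .png suffix with .jpg
image_path = str(PurePosixPath(mask_path.replace("segmentation", "imaging")).with_suffix(".jpg"))

# parents[0] is the segmentation_frames directory, parents[1] is the case directory
case_name = PurePosixPath(mask_path).parents[1].name
slice_name = PurePosixPath(mask_path).stem

# Validation membership is decided by case name, as in __init__
valpatients = [11, 15, 16]  # first three entries of self.valpatients
valcases = [f"case_{i:05d}" for i in valpatients]

print(image_path)
print(case_name, slice_name, case_name in valcases)
```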

# The sigmoid function is used to transform the result of the network
# to binary segmentation masks
def sigmoid(x):
    return np.exp(-np.logaddexp(0, -x))

# Create an instance of the KitsDataset class
# If you set dataset_type to train, make sure that `basedir` contains training data
dataset = KitsDataset(basedir=BASEDIR, dataset_type="val", transforms=None)

Created val dataset with 178 items. Base directory for data: kits19_frames_1
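
The sigmoid helper above is written with np.logaddexp rather than the textbook form 1 / (1 + exp(-x)), which avoids overflow in exp for large negative inputs. The following sketch, on made-up logit values, shows how the function output is rounded into a binary mask in the same way as in the visualization code later in this tutorial.

```python
import numpy as np


def sigmoid(x):
    # Same formulation as in the tutorial: exp(-log(1 + exp(-x)))
    return np.exp(-np.logaddexp(0, -x))


# Made-up network outputs (logits), including extreme values
logits = np.array([[-1000.0, -2.0], [2.0, 1000.0]], dtype=np.float32)

probabilities = sigmoid(logits)
binary_mask = probabilities.round().astype(np.uint8)

print(probabilities)
print(binary_mask)
```

Positive logits map to 1 (kidney) and negative logits to 0 (background), and the extreme values stay finite.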

Load Model

num_images = 4
colormap = "gray"

ie = IECore()
net_ir = ie.read_network(IR_PATH)
net_pot = ie.read_network(COMPRESSED_MODEL_PATH)

exec_net_ir = ie.load_network(network=net_ir, device_name="CPU")
exec_net_pot = ie.load_network(network=net_pot, device_name="CPU")
input_layer = next(iter(net_ir.input_info))
output_layer_ir = next(iter(net_ir.outputs))
output_layer_pot = next(iter(net_pot.outputs))

Show Results

Visualize the results of the model on four slices of the validation set. Compare the results of the FP16 IR model with the results of the quantized INT8 model and the reference segmentation annotation.

Medical imaging datasets tend to be very imbalanced: most of the slices in a CT scan do not contain kidney data. The segmentation model should be good at finding kidneys where they exist (in medical terms: have good sensitivity) but also not find spurious kidneys that do not exist (have good specificity). In the next cell, we show four slices: two slices that have no kidney data, and two slices that contain kidney data. For this example, a slice has kidney data if more than 50 pixels in the slice are annotated as kidney.

Run this cell again to show results on a different subset. The random seed is displayed to allow reproducing specific runs of this cell.

Note: the images are shown after optional augmenting and resizing. In the Kits19 dataset, all but one of the cases have input shape (512, 512).

# Create a dataset, and make a subset of the dataset for visualization
# The dataset items are (annotation, image, meta) where annotation is (index, mask)
background_slices = (item for item in dataset if np.count_nonzero(item[0][1]) == 0)
kidney_slices = (item for item in dataset if np.count_nonzero(item[0][1]) > 50)
# Set seed to current time. To reproduce specific results, copy the printed seed
# and manually set `seed` to that value.
seed = int(time.time())
random.seed(seed)
print(f"Visualizing results with seed {seed}")
data_subset = random.sample(list(background_slices), 2) + random.sample(list(kidney_slices), 2)

fig, ax = plt.subplots(nrows=num_images, ncols=4, figsize=(24, num_images * 4))
for i, (annotation, image, meta) in enumerate(data_subset):
    mask = annotation[1]
    res_ir = exec_net_ir.infer(inputs={input_layer: image})
    res_pot = exec_net_pot.infer(inputs={input_layer: image})
    target_mask = mask.astype(np.uint8)

    result_mask_ir = sigmoid(res_ir[output_layer_ir]).round().astype(np.uint8)[0, 0, ::]
    result_mask_pot = sigmoid(res_pot[output_layer_pot]).round().astype(np.uint8)[0, 0, ::]

    ax[i, 0].imshow(image[0, ::], cmap=colormap)
    ax[i, 1].imshow(target_mask, cmap=colormap)
    ax[i, 2].imshow(result_mask_ir, cmap=colormap)
    ax[i, 3].imshow(result_mask_pot, cmap=colormap)
    ax[i, 0].set_title(f"{meta['slice']}")
    ax[i, 1].set_title("Annotation")
    ax[i, 2].set_title("Prediction on FP16 model")
    ax[i, 3].set_title("Prediction on INT8 model")
Visualizing results with seed 1638286492
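
Sensitivity and specificity, mentioned in the introduction to this section, can also be computed per slice from a predicted mask and its annotation. The sketch below is a minimal NumPy illustration on a made-up 4x4 mask. The function name sensitivity_specificity is hypothetical and not part of the notebook.

```python
import numpy as np


def sensitivity_specificity(target_mask, predicted_mask):
    """Return (sensitivity, specificity) for two binary masks of equal shape."""
    target = target_mask.astype(bool)
    predicted = predicted_mask.astype(bool)
    true_pos = np.count_nonzero(target & predicted)
    true_neg = np.count_nonzero(~target & ~predicted)
    false_pos = np.count_nonzero(~target & predicted)
    false_neg = np.count_nonzero(target & ~predicted)
    # A slice without kidney pixels has no defined sensitivity, and vice versa
    sensitivity = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else float("nan")
    specificity = true_neg / (true_neg + false_pos) if (true_neg + false_pos) else float("nan")
    return sensitivity, specificity


# Made-up annotation with 4 kidney pixels
target = np.zeros((4, 4), dtype=np.uint8)
target[1:3, 1:3] = 1

# Made-up prediction: misses one kidney pixel and adds one spurious pixel
prediction = target.copy()
prediction[1, 1] = 0
prediction[0, 0] = 1

sens, spec = sensitivity_specificity(target, prediction)
print(f"sensitivity: {sens:.3f}, specificity: {spec:.3f}")
```

With the masks from the visualization cell, target_mask and result_mask_ir (or result_mask_pot) could be passed in the same way.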

Compare Performance of the Original and Quantized Models

To measure the inference performance of the FP16 and INT8 models, we use the Benchmark Tool, OpenVINO’s inference performance measurement tool. The Benchmark Tool is a command-line application that can be run in the notebook with ! benchmark_app or %sx benchmark_app.

In this tutorial, we use a wrapper function from Notebook Utils. It prints the benchmark_app command with the chosen parameters.

NOTE: For the most accurate performance estimation, we recommend running benchmark_app in a terminal/command prompt after closing other applications. Run benchmark_app --help to see all command line options.
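
For running the benchmark in a terminal, as recommended in the note above, the command line can be assembled from the same parameters. The sketch below is a hypothetical helper (build_benchmark_command is not part of Notebook Utils). It uses only the flags that appear in the benchmark commands printed in this tutorial: -m, -d, -t, -api, -b and -cdir.

```python
import shlex
from typing import List


def build_benchmark_command(
    model_path: str,
    device: str = "CPU",
    seconds: int = 15,
    api: str = "async",
    batch: int = 1,
    cache_dir: str = "model_cache",
) -> List[str]:
    """Build a benchmark_app argument list (hypothetical helper for illustration)."""
    return [
        "benchmark_app",
        "-m", model_path,
        "-d", device,
        "-t", str(seconds),
        "-api", api,
        "-b", str(batch),
        "-cdir", cache_dir,
    ]


command = build_benchmark_command("pretrained_model/unet44.xml")
# shlex.join quotes arguments where needed, so the result can be pasted into a shell
print(shlex.join(command))
```

When benchmark_app is on the PATH, the list can also be passed directly to subprocess.run(command, check=True).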

# By default, benchmark on MULTI:CPU,GPU if a GPU is available, otherwise on CPU.
device = "MULTI:CPU,GPU" if "GPU" in ie.available_devices else "CPU"
# Uncomment one of the options below to benchmark on other devices
# device = "GPU"
# device = "CPU"
# device = "AUTO"
# Benchmark FP16 model
benchmark_model(model_path=IR_PATH, device=device, seconds=15)

Benchmark unet44.xml with CPU for 15 seconds with async inference

Benchmark command: benchmark_app -m pretrained_model/unet44.xml -d CPU -t 15 -api async -b 1 -cdir model_cache

Count:      61 iterations
Duration:   15307.00 ms
Latency:    248.58 ms
Throughput: 3.99 FPS

Device: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
# Benchmark INT8 model
benchmark_model(model_path=COMPRESSED_MODEL_PATH, device=device, seconds=15)

Benchmark quantized_unet44.xml with CPU for 15 seconds with async inference

Benchmark command: benchmark_app -m pretrained_model/quantized_unet44.xml -d CPU -t 15 -api async -b 1 -cdir model_cache

Count:      42 iterations
Duration:   15716.82 ms
Latency:    370.94 ms
Throughput: 2.67 FPS

Device: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
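
The reported throughput can be cross-checked against the other numbers in each report: with batch size 1 (-b 1), frames per second is the iteration count divided by the duration in seconds. The snippet below is a plain arithmetic check on the values printed above, not part of benchmark_app.

```python
def throughput_fps(iterations: int, duration_ms: float, batch_size: int = 1) -> float:
    """Frames per second from an iteration count and a duration in milliseconds."""
    return iterations * batch_size / (duration_ms / 1000.0)


# Values copied from the two benchmark reports above
fp16_fps = throughput_fps(iterations=61, duration_ms=15307.00)
int8_fps = throughput_fps(iterations=42, duration_ms=15716.82)

print(f"FP16: {fp16_fps:.2f} FPS")
print(f"INT8: {int8_fps:.2f} FPS")
print(f"INT8 / FP16 throughput: {int8_fps / fp16_fps:.2f}")
```

In this particular run on this CPU, the INT8 model reached roughly two thirds of the FP16 throughput. Results on other hardware may differ.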

Show Live Inference

To show live inference on the model in the notebook, we use the asynchronous processing feature of OpenVINO Inference Engine.

If you use a GPU device, with device="GPU" or device="MULTI:CPU,GPU" to do inference on an integrated graphics card, model loading will be slow the first time you run this code. The model will be cached, so after the first time model loading will be fast. See the OpenVINO API tutorial for more information on Inference Engine, including Model Caching.

We define a helper function showarray to efficiently show images in the notebook. The do_inference function uses Open Model Zoo’s AsyncPipeline to perform asynchronous inference. After inference on the specified CT scan has completed, the total time and throughput (fps), including preprocessing and displaying, are printed.

def showarray(frame: np.ndarray, display_handle=None):
    """
    Display array `frame`. Replace information at `display_handle` with `frame`
    encoded as jpeg image

    Create a display_handle with: `display_handle = display(display_id=True)`
    """
    _, frame = cv2.imencode(ext=".jpeg", img=frame)
    if display_handle is None:
        display_handle = display(Image(data=frame.tobytes()), display_id=True)
    else:
        # Replace the previously displayed image in place
        display_handle.update(Image(data=frame.tobytes()))
    return display_handle

def do_inference(imagelist: List, model: omz_model.Model, device: str):
    """
    Do inference of images in `imagelist` on `model` on the given `device` and show
    the results in real time in a Jupyter Notebook

    :param imagelist: list of images/frames to do inference on
    :param model: Model instance for inference
    :param device: Name of device to perform inference on. For example: "CPU"
    """
    display_handle = None
    next_frame_id = 0
    next_frame_id_to_show = 0

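    # Name of the model's input layer. Assumption: the Model instance exposes its
    # network as `model.net`; the original expression is truncated in the source.
    input_layer = next(iter(model.net.input_info))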

    # Create asynchronous pipeline and print time it takes to load the model
    load_start_time = time.perf_counter()
    pipeline = CTAsyncPipeline(
        ie=ie, model=model, plugin_config={}, device=device, max_num_requests=0
    )
    load_end_time = time.perf_counter()

    # Perform asynchronous inference
    start_time = time.perf_counter()

    # Loop until every image, including the last one, has been submitted
    while next_frame_id < len(imagelist):
        results = pipeline.get_result(next_frame_id_to_show)

        if results:
            # Show next result from async pipeline
            result, meta = results
            display_handle = showarray(result, display_handle)

            next_frame_id_to_show += 1

        if pipeline.is_ready():
            # Submit new image to async pipeline
            image = imagelist[next_frame_id]
            pipeline.submit_data(
                inputs={input_layer: image}, id=next_frame_id, meta={"frame": image}
            )
            next_frame_id += 1
        else:
            # If the pipeline is not ready yet and there are no results: wait
            pipeline.await_any()

    # Wait until all submitted requests have finished
    pipeline.await_all()

    # Show all frames that are in the pipeline after all images have been submitted
    while pipeline.has_completed_request():
        results = pipeline.get_result(next_frame_id_to_show)
        if results:
            result, meta = results
            display_handle = showarray(result, display_handle)
            next_frame_id_to_show += 1

    end_time = time.perf_counter()
    duration = end_time - start_time
    fps = len(imagelist) / duration
    print(f"Loaded model to {device} in {load_end_time-load_start_time:.2f} seconds.")
    print(f"Total time for {next_frame_id} frames: {duration:.2f} seconds, fps:{fps:.2f}")
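
The submit-and-collect pattern used in do_inference can be illustrated without OpenVINO. The sketch below is a stand-in, not the Open Model Zoo implementation: a ThreadPoolExecutor plays the role of the inference request queue, fake_infer is a made-up function that simulates inference with varying latency, and run_pipeline is a hypothetical name. Like do_inference, it keeps a limited number of requests in flight and shows results strictly in frame order.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(frame_id):
    # Simulated inference with a latency that differs per frame
    time.sleep(0.01 * (3 - frame_id % 3))
    return frame_id, f"mask-{frame_id}"


def run_pipeline(num_frames, max_requests=2):
    shown = []  # results in the order they are "displayed"
    pending = {}  # frame id -> future for requests that are in flight
    next_frame_id = 0
    next_frame_id_to_show = 0
    with ThreadPoolExecutor(max_workers=max_requests) as pool:
        while next_frame_id_to_show < num_frames:
            future = pending.get(next_frame_id_to_show)
            if future is not None and future.done():
                # Show the next result in frame order
                shown.append(future.result())
                del pending[next_frame_id_to_show]
                next_frame_id_to_show += 1
            elif next_frame_id < num_frames and len(pending) < max_requests:
                # A request slot is free: submit a new frame
                pending[next_frame_id] = pool.submit(fake_infer, next_frame_id)
                next_frame_id += 1
            else:
                # No free slot and nothing to show yet: wait for the next frame
                pending[next_frame_id_to_show].result()
    return shown


results = run_pipeline(num_frames=5)
print([frame_id for frame_id, _ in results])
```

Even though individual frames finish at different times, they are displayed in submission order, which is what keeps the live view of the CT scan coherent.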

Load the segmentation model with SegModel, based on the Open Model Zoo Model API. Load a CT scan from the BASEDIR directory (by default: kits19_frames_1) into a list.

ie = IECore()
segmentation_model = SegModel(ie=ie, model_path=Path(COMPRESSED_MODEL_PATH))
case = 16
demopattern = f"{BASEDIR}/case_{case:05d}/imaging_frames/*jpg"
imlist = sorted(glob.glob(demopattern))
images = [cv2.imread(im, cv2.IMREAD_UNCHANGED) for im in imlist]

In the next cell, we run the do_inference function, which loads the model to the specified device (using caching for faster model loading on GPU devices), performs inference, and displays the results in real time.

# Possible options for device include "CPU", "GPU", "AUTO", "MULTI"
device = "MULTI:CPU,GPU" if "GPU" in ie.available_devices else "CPU"
do_inference(imagelist=images, model=segmentation_model, device=device)
Loaded model to CPU in 0.20 seconds.
Total time for 178 frames: 70.34 seconds, fps:2.53