Kidney Segmentation with PyTorch Lightning and OpenVINO™ - Part 3
This tutorial is a part of a series on how to train, optimize, quantize
and show live inference on a medical segmentation model. The goal is to
accelerate inference on a kidney segmentation model. The
UNet model is trained from
scratch; the data is from
Kits19.
This notebook needs a trained UNet model. We provide a pre-trained
model, trained for 20 epochs on the full
Kits-19 frames dataset, which
has an F1 score of 0.9 on the validation set. The training code is
available in the training notebook.
NNCF for PyTorch models requires a C++ compiler. On Windows, install
Microsoft Visual Studio
2019.
During installation, choose Desktop development with C++ in the
Workloads tab. On macOS, run xcode-select --install from a Terminal.
On Linux, install gcc.
Running this notebook with the full dataset will take a long time. For
demonstration purposes, this tutorial will download one converted CT
scan and use that scan for quantization and inference. For production
purposes, use a representative dataset for quantizing the model.
# On Windows, try to find the directory that contains x64 cl.exe and add it to the PATH to enable PyTorch
# to find the required C++ tools. This code assumes that Visual Studio is installed in the default
# directory. If you have a different C++ compiler, please add the correct path to os.environ["PATH"]
# directly. Note that the C++ Redistributable is not enough to run this notebook.

# Adding the path to os.environ["LIB"] is not always required - it depends on the system's configuration

import sys

if sys.platform == "win32":
    import distutils.command.build_ext
    import distutils.core
    import os
    from pathlib import Path

    if sys.getwindowsversion().build >= 20000:  # Windows 11
        search_path = "**/Hostx64/x64/cl.exe"
    else:
        search_path = "**/Hostx86/x64/cl.exe"

    VS_INSTALL_DIR_2019 = r"C:/Program Files (x86)/Microsoft Visual Studio"
    VS_INSTALL_DIR_2022 = r"C:/Program Files/Microsoft Visual Studio"

    cl_paths_2019 = sorted(list(Path(VS_INSTALL_DIR_2019).glob(search_path)))
    cl_paths_2022 = sorted(list(Path(VS_INSTALL_DIR_2022).glob(search_path)))
    cl_paths = cl_paths_2019 + cl_paths_2022

    if len(cl_paths) == 0:
        raise ValueError(
            "Cannot find Visual Studio. This notebook requires an x64 C++ compiler. If you installed "
            "a C++ compiler, please add the directory that contains cl.exe to `os.environ['PATH']`."
        )
    else:
        # If multiple versions of MSVC are installed, get the most recent version
        cl_path = cl_paths[-1]
        vs_dir = str(cl_path.parent)
        os.environ["PATH"] += f"{os.pathsep}{vs_dir}"
        # Code for finding the library dirs from
        # https://stackoverflow.com/questions/47423246/get-pythons-lib-path
        d = distutils.core.Distribution()
        b = distutils.command.build_ext.build_ext(d)
        b.finalize_options()
        os.environ["LIB"] = os.pathsep.join(b.library_dirs)
        print(f"Added {vs_dir} to PATH")
import logging
import os
import random
import sys
import time
import warnings
import zipfile
from pathlib import Path
from typing import Union

warnings.filterwarnings("ignore", category=UserWarning)

import cv2
import matplotlib.pyplot as plt
import monai
import numpy as np
import torch
import nncf
import openvino as ov
from monai.transforms import LoadImage
from nncf.common.logging.logger import set_log_level
from torchmetrics import F1Score as F1

set_log_level(logging.ERROR)  # Disables all NNCF info and warning messages

from custom_segmentation import SegmentationModel
from async_pipeline import show_live_inference

sys.path.append("../utils")
from notebook_utils import download_file
2024-02-09 22:51:11.599112: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-09 22:51:11.634091: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
By default, this notebook will download one CT scan from the KITS19
dataset that will be used for quantization. To use the full dataset, set
BASEDIR to the path of the dataset, as prepared according to the
Data Preparation notebook.
BASEDIR = Path("kits19_frames_1")
# Uncomment the line below to use the full dataset, as prepared in the data preparation notebook
# BASEDIR = Path("~/kits19/kits19_frames").expanduser()
MODEL_DIR = Path("model")
MODEL_DIR.mkdir(exist_ok=True)
Download the pre-trained model weights, load the PyTorch model and the
state_dict that was saved after training. The model used in this
notebook is a
BasicUNet
model from MONAI. We provide a pre-trained
checkpoint. To see how this model performs, check out the training
notebook.
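The cell that performs this step is not included above; the following is a minimal sketch of it. The checkpoint filename is a placeholder (use the pre-trained checkpoint linked in this notebook series), while the BasicUNet arguments follow the 2D binary segmentation setup described here.

# A sketch of loading the pre-trained weights; the checkpoint path is a placeholder
state_dict_path = MODEL_DIR / "unet_kits19_state_dict.pth"  # hypothetical filename
state_dict = torch.load(state_dict_path, map_location=torch.device("cpu"))

# 2D UNet for binary (background/kidney) segmentation: 1 input channel, 1 output channel
model = monai.networks.nets.BasicUNet(spatial_dims=2, in_channels=1, out_channels=1).eval()
model.load_state_dict(state_dict)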
# The CT scan case number. For example: 2 for data from the case_00002 directory
# Currently only 117 is supported
CASE = 117
if not (BASEDIR / f"case_{CASE:05d}").exists():
    BASEDIR.mkdir(exist_ok=True)
    filename = download_file(
        f"https://storage.openvinotoolkit.org/data/test_data/openvino_notebooks/kits19/case_{CASE:05d}.zip"
    )
    with zipfile.ZipFile(filename, "r") as zip_ref:
        zip_ref.extractall(path=BASEDIR)
    os.remove(filename)  # remove zipfile
    print(f"Downloaded and extracted data for case_{CASE:05d}")
else:
    print(f"Data for case_{CASE:05d} exists")
The KitsDataset class in the next cell expects images and masks in
the ``basedir`` directory, in a folder per patient. It is a simplified
version of the Dataset class in the training
notebook.
Images are loaded with MONAI’s
LoadImage,
to align with the image loading method in the training notebook. This
method rotates and flips the images. We define a rotate_and_flip
method to display the images in the expected orientation:
def rotate_and_flip(image):
    """Rotate `image` by 90 degrees and flip horizontally"""
    return cv2.flip(cv2.rotate(image, rotateCode=cv2.ROTATE_90_CLOCKWISE), flipCode=1)


class KitsDataset:
    def __init__(self, basedir: str):
        """
        Dataset class for prepared Kits19 data, for binary segmentation (background/kidney)
        Source data should exist in basedir, in subdirectories case_00000 until case_00210,
        with each subdirectory containing directories imaging_frames, with jpg images, and
        segmentation_frames with segmentation masks as png files.
        See https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/110-ct-segmentation-quantize/data-preparation-ct-scan.ipynb

        :param basedir: Directory that contains the prepared CT scans
        """
        masks = sorted(Path(basedir).glob("case_*/segmentation_frames/*png"))

        self.basedir = basedir
        self.dataset = masks
        print(f"Created dataset with {len(self.dataset)} items. " f"Base directory for data: {basedir}")

    def __getitem__(self, index):
        """
        Get an item from the dataset at the specified index.

        :return: (image, segmentation_mask)
        """
        mask_path = self.dataset[index]
        image_path = str(mask_path.with_suffix(".jpg")).replace("segmentation_frames", "imaging_frames")

        # Load images with MONAI's LoadImage to match data loading in training notebook
        mask = LoadImage(image_only=True, dtype=np.uint8)(str(mask_path)).numpy()
        img = LoadImage(image_only=True, dtype=np.float32)(str(image_path)).numpy()

        if img.shape[:2] != (512, 512):
            img = cv2.resize(img.astype(np.uint8), (512, 512)).astype(np.float32)
            mask = cv2.resize(mask, (512, 512))

        input_image = np.expand_dims(img, axis=0)
        return input_image, mask

    def __len__(self):
        return len(self.dataset)
To test whether the data loader returns the expected output, we show an
image and a mask. The image and the mask are returned by the dataloader,
after resizing and preprocessing. Since this dataset contains a lot of
slices without kidneys, we select a slice that contains at least 5000
kidney pixels to verify that the annotations look correct:
dataset = KitsDataset(BASEDIR)
# Find a slice that contains kidney annotations
# Each item is an (image, segmentation_mask) tuple; item[1] is the mask
image_data, mask = next(item for item in dataset if np.count_nonzero(item[1]) > 5000)
# Remove the extra image dimension, then rotate and flip the image for visualization
image = rotate_and_flip(image_data.squeeze())

# The mask has shape (H, W)
mask = rotate_and_flip(mask)

fig, ax = plt.subplots(1, 2, figsize=(12, 6))
ax[0].imshow(image, cmap="gray")
ax[1].imshow(mask, cmap="gray");
Define a metric to determine the performance of the model.
For this demo, we use the F1
score, or Dice coefficient,
from the
TorchMetrics
library.
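For a binary segmentation mask, the F1 score is identical to the Dice coefficient. With TP, FP and FN counting true positive, false positive and false negative pixels, and P and G denoting the sets of predicted and ground-truth kidney pixels:

F1 = \frac{2\,TP}{2\,TP + FP + FN} = \frac{2\,|P \cap G|}{|P| + |G|} = \text{Dice}(P, G)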
def compute_f1(model: Union[torch.nn.Module, ov.CompiledModel], dataset: KitsDataset):
    """
    Compute binary F1 score of `model` on `dataset`
    F1 score metric is provided by the torchmetrics library
    `model` is expected to be a binary segmentation model, images in the dataset are
    expected in (N,C,H,W) format where N==C==1
    """
    metric = F1(ignore_index=0, task="binary", average="macro")
    with torch.no_grad():
        for image, target in dataset:
            input_image = torch.as_tensor(image).unsqueeze(0)
            if isinstance(model, ov.CompiledModel):
                output_layer = model.output(0)
                output = model(input_image)[output_layer]
                output = torch.from_numpy(output)
            else:
                output = model(input_image)
            label = torch.as_tensor(target.squeeze()).long()
            prediction = torch.sigmoid(output.squeeze()).round().long()
            metric.update(label.flatten(), prediction.flatten())
    return metric.compute()
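The cell that produced the TracerWarning below converts the FP32 PyTorch model to OpenVINO IR; it is not shown above. A minimal sketch of it, where the IR filename and the (1, 1, 512, 512) example input are assumptions:

# A sketch of FP32 IR conversion; the filename is an assumption
fp32_ir_path = MODEL_DIR / "unet_kits19_fp32.xml"

fp32_ov_model = ov.convert_model(model, example_input=torch.zeros(1, 1, 512, 512))
ov.save_model(fp32_ov_model, fp32_ir_path)

# Sanity check: F1 score of the PyTorch model on the downloaded scan
print(f"FP32 F1 score: {compute_f1(model, dataset):.3f}")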
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-609/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/monai/networks/nets/basic_unet.py:179: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if x_e.shape[-i - 1] != x_0.shape[-i - 1]:
NNCF provides a suite of
advanced algorithms for optimizing neural network inference in
OpenVINO, with minimal accuracy drop.
NOTE: NNCF post-training quantization is available as of the
OpenVINO 2023.0 release.
Create a quantized model from the pre-trained FP32 model and the
calibration dataset. The optimization process contains the following
steps:
1. Create a Dataset for quantization.
2. Run `nncf.quantize` to obtain an optimized model.
3. Export the quantized model to ONNX, and then convert it to an OpenVINO IR model.
4. Serialize the INT8 model with the `ov.save_model` function for benchmarking.
def transform_fn(data_item):
    """
    Extract the model's input from the data item.
    The data item here is the data item that is returned from the data source per iteration.
    This function should be passed when the data item cannot be used as model's input.
    """
    images, _ = data_item
    return images


data_loader = torch.utils.data.DataLoader(dataset)
calibration_dataset = nncf.Dataset(data_loader, transform_fn)
quantized_model = nncf.quantize(
    model,
    calibration_dataset,
    # Do not quantize LeakyReLU activations to allow the INT8 model to run on Intel GPU
    ignored_scope=nncf.IgnoredScope(patterns=[".*LeakyReLU.*"]),
)
Export the quantized model to ONNX and then convert it to OpenVINO IR
model and save it.
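The export cell itself is omitted above (its TracerWarning output follows). A minimal sketch, with the ONNX and IR filenames as assumptions:

# A sketch of the INT8 export; filenames are assumptions
int8_onnx_path = MODEL_DIR / "unet_kits19_int8.onnx"
int8_ir_path = MODEL_DIR / "unet_kits19_int8.xml"

# Export the quantized PyTorch model to ONNX, then convert the ONNX model to OpenVINO IR
dummy_input = torch.randn(1, 1, 512, 512)
torch.onnx.export(quantized_model, dummy_input, str(int8_onnx_path), opset_version=11)
int8_ov_model = ov.convert_model(int8_onnx_path)
ov.save_model(int8_ov_model, int8_ir_path)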
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-609/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/nncf/torch/quantization/layers.py:334: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return self._level_low.item()
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-609/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/nncf/torch/quantization/layers.py:342: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return self._level_high.item()
/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-609/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/monai/networks/nets/basic_unet.py:179: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if x_e.shape[-i - 1] != x_0.shape[-i - 1]:
This notebook demonstrates post-training quantization with NNCF.
NNCF also supports quantization-aware training and algorithms
other than quantization. See the NNCF
documentation in the NNCF
repository for more information.
fp32_ir_model_size = fp32_ir_path.with_suffix(".bin").stat().st_size / 1024
quantized_model_size = int8_ir_path.with_suffix(".bin").stat().st_size / 1024

print(f"FP32 IR model size: {fp32_ir_model_size:.2f} KB")
print(f"INT8 model size: {quantized_model_size:.2f} KB")
To measure the inference performance of the FP32 and INT8
models, we use Benchmark
Tool
- OpenVINO’s inference performance measurement tool. Benchmark Tool is a
command-line application, part of the OpenVINO development tools, that can
be run in the notebook with !benchmark_app or
%sx benchmark_app.
NOTE: For the most accurate performance estimation, it is
recommended to run benchmark_app in a terminal/command prompt
after closing other applications. Run
benchmark_app -m model.xml -d CPU to benchmark async inference on
CPU for one minute. Change CPU to GPU to benchmark on GPU.
Run benchmark_app --help to see all command line options.
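For example, the FP32 model could be benchmarked as follows (the IR filename and the 15-second time limit are assumptions). Because this model has a dynamic input shape, benchmark_app needs -data_shape to know what input data to generate; the first run in the log below omits it and fails with "Provide data shapes!":

!benchmark_app -m model/unet_kits19_fp32.xml -d CPU -t 15 -data_shape [1,1,512,512]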
[ INFO ] CPU
[ INFO ] Build ................................. 2023.3.0-13775-ceeafaf64f3-releases/2023/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to PerformanceMode.LATENCY.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 26.51 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ] x (node: x) : f32 / [...] / [?,?,?,?]
[ INFO ] Model outputs:
[ INFO ] *NO_NAME* (node: __module.final_conv/aten::_convolution/Add) : f32 / [...] / [?,1,16..,16..]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ] x (node: x) : f32 / [...] / [?,?,?,?]
[ INFO ] Model outputs:
[ INFO ] *NO_NAME* (node: __module.final_conv/aten::_convolution/Add) : f32 / [...] / [?,1,16..,16..]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 87.61 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: Model0
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1
[ INFO ] NUM_STREAMS: 1
[ INFO ] AFFINITY: Affinity.CORE
[ INFO ] INFERENCE_NUM_THREADS: 12
[ INFO ] PERF_COUNT: NO
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ] PERFORMANCE_HINT: LATENCY
[ INFO ] EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] ENABLE_CPU_PINNING: True
[ INFO ] SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE
[ INFO ] ENABLE_HYPER_THREADING: False
[ INFO ] EXECUTION_DEVICES: ['CPU']
[ INFO ] CPU_DENORMALS_OPTIMIZATION: False
[ INFO ] CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0
[Step 9/11] Creating infer requests and preparing input tensors
[ ERROR ] Input x is dynamic. Provide data shapes!
Traceback (most recent call last):
File "/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-609/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/openvino/tools/benchmark/main.py", line 486, in main
data_queue = get_input_data(paths_to_input, app_inputs_info)
File "/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-609/.workspace/scm/ov-notebook/.venv/lib/python3.8/site-packages/openvino/tools/benchmark/utils/inputs_filling.py", line 123, in get_input_data
raise Exception(f"Input {info.name} is dynamic. Provide data shapes!")
Exception: Input x is dynamic. Provide data shapes!
[ INFO ] Compile model took 190.56 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: main_graph
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1
[ INFO ] NUM_STREAMS: 1
[ INFO ] AFFINITY: Affinity.CORE
[ INFO ] INFERENCE_NUM_THREADS: 12
[ INFO ] PERF_COUNT: NO
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ] PERFORMANCE_HINT: LATENCY
[ INFO ] EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] ENABLE_CPU_PINNING: True
[ INFO ] SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE
[ INFO ] ENABLE_HYPER_THREADING: False
[ INFO ] EXECUTION_DEVICES: ['CPU']
[ INFO ] CPU_DENORMALS_OPTIMIZATION: False
[ INFO ] CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'x.1'!. This input will be filled with random values!
[ INFO ] Fill input 'x.1' with random values
[Step 10/11] Measuring performance (Start inference synchronously, limits: 15000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
Visualize the results of the model on four slices of the validation set.
Compare the results of the FP32 IR model with the results of the
quantized INT8 model and the reference segmentation annotation.
Medical imaging datasets tend to be very imbalanced: most of the slices
in a CT scan do not contain kidney data. The segmentation model should
be good at finding kidneys where they exist (in medical terms: have good
sensitivity) but also not find spurious kidneys that do not exist (have
good specificity). In the next cell, there are four slices: two slices
that have no kidney data, and two slices that contain kidney data. For
this example, a slice has kidney data if at least 50 pixels in the
slice are annotated as kidney.
Run this cell again to show results on a different subset. The random
seed is displayed to enable reproducing specific runs of this cell.
NOTE: The images are shown after optional augmenting and
resizing. In the Kits19 dataset, all but one of the cases have a
(512, 512) input shape.
# The sigmoid function is used to transform the result of the network
# to binary segmentation masks
def sigmoid(x):
    return np.exp(-np.logaddexp(0, -x))


num_images = 4
colormap = "gray"

# Load FP32 and INT8 models
core = ov.Core()
fp_model = core.read_model(fp32_ir_path)
int8_model = core.read_model(int8_ir_path)
compiled_model_fp = core.compile_model(fp_model, device_name="CPU")
compiled_model_int8 = core.compile_model(int8_model, device_name="CPU")
output_layer_fp = compiled_model_fp.output(0)
output_layer_int8 = compiled_model_int8.output(0)

# Create a subset of the dataset
background_slices = (item for item in dataset if np.count_nonzero(item[1]) == 0)
kidney_slices = (item for item in dataset if np.count_nonzero(item[1]) > 50)

# Set seed to current time. To reproduce specific results, copy the printed seed
# and manually set `seed` to that value. The seed must be set before sampling.
seed = int(time.time())
random.seed(seed)
print(f"Visualizing results with seed {seed}")

data_subset = random.sample(list(background_slices), 2) + random.sample(list(kidney_slices), 2)

fig, ax = plt.subplots(nrows=num_images, ncols=4, figsize=(24, num_images * 4))
for i, (image, mask) in enumerate(data_subset):
    display_image = rotate_and_flip(image.squeeze())
    target_mask = rotate_and_flip(mask).astype(np.uint8)
    # Add batch dimension to image and do inference on FP and INT8 models
    input_image = np.expand_dims(image, 0)
    res_fp = compiled_model_fp([input_image])
    res_int8 = compiled_model_int8([input_image])

    # Process inference outputs and convert to binary segmentation masks
    result_mask_fp = sigmoid(res_fp[output_layer_fp]).squeeze().round().astype(np.uint8)
    result_mask_int8 = sigmoid(res_int8[output_layer_int8]).squeeze().round().astype(np.uint8)
    result_mask_fp = rotate_and_flip(result_mask_fp)
    result_mask_int8 = rotate_and_flip(result_mask_int8)

    # Display images, annotations, FP32 result and INT8 result
    ax[i, 0].imshow(display_image, cmap=colormap)
    ax[i, 1].imshow(target_mask, cmap=colormap)
    ax[i, 2].imshow(result_mask_fp, cmap=colormap)
    ax[i, 3].imshow(result_mask_int8, cmap=colormap)
    ax[i, 2].set_title("Prediction on FP32 model")
    ax[i, 3].set_title("Prediction on INT8 model")
To show live inference on the model in the notebook, we will use the
asynchronous processing feature of OpenVINO.
We use the show_live_inference function from Notebook
Utils to show live inference. This
function uses Open Model
Zoo’s Async
Pipeline and Model API to perform asynchronous inference. After
inference on the specified CT scan has completed, the total time and
throughput (fps), including preprocessing and displaying, will be
printed.
NOTE: If you experience flickering on Firefox, consider using
Chrome or Edge to run this notebook.
We load the segmentation model to OpenVINO Runtime with
SegmentationModel, based on the Open Model
Zoo Model API.
This model implementation includes preprocessing and postprocessing for the
model. For SegmentationModel, this includes the code to create an
overlay of the segmentation mask on the original image/frame.
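The setup cell for this step is not shown above. A minimal sketch, assuming the SegmentationModel constructor takes the OpenVINO Core, a model path and a sigmoid flag (check custom_segmentation.py for the exact signature):

# A sketch of preparing the inputs for show_live_inference; the constructor
# arguments are assumptions based on the description above
core = ov.Core()
CT_SCAN_PATH = BASEDIR / f"case_{CASE:05d}" / "imaging_frames"
image_paths = sorted(CT_SCAN_PATH.glob("*.jpg"))
segmentation_model = SegmentationModel(ie=core, model_path=int8_ir_path, sigmoid=True)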
In the next cell, we run the show_live_inference function, which
loads the segmentation_model to the specified device (using
caching for faster model loading on GPU devices), loads the images,
performs inference, and displays the results on the frames loaded in
images in real time.
# Possible options for device include "CPU", "GPU", "AUTO", "MULTI:CPU,GPU"
device = "CPU"
reader = LoadImage(image_only=True, dtype=np.uint8)
show_live_inference(
    ie=core,
    image_paths=image_paths,
    model=segmentation_model,
    device=device,
    reader=reader,
)