Convert a TensorFlow Lite Model to OpenVINO™

This tutorial is also available as a Jupyter notebook that can be cloned directly from GitHub. See the installation guide for instructions to run this tutorial locally on Windows, Linux or macOS. To run without installing anything, click the “Open in Colab” button.


TensorFlow Lite, often referred to as TFLite, is an open source library developed for deploying machine learning models to edge devices.

This short tutorial shows how to convert a TensorFlow Lite EfficientNet-Lite-B0 image classification model to the OpenVINO Intermediate Representation (OpenVINO IR) format, using Model Optimizer. After creating the OpenVINO IR, we load the model in OpenVINO Runtime and run inference with a sample image.

Table of contents:

Preparation
    Install requirements
    Imports
Download TFLite model
Convert a Model to OpenVINO IR Format
Load model using OpenVINO TensorFlow Lite Frontend
Run OpenVINO model inference
    Select inference device
Estimate Model Performance

Preparation

Install requirements

!pip install -q "openvino-dev>=2023.0.0"
!pip install -q opencv-python requests tqdm

# Fetch `notebook_utils` module
import urllib.request
urllib.request.urlretrieve(
    url='https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/utils/notebook_utils.py',
    filename='notebook_utils.py'
);

Imports

from pathlib import Path
import numpy as np
from PIL import Image
from openvino.runtime import Core, serialize
from openvino.tools import mo

from notebook_utils import download_file, load_image

Download TFLite model

model_dir = Path("model")
tflite_model_path = model_dir / "efficientnet_lite0_fp32_2.tflite"

ov_model_path = tflite_model_path.with_suffix(".xml")
model_url = "https://tfhub.dev/tensorflow/lite-model/efficientnet/lite0/fp32/2?lite-format=tflite"

download_file(model_url, tflite_model_path.name, model_dir)
model/efficientnet_lite0_fp32_2.tflite:   0%|          | 0.00/17.7M [00:00<?, ?B/s]
PosixPath('/opt/home/k8sworker/ci-ai/cibuilds/ov-notebook/OVNotebookOps-475/.workspace/scm/ov-notebook/notebooks/119-tflite-to-openvino/model/efficientnet_lite0_fp32_2.tflite')

Convert a Model to OpenVINO IR Format

To convert the TFLite model to OpenVINO IR, the model conversion Python API can be used. The mo.convert_model function accepts the path to the TFLite model and returns an OpenVINO Model class instance that represents this model. The obtained model is ready to use and can be loaded on a device with compile_model, or saved to disk with the serialize function to reduce loading time for subsequent runs. Optionally, we can compress the model weights to FP16 with the compress_to_fp16=True option and integrate preprocessing using this approach. For more information about model conversion, see this page. For details on TensorFlow Lite model support, refer to this tutorial.

ov_model = mo.convert_model(tflite_model_path, compress_to_fp16=True)
serialize(ov_model, ov_model_path)
print(f"Model {tflite_model_path} successfully converted and saved to {ov_model_path}")
Model model/efficientnet_lite0_fp32_2.tflite successfully converted and saved to model/efficientnet_lite0_fp32_2.xml
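
The conversion step above mentions that preprocessing can be integrated into the model. A minimal sketch of one way to do this, assuming the PrePostProcessor API from openvino.preprocess (not part of this notebook's original code), embeds the (x - 127) / 128 normalization used later in this tutorial directly into the converted model:

# Hedged sketch: embed preprocessing into the converted model with PrePostProcessor.
from openvino.preprocess import PrePostProcessor
from openvino.runtime import Layout, Type

ppp = PrePostProcessor(ov_model)
# The tensor passed at inference time will be u8 in NHWC layout.
ppp.input().tensor().set_element_type(Type.u8).set_layout(Layout("NHWC"))
# Convert to f32 and apply (x - 127) / 128 inside the model graph.
ppp.input().preprocess().convert_element_type(Type.f32).mean(127.0).scale(128.0)
ov_model_with_preprocessing = ppp.build()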

Load model using OpenVINO TensorFlow Lite Frontend

TensorFlow Lite models are supported via the FrontEnd API. You can skip the conversion to IR and read a model directly with the OpenVINO Runtime API. For more examples of reading supported formats via the FrontEnd API, see this tutorial.

core = Core()

ov_model = core.read_model(tflite_model_path)
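
If an explicit Model object is not needed, the TFLite file can also be passed straight to compile_model. A minimal sketch, assuming that Core.compile_model accepts a model path in this OpenVINO version:

# Hedged sketch: compile the TFLite model directly from its path, skipping read_model.
compiled_from_tflite = core.compile_model(tflite_model_path, "AUTO")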

Run OpenVINO model inference

We can find information about model input preprocessing in its description on TensorFlow Hub.

image = load_image("https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/image/coco_bricks.png")
# load_image reads the image in BGR format, [:,:,::-1] reverses the channel order to RGB
image = Image.fromarray(image[:,:,::-1])
resized_image = image.resize((224, 224))
# Normalize pixel values to roughly [-1, 1] and add a batch dimension, as described on TensorFlow Hub
input_tensor = np.expand_dims((np.array(resized_image).astype(np.float32) - 127) / 128, 0)
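
The normalization above assumes a 224x224 RGB input scaled to roughly [-1, 1]. As a quick sanity check (a small sketch, not part of the original notebook), the expected input shape can be read from the loaded model:

# Hedged sketch: confirm the model's expected input shape matches the tensor above.
print(ov_model.input(0).shape)  # expected: [1,224,224,3]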

Select inference device

Select a device from the dropdown list for running inference using OpenVINO:

import ipywidgets as widgets

device = widgets.Dropdown(
    options=core.available_devices + ["AUTO"],
    value='AUTO',
    description='Device:',
    disabled=False,
)

device
Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')
compiled_model = core.compile_model(ov_model, device.value)
predicted_scores = compiled_model(input_tensor)[0]
imagenet_classes_file_path = download_file("https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/datasets/imagenet/imagenet_2012.txt")
imagenet_classes = open(imagenet_classes_file_path).read().splitlines()

top1_predicted_cls_id = np.argmax(predicted_scores)
top1_predicted_score = predicted_scores[0][top1_predicted_cls_id]
predicted_label = imagenet_classes[top1_predicted_cls_id]

display(image.resize((640, 512)))
print(f"Predicted label: {predicted_label} with probability {top1_predicted_score :2f}")
imagenet_2012.txt:   0%|          | 0.00/30.9k [00:00<?, ?B/s]
[Output image: the sample input picture displayed at 640x512]
Predicted label: n02109047 Great Dane with probability 0.715318
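
Beyond the top-1 result, the remaining high-scoring classes can be inspected as well. A short sketch using the variables defined above (not part of the original notebook):

# Hedged sketch: print the five highest-scoring ImageNet classes.
top5_ids = np.argsort(predicted_scores[0])[-5:][::-1]
for cls_id in top5_ids:
    print(f"{imagenet_classes[cls_id]}: {predicted_scores[0][cls_id]:.4f}")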

Estimate Model Performance

The Benchmark Tool is used to measure the inference performance of the model on CPU and GPU.

Note

For more accurate performance measurements, it is recommended to run benchmark_app in a terminal/command prompt after closing other applications. Run benchmark_app -m model.xml -d CPU to benchmark async inference on CPU for one minute. Change CPU to GPU to benchmark on GPU. Run benchmark_app --help to see an overview of all command-line options.

print("Benchmark model inference on CPU")
!benchmark_app -m $ov_model_path -d CPU -t 15
if "GPU" in core.available_devices:
    print("\n\nBenchmark model inference on GPU")
    !benchmark_app -m $ov_model_path -d GPU -t 15
Benchmark model inference on CPU
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2023.0.0-10926-b4452d56304-releases/2023/0
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2023.0.0-10926-b4452d56304-releases/2023/0
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to PerformanceMode.THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 9.14 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     images (node: images) : f32 / [...] / [1,224,224,3]
[ INFO ] Model outputs:
[ INFO ]     Softmax (node: 61) : f32 / [...] / [1,1000]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     images (node: images) : u8 / [N,H,W,C] / [1,224,224,3]
[ INFO ] Model outputs:
[ INFO ]     Softmax (node: 61) : f32 / [...] / [1,1000]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 151.57 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   NETWORK_NAME: TensorFlow_Lite_Frontend_IR
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 6
[ INFO ]   NUM_STREAMS: 6
[ INFO ]   AFFINITY: Affinity.CORE
[ INFO ]   INFERENCE_NUM_THREADS: 24
[ INFO ]   PERF_COUNT: False
[ INFO ]   INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ]   PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ]   EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ]   ENABLE_CPU_PINNING: True
[ INFO ]   SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE
[ INFO ]   ENABLE_HYPER_THREADING: True
[ INFO ]   EXECUTION_DEVICES: ['CPU']
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'images'!. This input will be filled with random values!
[ INFO ] Fill input 'images' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 6 inference requests, limits: 15000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 7.60 ms
[Step 11/11] Dumping statistics report
[ INFO ] Execution Devices:['CPU']
[ INFO ] Count:            17526 iterations
[ INFO ] Duration:         15005.75 ms
[ INFO ] Latency:
[ INFO ]    Median:        5.00 ms
[ INFO ]    Average:       5.00 ms
[ INFO ]    Min:           3.28 ms
[ INFO ]    Max:           14.83 ms
[ INFO ] Throughput:   1167.95 FPS
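
benchmark_app remains the recommended way to measure performance, but a rough latency estimate can also be obtained from Python. A minimal sketch assuming the compiled_model and input_tensor from the cells above; the numbers will differ from benchmark_app, which runs several asynchronous requests in parallel:

import time

# Hedged sketch: rough single-request latency estimate in Python.
infer_request = compiled_model.create_infer_request()
infer_request.infer({0: input_tensor})  # warm-up so first-inference overhead is excluded

num_iterations = 100
start = time.perf_counter()
for _ in range(num_iterations):
    infer_request.infer({0: input_tensor})
elapsed = time.perf_counter() - start

print(f"Average latency: {elapsed / num_iterations * 1000:.2f} ms")
print(f"Throughput (single request): {num_iterations / elapsed:.2f} FPS")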