Convert to OpenVINO IR

IR (Intermediate Representation) is OpenVINO's own format, consisting of .xml and .bin files. Convert your model to OpenVINO IR for better performance.
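
As a rough illustration of the two-file layout, the sketch below derives the expected .bin path from an .xml path. The helper name ir_pair is hypothetical and not part of the OpenVINO API. It assumes the weights file sits next to the .xml file under the same base name, which is where read_model() looks when no weights path is given.

```python
from pathlib import Path


def ir_pair(xml_path):
    """Return the (.xml, .bin) paths that together make up one IR model.

    Hypothetical helper: assumes both files share a directory and base name.
    """
    xml_path = Path(xml_path)
    if xml_path.suffix != ".xml":
        raise ValueError(f"expected an .xml file, got {xml_path.name!r}")
    # Same directory, same base name, different extension.
    return xml_path, xml_path.with_suffix(".bin")


xml_file, bin_file = ir_pair("models/resnet50.xml")
print(xml_file.name, bin_file.name)  # resnet50.xml resnet50.bin
```

Keeping both files together matters when copying or deploying a model, since the .xml file alone does not carry the weights.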

Convert Models

The following code examples show how to use the convert_model(), read_model(), and compile_model() methods, as well as the ovc command-line tool, with different model formats:

PyTorch

Python

The convert_model() method:

This is the only method applicable to PyTorch models.
List of supported formats:
- Python objects:
  - torch.nn.Module
  - torch.jit.ScriptModule
  - torch.jit.ScriptFunction
```
import torchvision
import openvino as ov

model = torchvision.models.resnet50(weights='DEFAULT')
ov_model = ov.convert_model(model)
compiled_model = ov.compile_model(ov_model, "AUTO")
```
For more details on conversion, refer to the guide and an example tutorial on this topic.

TensorFlow

Python

The convert_model() method:

When you use the convert_model() method, you have more control and can specify additional adjustments for ov.Model. The read_model() and compile_model() methods are easier to use; however, they do not offer such capabilities. With ov.Model, you can choose to optimize, compile, and run inference on it, or serialize it into a file for subsequent use.
List of supported formats:
- Files:
  - SavedModel - <SAVED_MODEL_DIRECTORY> or <INPUT_MODEL>.pb
  - Checkpoint - <INFERENCE_GRAPH>.pb or <INFERENCE_GRAPH>.pbtxt
  - MetaGraph - <INPUT_META_GRAPH>.meta
- Python objects:
  - tf.keras.Model
  - tf.keras.layers.Layer
  - tf.Module
  - tf.compat.v1.Graph
  - tf.compat.v1.GraphDef
  - tf.function
  - tf.compat.v1.Session
  - tf.train.Checkpoint
```
import openvino as ov

ov_model = ov.convert_model("saved_model.pb")
compiled_model = ov.compile_model(ov_model, "AUTO")
```
For more details on conversion, refer to the guide and an example tutorial on this topic.
The read_model() and compile_model() methods:
List of supported formats:
- Files:
  - SavedModel - <SAVED_MODEL_DIRECTORY> or <INPUT_MODEL>.pb
  - Checkpoint - <INFERENCE_GRAPH>.pb or <INFERENCE_GRAPH>.pbtxt
  - MetaGraph - <INPUT_META_GRAPH>.meta
```
import openvino as ov

core = ov.Core()
ov_model = core.read_model("saved_model.pb")
compiled_model = ov.compile_model(ov_model, "AUTO")
```
For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application. For TensorFlow format, see TensorFlow Frontend Capabilities and Limitations.

C++

The compile_model() method:
List of supported formats:
- Files:
  - SavedModel - <SAVED_MODEL_DIRECTORY> or <INPUT_MODEL>.pb
  - Checkpoint - <INFERENCE_GRAPH>.pb or <INFERENCE_GRAPH>.pbtxt
  - MetaGraph - <INPUT_META_GRAPH>.meta
```
ov::CompiledModel compiled_model = core.compile_model("saved_model.pb", "AUTO");
```
For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

C

The compile_model() method:
List of supported formats:
- Files:
  - SavedModel - <SAVED_MODEL_DIRECTORY> or <INPUT_MODEL>.pb
  - Checkpoint - <INFERENCE_GRAPH>.pb or <INFERENCE_GRAPH>.pbtxt
  - MetaGraph - <INPUT_META_GRAPH>.meta
```
ov_compiled_model_t* compiled_model = NULL;
ov_core_compile_model_from_file(core, "saved_model.pb", "AUTO", 0, &compiled_model);
```
For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

CLI

You can use the ovc command-line tool to convert a model to IR. The obtained IR can then be read by read_model() and inferred.

```
ovc <INPUT_MODEL>.pb
```

For details on the conversion, refer to the article.

TensorFlow Lite

Python

The convert_model() method:

When you use the convert_model() method, you have more control and can specify additional adjustments for ov.Model. The read_model() and compile_model() methods are easier to use; however, they do not offer such capabilities. With ov.Model, you can choose to optimize, compile, and run inference on it, or serialize it into a file for subsequent use.
List of supported formats:
- Files:
  - <INPUT_MODEL>.tflite
```
import openvino as ov

ov_model = ov.convert_model("<INPUT_MODEL>.tflite")
compiled_model = ov.compile_model(ov_model, "AUTO")
```
For more details on conversion, refer to the guide and an example tutorial on this topic.

The read_model() method:

```
import openvino as ov

core = ov.Core()
ov_model = core.read_model("<INPUT_MODEL>.tflite")
compiled_model = ov.compile_model(ov_model, "AUTO")
```

The compile_model() method:
List of supported formats:
- Files:
  - <INPUT_MODEL>.tflite
```
import openvino as ov

compiled_model = ov.compile_model("<INPUT_MODEL>.tflite", "AUTO")
```
For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

C++

The compile_model() method:
List of supported formats:
- Files:
  - <INPUT_MODEL>.tflite
```
ov::CompiledModel compiled_model = core.compile_model("<INPUT_MODEL>.tflite", "AUTO");
```
For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

C

The compile_model() method:
List of supported formats:
- Files:
  - <INPUT_MODEL>.tflite
```
ov_compiled_model_t* compiled_model = NULL;
ov_core_compile_model_from_file(core, "<INPUT_MODEL>.tflite", "AUTO", 0, &compiled_model);
```
For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

CLI

You can use the ovc command-line tool to convert a model to IR. The obtained IR can then be read by read_model() and inferred.
List of supported formats:
- Files:
  - <INPUT_MODEL>.tflite
```
ovc <INPUT_MODEL>.tflite
```
For details on the conversion, refer to the article.

ONNX

Python

The convert_model() method:

When you use the convert_model() method, you have more control and can specify additional adjustments for ov.Model. The read_model() and compile_model() methods are easier to use; however, they do not offer such capabilities. With ov.Model, you can choose to optimize, compile, and run inference on it, or serialize it into a file for subsequent use.
List of supported formats:
- Files:
  - <INPUT_MODEL>.onnx
```
import openvino as ov

ov_model = ov.convert_model("<INPUT_MODEL>.onnx")
compiled_model = ov.compile_model(ov_model, "AUTO")
```
For more details on conversion, refer to the guide and an example tutorial on this topic.

The read_model() method:

```
import openvino as ov

core = ov.Core()
ov_model = core.read_model("<INPUT_MODEL>.onnx")
compiled_model = ov.compile_model(ov_model, "AUTO")
```

The compile_model() method:
List of supported formats:
- Files:
  - <INPUT_MODEL>.onnx
```
import openvino as ov

compiled_model = ov.compile_model("<INPUT_MODEL>.onnx", "AUTO")
```
For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

C++

The compile_model() method:
List of supported formats:
- Files:
  - <INPUT_MODEL>.onnx
```
ov::CompiledModel compiled_model = core.compile_model("<INPUT_MODEL>.onnx", "AUTO");
```
For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

C

The compile_model() method:

```
ov_compiled_model_t* compiled_model = NULL;
ov_core_compile_model_from_file(core, "<INPUT_MODEL>.onnx", "AUTO", 0, &compiled_model);
```
For details on the conversion, refer to the article.

CLI

You can use the ovc command-line tool to convert a model to IR. The obtained IR can then be read by read_model() and inferred.
List of supported formats:
- Files:
  - <INPUT_MODEL>.onnx
```
ovc <INPUT_MODEL>.onnx
```
For details on the conversion, refer to the article.

PaddlePaddle

Python

The convert_model() method:

When you use the convert_model() method, you have more control and can specify additional adjustments for ov.Model. The read_model() and compile_model() methods are easier to use; however, they do not offer such capabilities. With ov.Model, you can choose to optimize, compile, and run inference on it, or serialize it into a file for subsequent use.
List of supported formats:
- Files:
  - <INPUT_MODEL>.pdmodel
- Python objects:
  - paddle.hapi.model.Model
  - paddle.fluid.dygraph.layers.Layer
  - paddle.fluid.executor.Executor
```
import openvino as ov

ov_model = ov.convert_model("<INPUT_MODEL>.pdmodel")
compiled_model = ov.compile_model(ov_model, "AUTO")
```
For more details on conversion, refer to the guide and an example tutorial on this topic.

The read_model() method:

```
import openvino as ov

core = ov.Core()
ov_model = core.read_model("<INPUT_MODEL>.pdmodel")
compiled_model = ov.compile_model(ov_model, "AUTO")
```

The compile_model() method:
List of supported formats:
- Files:
  - <INPUT_MODEL>.pdmodel
```
import openvino as ov

compiled_model = ov.compile_model("<INPUT_MODEL>.pdmodel", "AUTO")
```
For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

C++

The compile_model() method:
List of supported formats:
- Files:
  - <INPUT_MODEL>.pdmodel
```
ov::CompiledModel compiled_model = core.compile_model("<INPUT_MODEL>.pdmodel", "AUTO");
```
For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

C

The compile_model() method:
List of supported formats:
- Files:
  - <INPUT_MODEL>.pdmodel
```
ov_compiled_model_t* compiled_model = NULL;
ov_core_compile_model_from_file(core, "<INPUT_MODEL>.pdmodel", "AUTO", 0, &compiled_model);
```
For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

CLI

You can use the ovc command-line tool to convert a model to IR. The obtained IR can then be read by read_model() and inferred.
List of supported formats:
- Files:
  - <INPUT_MODEL>.pdmodel
```
ovc <INPUT_MODEL>.pdmodel
```
For details on the conversion, refer to the article.

To choose the best workflow for your application, read the Model Preparation section.

Refer to the list of all supported conversion options in Conversion Parameters.
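
The file formats listed in the tabs above can be summarized as a simple lookup. The sketch below is only an illustration: detect_format and FORMAT_BY_SUFFIX are hypothetical names, not OpenVINO APIs, and the mapping covers only the extensions named on this page. Treating any directory as a TensorFlow SavedModel is a simplifying assumption.

```python
from pathlib import Path

# File extensions taken from the format lists on this page.
FORMAT_BY_SUFFIX = {
    ".xml": "OpenVINO IR",
    ".pb": "TensorFlow",
    ".pbtxt": "TensorFlow",
    ".meta": "TensorFlow",
    ".tflite": "TensorFlow Lite",
    ".onnx": "ONNX",
    ".pdmodel": "PaddlePaddle",
}


def detect_format(model_path):
    """Guess the model format from a path (hypothetical helper)."""
    path = Path(model_path)
    if path.is_dir():
        # Assumption: a directory is a TensorFlow SavedModel directory.
        return "TensorFlow"
    suffix = path.suffix.lower()
    if suffix not in FORMAT_BY_SUFFIX:
        raise ValueError(f"unrecognized model file extension: {suffix!r}")
    return FORMAT_BY_SUFFIX[suffix]


print(detect_format("model.onnx"))    # ONNX
print(detect_format("model.tflite"))  # TensorFlow Lite
```

A check like this can be useful for reporting a clear error before a file is handed to the conversion or loading call.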

IR Conversion Benefits

Saving to IR to improve first inference latency

When first-inference latency matters, convert the framework model once instead of every time it is loaded, because conversion may take some time depending on the model size. Save the model as OpenVINO IR with save_model and load it with read_model as needed. This skips the conversion step and should reduce the time to the first inference.

Saving to IR in FP16 to save space

OpenVINO IR also saves storage space, even more so when FP16 is used, which may cut the size by about 50%. This is especially useful for large models, such as Llama2-7B.
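
The roughly 50% figure follows from bytes per weight: 4 for FP32 and 2 for FP16. The back-of-the-envelope sketch below assumes a model with about 7 billion parameters (a round number suggested by the Llama2-7B name, not a measured value) and ignores everything in the file other than the weights.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weights_size_gb(num_params, precision):
    """Approximate size of the stored weights in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


num_params = 7_000_000_000  # assumption: roughly 7 billion parameters

fp32_gb = weights_size_gb(num_params, "fp32")
fp16_gb = weights_size_gb(num_params, "fp16")
saving = 1 - fp16_gb / fp32_gb

print(f"FP32: {fp32_gb:.0f} GB, FP16: {fp16_gb:.0f} GB, saved: {saving:.0%}")
# FP32: 28 GB, FP16: 14 GB, saved: 50%
```

Actual file sizes will differ somewhat, since not every tensor is necessarily stored in FP16 and the .xml topology file adds a small amount on top.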

Saving to IR to avoid large dependencies in inference code

Frameworks such as TensorFlow and PyTorch tend to be large dependencies (multiple gigabytes), and not all inference environments have enough space to hold them.
Converting models to OpenVINO IR allows them to be used in an environment where OpenVINO is the only dependency, so much less disk space is needed.
Loading and compiling with OpenVINO directly usually takes less runtime memory than loading the model in the source framework and then converting and compiling it.

The example below shows how to take advantage of OpenVINO IR by saving a model once and loading it many times:

```
# Run once

import openvino as ov
import tensorflow as tf

# 1. Convert model created with TF code
model = tf.keras.applications.resnet50.ResNet50(weights="imagenet")
ov_model = ov.convert_model(model)

# 2. Save model as OpenVINO IR
ov.save_model(ov_model, 'model.xml', compress_to_fp16=True) # enabled by default

# Repeat as needed

import openvino as ov

# 3. Load model from file
core = ov.Core()
ov_model = core.read_model("model.xml")

# 4. Compile model from memory
compiled_model = ov.compile_model(ov_model)
```
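
The "run once, repeat as needed" pattern can also be folded into a single function that converts only when the IR file is missing. The sketch below is a generic illustration, not an OpenVINO API: load_or_convert is a hypothetical helper, and the convert, save, and read steps are passed in as callables so that the caching logic stays independent of any framework.

```python
from pathlib import Path


def load_or_convert(ir_path, convert, save, read):
    """Read the IR at ir_path, creating it with convert() and save() if missing."""
    ir_path = Path(ir_path)
    if not ir_path.exists():
        # Slow path, taken once: convert the framework model and store the IR.
        save(convert(), str(ir_path))
    # Taken on every call: read the stored IR from disk.
    return read(str(ir_path))


# Possible wiring with OpenVINO (not executed here; `model` is a placeholder
# for a framework model object):
#   core = ov.Core()
#   ov_model = load_or_convert(
#       "model.xml",
#       convert=lambda: ov.convert_model(model),
#       save=ov.save_model,
#       read=core.read_model,
#   )
```

With this shape, the framework import and the conversion cost are only incurred on the first run, which is the benefit described in the sections above.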

Additional Resources

Transition guide from the legacy to new conversion API