Convert to OpenVINO IR

OpenVINO IR is OpenVINO's proprietary model format, typically obtained by converting models from supported frameworks:

  • The convert_model() method:

    This is the only method applicable to PyTorch models.

    List of supported formats:
    • Python objects:

      • torch.nn.Module

      • torch.jit.ScriptModule

      • torch.jit.ScriptFunction

      • torch.export.ExportedProgram

    import torchvision
    import openvino as ov
    
    model = torchvision.models.resnet50(weights='DEFAULT')
    ov_model = ov.convert_model(model)
    compiled_model = ov.compile_model(ov_model, "AUTO")
    

    For more details on conversion, refer to the guide and an example tutorial on this topic.
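
    For some PyTorch models, conversion needs sample data for tracing; ov.convert_model() accepts it via the example_input parameter. A minimal sketch, assuming a ResNet-50-style input of shape (1, 3, 224, 224):

    import torch
    import torchvision
    import openvino as ov

    model = torchvision.models.resnet50(weights='DEFAULT')
    # example_input gives the converter a concrete tensor to trace the model with
    ov_model = ov.convert_model(model, example_input=torch.rand(1, 3, 224, 224))
    compiled_model = ov.compile_model(ov_model, "AUTO")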

  • The convert_model() method:

    The convert_model() method gives you more control: you can apply additional adjustments to the resulting ov.Model. The read_model() and compile_model() methods are easier to use, but they do not offer such capabilities. With an ov.Model object, you can optimize it, compile and run inference on it, or serialize it to a file for subsequent use.

    List of supported formats:
    • Files:

      • SavedModel - <SAVED_MODEL_DIRECTORY> or <INPUT_MODEL>.pb

      • Checkpoint - <INFERENCE_GRAPH>.pb or <INFERENCE_GRAPH>.pbtxt

      • MetaGraph - <INPUT_META_GRAPH>.meta

    • Python objects:

      • tf.keras.Model

      • tf.keras.layers.Layer

      • tf.Module

      • tf.compat.v1.Graph

      • tf.compat.v1.GraphDef

      • tf.function

      • tf.compat.v1.Session

      • tf.train.Checkpoint

    import openvino as ov
    
    ov_model = ov.convert_model("saved_model.pb")
    compiled_model = ov.compile_model(ov_model, "AUTO")
    

    For more details on conversion, refer to the guide and an example tutorial on this topic.
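
    Because convert_model() returns an ov.Model, you can also serialize it to OpenVINO IR right away and reuse it later without repeating the conversion. A minimal sketch, with placeholder file names:

    import openvino as ov

    ov_model = ov.convert_model("saved_model.pb")
    # Write the model to IR; weights go to the accompanying .bin file
    ov.save_model(ov_model, "model.xml")

    # Later: load and compile the saved IR
    core = ov.Core()
    compiled_model = ov.compile_model(core.read_model("model.xml"), "AUTO")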

  • The read_model() and compile_model() methods:

    List of supported formats:
    • Files:

      • SavedModel - <SAVED_MODEL_DIRECTORY> or <INPUT_MODEL>.pb

      • Checkpoint - <INFERENCE_GRAPH>.pb or <INFERENCE_GRAPH>.pbtxt

      • MetaGraph - <INPUT_META_GRAPH>.meta

    import openvino as ov
    
    core = ov.Core()
    ov_model = core.read_model("saved_model.pb")
    compiled_model = ov.compile_model(ov_model, "AUTO")
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • SavedModel - <SAVED_MODEL_DIRECTORY> or <INPUT_MODEL>.pb

      • Checkpoint - <INFERENCE_GRAPH>.pb or <INFERENCE_GRAPH>.pbtxt

      • MetaGraph - <INPUT_META_GRAPH>.meta

    ov::Core core;
    ov::CompiledModel compiled_model = core.compile_model("saved_model.pb", "AUTO");
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • SavedModel - <SAVED_MODEL_DIRECTORY> or <INPUT_MODEL>.pb

      • Checkpoint - <INFERENCE_GRAPH>.pb or <INFERENCE_GRAPH>.pbtxt

      • MetaGraph - <INPUT_META_GRAPH>.meta

    ov_core_t* core = NULL;
    ov_core_create(&core);
    ov_compiled_model_t* compiled_model = NULL;
    ov_core_compile_model_from_file(core, "saved_model.pb", "AUTO", 0, &compiled_model);
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The ovc command-line tool:

    You can use the ovc command-line tool to convert a model to OpenVINO IR. The obtained IR can then be read by read_model() and inferred.

    ovc <INPUT_MODEL>.pb

    For details on the conversion, refer to the article.
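
    The produced IR can then be loaded in Python. A minimal sketch, assuming the IR was written as model.xml:

    import openvino as ov

    core = ov.Core()
    ov_model = core.read_model("model.xml")
    compiled_model = ov.compile_model(ov_model, "AUTO")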

  • The convert_model() method:

    The convert_model() method gives you more control: you can apply additional adjustments to the resulting ov.Model. The read_model() and compile_model() methods are easier to use, but they do not offer such capabilities. With an ov.Model object, you can optimize it, compile and run inference on it, or serialize it to a file for subsequent use.

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.tflite

    import openvino as ov
    
    ov_model = ov.convert_model("<INPUT_MODEL>.tflite")
    compiled_model = ov.compile_model(ov_model, "AUTO")
    

    For more details on conversion, refer to the guide and an example tutorial on this topic.

  • The read_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.tflite

    import openvino as ov
    
    core = ov.Core()
    ov_model = core.read_model("<INPUT_MODEL>.tflite")
    compiled_model = ov.compile_model(ov_model, "AUTO")
    
  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.tflite

    import openvino as ov
    
    compiled_model = ov.compile_model("<INPUT_MODEL>.tflite", "AUTO")
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.tflite

    ov::Core core;
    ov::CompiledModel compiled_model = core.compile_model("<INPUT_MODEL>.tflite", "AUTO");
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.tflite

    ov_core_t* core = NULL;
    ov_core_create(&core);
    ov_compiled_model_t* compiled_model = NULL;
    ov_core_compile_model_from_file(core, "<INPUT_MODEL>.tflite", "AUTO", 0, &compiled_model);
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The ovc command-line tool:

    You can use the ovc command-line tool to convert a model to OpenVINO IR. The obtained IR can then be read by read_model() and inferred.

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.tflite

    ovc <INPUT_MODEL>.tflite
    

    For details on the conversion, refer to the article.

  • The convert_model() method:

    The convert_model() method gives you more control: you can apply additional adjustments to the resulting ov.Model. The read_model() and compile_model() methods are easier to use, but they do not offer such capabilities. With an ov.Model object, you can optimize it, compile and run inference on it, or serialize it to a file for subsequent use.

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.onnx

    import openvino as ov
    
    ov_model = ov.convert_model("<INPUT_MODEL>.onnx")
    compiled_model = ov.compile_model(ov_model, "AUTO")
    

    For more details on conversion, refer to the guide and an example tutorial on this topic.
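
    As an example of adjusting the resulting ov.Model before compiling it, you can make its input shape static. A minimal sketch, assuming a single-input image model; the shape is a placeholder:

    import openvino as ov

    ov_model = ov.convert_model("<INPUT_MODEL>.onnx")
    # Reshape the in-memory model to a fixed input shape before compilation
    ov_model.reshape([1, 3, 224, 224])
    compiled_model = ov.compile_model(ov_model, "AUTO")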

  • The read_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.onnx

    import openvino as ov
    
    core = ov.Core()
    ov_model = core.read_model("<INPUT_MODEL>.onnx")
    compiled_model = ov.compile_model(ov_model, "AUTO")
    
  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.onnx

    import openvino as ov
    
    compiled_model = ov.compile_model("<INPUT_MODEL>.onnx", "AUTO")
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.onnx

    ov::Core core;
    ov::CompiledModel compiled_model = core.compile_model("<INPUT_MODEL>.onnx", "AUTO");
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.onnx

    ov_core_t* core = NULL;
    ov_core_create(&core);
    ov_compiled_model_t* compiled_model = NULL;
    ov_core_compile_model_from_file(core, "<INPUT_MODEL>.onnx", "AUTO", 0, &compiled_model);
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The ovc command-line tool:

    You can use the ovc command-line tool to convert a model to OpenVINO IR. The obtained IR can then be read by read_model() and inferred.

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.onnx

    ovc <INPUT_MODEL>.onnx
    

    For details on the conversion, refer to the article.

  • The convert_model() method:

    The convert_model() method gives you more control: you can apply additional adjustments to the resulting ov.Model. The read_model() and compile_model() methods are easier to use, but they do not offer such capabilities. With an ov.Model object, you can optimize it, compile and run inference on it, or serialize it to a file for subsequent use.

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.pdmodel

    • Python objects:

      • paddle.hapi.model.Model

      • paddle.fluid.dygraph.layers.Layer

      • paddle.fluid.executor.Executor

    import openvino as ov
    
    ov_model = ov.convert_model("<INPUT_MODEL>.pdmodel")
    compiled_model = ov.compile_model(ov_model, "AUTO")
    

    For more details on conversion, refer to the guide and an example tutorial on this topic.

  • The read_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.pdmodel

    import openvino as ov
    
    core = ov.Core()
    ov_model = core.read_model("<INPUT_MODEL>.pdmodel")
    compiled_model = ov.compile_model(ov_model, "AUTO")
    
  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.pdmodel

    import openvino as ov
    
    compiled_model = ov.compile_model("<INPUT_MODEL>.pdmodel", "AUTO")
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.pdmodel

    ov::Core core;
    ov::CompiledModel compiled_model = core.compile_model("<INPUT_MODEL>.pdmodel", "AUTO");
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.pdmodel

    ov_core_t* core = NULL;
    ov_core_create(&core);
    ov_compiled_model_t* compiled_model = NULL;
    ov_core_compile_model_from_file(core, "<INPUT_MODEL>.pdmodel", "AUTO", 0, &compiled_model);
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The ovc command-line tool:

    You can use the ovc command-line tool to convert a model to OpenVINO IR. The obtained IR can then be read by read_model() and inferred.

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.pdmodel

    ovc <INPUT_MODEL>.pdmodel
    

    For details on the conversion, refer to the article.

These are basic examples; for detailed conversion instructions, see the individual guides on PyTorch, ONNX, TensorFlow, TensorFlow Lite, and PaddlePaddle.

Refer to the list of all supported conversion options in Conversion Parameters.
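
For example, the input parameter of convert_model() can set a static input shape at conversion time. A minimal sketch, with a placeholder model file and shape:

import openvino as ov

# Convert while fixing the input shape instead of keeping it dynamic
ov_model = ov.convert_model("<INPUT_MODEL>.onnx", input=[1, 3, 224, 224])
compiled_model = ov.compile_model(ov_model, "AUTO")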

IR Conversion Benefits

Saving to IR to improve first inference latency
When first-inference latency matters, it is better to convert the framework model once rather than every time it is loaded, which may take a while depending on the model size. Save the model as OpenVINO IR with save_model and then load it with read_model as needed. This should reduce the time it takes the model to make its first inference, as the conversion step is skipped.
Saving to IR in FP16 to save space
Saving in FP16 may cut the model size by about 50%, which saves storage space and is especially useful for large models such as Llama2-7B.
Saving to IR to avoid large dependencies in inference code
Frameworks such as TensorFlow and PyTorch tend to be large dependencies (multiple gigabytes) for applications that run inference. Converting models to OpenVINO IR removes this dependency, as OpenVINO can run inference without any additional components. This way, much less disk space is needed, and loading and compiling usually takes less runtime memory than loading the model in the source framework and then converting and compiling it.

Here is an example of how to benefit from OpenVINO IR by saving a model once and running it multiple times:

# Run once

import openvino as ov
import tensorflow as tf

# 1. Convert model created with TF code
model = tf.keras.applications.resnet50.ResNet50(weights="imagenet")
ov_model = ov.convert_model(model)

# 2. Save model as OpenVINO IR
ov.save_model(ov_model, "model.xml", compress_to_fp16=True)  # compression to FP16 is enabled by default

# Repeat as needed

import openvino as ov

# 3. Load model from file
core = ov.Core()
ov_model = core.read_model("model.xml")

# 4. Compile model from memory
compiled_model = ov.compile_model(ov_model)
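
Once compiled, the model can be run directly, since compiled models are callable in the Python API. A minimal sketch, assuming the (1, 224, 224, 3) input layout of the TensorFlow ResNet50 model above:

import numpy as np

# 5. Run inference on dummy data; a real application would pass a preprocessed image
input_data = np.random.rand(1, 224, 224, 3).astype(np.float32)
results = compiled_model(input_data)
print(results[compiled_model.output(0)].shape)  # (1, 1000) class probabilities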

Additional Resources