Convert to OpenVINO IR

IR (Intermediate Representation) is OpenVINO's own format, consisting of .xml and .bin files. Convert the model into OpenVINO IR for better performance.
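As a small illustration of the two-file layout, the sketch below derives the .bin path that accompanies an IR .xml file. It assumes the common convention that both files share a base name and sit in the same directory; the helper name ir_file_pair is hypothetical and not part of OpenVINO.

```python
from pathlib import Path


def ir_file_pair(xml_path: str) -> tuple[Path, Path]:
    """Return the (.xml, .bin) paths of an IR, assuming they share a base name."""
    xml = Path(xml_path)
    if xml.suffix != ".xml":
        raise ValueError(f"expected an .xml file, got {xml.name!r}")
    # Assumption: the weights file sits next to the topology file.
    return xml, xml.with_suffix(".bin")


xml_file, bin_file = ir_file_pair("models/resnet50.xml")
print(xml_file.name, bin_file.name)  # resnet50.xml resnet50.bin
```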

Convert Models

The code examples below show how to use the convert_model(), read_model(), and compile_model() methods with different model formats:

  • The convert_model() method:

    This is the only method applicable to PyTorch models.

    List of supported formats:
    • Python objects:

      • torch.nn.Module

      • torch.jit.ScriptModule

      • torch.jit.ScriptFunction

    import openvino as ov
    import torchvision
    core = ov.Core()
    model = torchvision.models.resnet50(weights='DEFAULT')
    ov_model = ov.convert_model(model)
    compiled_model = core.compile_model(ov_model, "AUTO")
    

    For more details on conversion, refer to the guide and an example tutorial on this topic.

  • The convert_model() method:

    When you use the convert_model() method, you have more control and can specify additional adjustments for ov.Model. The read_model() and compile_model() methods are easier to use; however, they do not offer these capabilities. With ov.Model, you can choose to optimize, compile, and run inference on it, or serialize it to a file for later use.

    List of supported formats:
    • Files:

      • SavedModel - <SAVED_MODEL_DIRECTORY> or <INPUT_MODEL>.pb

      • Checkpoint - <INFERENCE_GRAPH>.pb or <INFERENCE_GRAPH>.pbtxt

      • MetaGraph - <INPUT_META_GRAPH>.meta

    • Python objects:

      • tf.keras.Model

      • tf.keras.layers.Layer

      • tf.Module

      • tf.compat.v1.Graph

      • tf.compat.v1.GraphDef

      • tf.function

      • tf.compat.v1.session

      • tf.train.checkpoint

    import openvino as ov
    core = ov.Core()
    ov_model = ov.convert_model("saved_model.pb")
    compiled_model = core.compile_model(ov_model, "AUTO")
    

    For more details on conversion, refer to the guide and an example tutorial on this topic.

  • The read_model() and compile_model() methods:

    List of supported formats:
    • Files:

      • SavedModel - <SAVED_MODEL_DIRECTORY> or <INPUT_MODEL>.pb

      • Checkpoint - <INFERENCE_GRAPH>.pb or <INFERENCE_GRAPH>.pbtxt

      • MetaGraph - <INPUT_META_GRAPH>.meta

    import openvino as ov
    core = ov.Core()
    ov_model = core.read_model("saved_model.pb")
    compiled_model = core.compile_model(ov_model, "AUTO")
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application. For TensorFlow format, see TensorFlow Frontend Capabilities and Limitations.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • SavedModel - <SAVED_MODEL_DIRECTORY> or <INPUT_MODEL>.pb

      • Checkpoint - <INFERENCE_GRAPH>.pb or <INFERENCE_GRAPH>.pbtxt

      • MetaGraph - <INPUT_META_GRAPH>.meta

    ov::CompiledModel compiled_model = core.compile_model("saved_model.pb", "AUTO");
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • SavedModel - <SAVED_MODEL_DIRECTORY> or <INPUT_MODEL>.pb

      • Checkpoint - <INFERENCE_GRAPH>.pb or <INFERENCE_GRAPH>.pbtxt

      • MetaGraph - <INPUT_META_GRAPH>.meta

    ov_compiled_model_t* compiled_model = NULL;
    ov_core_compile_model_from_file(core, "saved_model.pb", "AUTO", 0, &compiled_model);
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

You can use the ovc command-line tool to convert a model to IR. The resulting IR can then be read with read_model() and used for inference.

ovc <INPUT_MODEL>.pb

For details on the conversion, refer to the article.

  • The convert_model() method:

    When you use the convert_model() method, you have more control and can specify additional adjustments for ov.Model. The read_model() and compile_model() methods are easier to use; however, they do not offer these capabilities. With ov.Model, you can choose to optimize, compile, and run inference on it, or serialize it to a file for later use.

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.tflite

    import openvino as ov
    core = ov.Core()
    ov_model = ov.convert_model("<INPUT_MODEL>.tflite")
    compiled_model = core.compile_model(ov_model, "AUTO")
    

    For more details on conversion, refer to the guide and an example tutorial on this topic.

  • The read_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.tflite

    import openvino as ov
    core = ov.Core()
    ov_model = core.read_model("<INPUT_MODEL>.tflite")
    compiled_model = core.compile_model(ov_model, "AUTO")
    
  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.tflite

    import openvino as ov
    compiled_model = ov.Core().compile_model("<INPUT_MODEL>.tflite", "AUTO")
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.tflite

    ov::CompiledModel compiled_model = core.compile_model("<INPUT_MODEL>.tflite", "AUTO");
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.tflite

    ov_compiled_model_t* compiled_model = NULL;
    ov_core_compile_model_from_file(core, "<INPUT_MODEL>.tflite", "AUTO", 0, &compiled_model);
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The ovc command-line tool:

    You can use the ovc command-line tool to convert a model to IR. The resulting IR can then be read with read_model() and used for inference.

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.tflite

    ovc <INPUT_MODEL>.tflite
    

    For details on the conversion, refer to the article.

  • The convert_model() method:

    When you use the convert_model() method, you have more control and can specify additional adjustments for ov.Model. The read_model() and compile_model() methods are easier to use; however, they do not offer these capabilities. With ov.Model, you can choose to optimize, compile, and run inference on it, or serialize it to a file for later use.

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.onnx

    import openvino as ov
    core = ov.Core()
    ov_model = ov.convert_model("<INPUT_MODEL>.onnx")
    compiled_model = core.compile_model(ov_model, "AUTO")
    

    For more details on conversion, refer to the guide and an example tutorial on this topic.

  • The read_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.onnx

    import openvino as ov
    core = ov.Core()
    ov_model = core.read_model("<INPUT_MODEL>.onnx")
    compiled_model = core.compile_model(ov_model, "AUTO")
    
  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.onnx

    import openvino as ov
    compiled_model = ov.Core().compile_model("<INPUT_MODEL>.onnx", "AUTO")
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.onnx

    ov::CompiledModel compiled_model = core.compile_model("<INPUT_MODEL>.onnx", "AUTO");
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.onnx

    ov_compiled_model_t* compiled_model = NULL;
    ov_core_compile_model_from_file(core, "<INPUT_MODEL>.onnx", "AUTO", 0, &compiled_model);
    

    For details on the conversion, refer to the article.

  • The ovc command-line tool:

    You can use the ovc command-line tool to convert a model to IR. The resulting IR can then be read with read_model() and used for inference.

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.onnx

    ovc <INPUT_MODEL>.onnx
    

    For details on the conversion, refer to the article.

  • The convert_model() method:

    When you use the convert_model() method, you have more control and can specify additional adjustments for ov.Model. The read_model() and compile_model() methods are easier to use; however, they do not offer these capabilities. With ov.Model, you can choose to optimize, compile, and run inference on it, or serialize it to a file for later use.

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.pdmodel

    • Python objects:

      • paddle.hapi.model.Model

      • paddle.fluid.dygraph.layers.Layer

      • paddle.fluid.executor.Executor

    import openvino as ov
    core = ov.Core()
    ov_model = ov.convert_model("<INPUT_MODEL>.pdmodel")
    compiled_model = core.compile_model(ov_model, "AUTO")
    

    For more details on conversion, refer to the guide and an example tutorial on this topic.

  • The read_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.pdmodel

    import openvino as ov
    core = ov.Core()
    ov_model = core.read_model("<INPUT_MODEL>.pdmodel")
    compiled_model = core.compile_model(ov_model, "AUTO")
    
  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.pdmodel

    import openvino as ov
    compiled_model = ov.Core().compile_model("<INPUT_MODEL>.pdmodel", "AUTO")
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.pdmodel

    ov::CompiledModel compiled_model = core.compile_model("<INPUT_MODEL>.pdmodel", "AUTO");
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The compile_model() method:

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.pdmodel

    ov_compiled_model_t* compiled_model = NULL;
    ov_core_compile_model_from_file(core, "<INPUT_MODEL>.pdmodel", "AUTO", 0, &compiled_model);
    

    For a guide on how to run inference, see how to Integrate OpenVINO™ with Your Application.

  • The ovc command-line tool:

    You can use the ovc command-line tool to convert a model to IR. The resulting IR can then be read with read_model() and used for inference.

    List of supported formats:
    • Files:

      • <INPUT_MODEL>.pdmodel

    ovc <INPUT_MODEL>.pdmodel
    

    For details on the conversion, refer to the article.

To choose the best workflow for your application, read the Model Preparation section.

Refer to the list of all supported conversion options in Conversion Parameters.
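The file formats listed in this section can be summarized as a lookup from file extension to source framework. The sketch below is only an illustration: guess_model_format and FORMAT_BY_SUFFIX are hypothetical names, not OpenVINO APIs, and SavedModel directories (which have no extension) are not handled.

```python
from pathlib import Path

# Extension-to-framework mapping, taken from the file formats listed in this section.
FORMAT_BY_SUFFIX = {
    ".pb": "TensorFlow",
    ".pbtxt": "TensorFlow",
    ".meta": "TensorFlow",
    ".tflite": "TensorFlow Lite",
    ".onnx": "ONNX",
    ".pdmodel": "PaddlePaddle",
    ".xml": "OpenVINO IR",
}


def guess_model_format(path: str) -> str:
    """Guess the source framework of a model file from its extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unrecognized model extension: {suffix!r}") from None


for name in ("model.onnx", "model.tflite", "saved_model.pb", "model.pdmodel"):
    print(name, "->", guess_model_format(name))
```

A lookup like this could be used to decide which of the workflows above applies before calling convert_model(), read_model(), or ovc.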

IR Conversion Benefits

Saving to IR to improve first inference latency
When first inference latency matters, it is better to convert the framework model once rather than each time it is loaded, since conversion may take some time depending on the model's size. Save the model as an OpenVINO IR with save_model and then load it with read_model as needed. This should reduce the time to the first inference, because the conversion step is skipped.
Saving to IR in FP16 to save space
Saving to IR also saves storage space, even more so if FP16 is used, as it may cut the size by about 50%. This is especially useful for large models, such as Llama2-7B.
Saving to IR to avoid large dependencies in inference code
Frameworks such as TensorFlow and PyTorch tend to be large dependencies (multiple gigabytes), and not all inference environments have enough space to hold them.
Converting models to OpenVINO IR allows them to be used in an environment where OpenVINO is the only dependency, so much less disk space is needed.
Loading and compiling with OpenVINO directly usually takes less runtime memory than loading the model in the source framework and then converting and compiling it.
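The "about 50%" figure for FP16 follows from simple arithmetic, sketched below. The estimate assumes the original weights are stored in FP32, uses a round parameter count of 7 billion as an approximation for a 7B-class model, and ignores the small amount of graph metadata in the IR.

```python
# Rough weight-storage estimate; real IR files also hold graph metadata.
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2}


def weights_size_gb(num_params: float, precision: str) -> float:
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


num_params = 7e9  # assumption: roughly 7 billion parameters
fp32 = weights_size_gb(num_params, "FP32")
fp16 = weights_size_gb(num_params, "FP16")
print(f"FP32: {fp32:.0f} GB, FP16: {fp16:.0f} GB, saved: {1 - fp16 / fp32:.0%}")
# FP32: 28 GB, FP16: 14 GB, saved: 50%
```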

The example below shows how to take advantage of OpenVINO IR by saving a model once and loading it many times:

# Run once

import openvino as ov
import tensorflow as tf

# 1. Convert model created with TF code
model = tf.keras.applications.resnet50.ResNet50(weights="imagenet")
ov_model = ov.convert_model(model)

# 2. Save model as OpenVINO IR
ov.save_model(ov_model, 'model.xml', compress_to_fp16=True) # enabled by default

# Repeat as needed

import openvino as ov

# 3. Load model from file
core = ov.Core()
ov_model = core.read_model("model.xml")

# 4. Compile model from memory
compiled_model = core.compile_model(ov_model)
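The "run once, repeat as needed" split can also be folded into a single helper that converts only when the IR file is missing. The sketch below is a framework-agnostic illustration: load_or_convert is a hypothetical helper, and the conversion and reading steps are passed in as callables so the demonstration runs with stand-ins rather than a real model.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_convert(
    ir_path: Path,
    convert_and_save: Callable[[Path], None],
    read: Callable[[Path], Any],
) -> Any:
    """Read the IR at ir_path, running the conversion first only if the file is missing."""
    if not ir_path.exists():
        convert_and_save(ir_path)  # slow path, expected to run once
    return read(ir_path)  # fast path, runs on every call


# Demonstration with stand-in callables instead of a real framework model.
calls = {"convert": 0}


def fake_convert_and_save(path: Path) -> None:
    calls["convert"] += 1
    path.write_text("<net/>")


with tempfile.TemporaryDirectory() as tmp:
    ir = Path(tmp) / "model.xml"
    for _ in range(3):
        load_or_convert(ir, fake_convert_and_save, Path.read_text)
    print(calls["convert"])  # 1
```

With OpenVINO, convert_and_save could wrap the ov.convert_model and ov.save_model calls from step 1 and 2 of the example above, and read could wrap core.read_model from step 3.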