OpenVINO IR format

Models built and trained in various frameworks can be large and architecture-dependent. To run inference on any supported device and get the most out of OpenVINO tools, you can convert such a model to the OpenVINO Intermediate Representation (IR) format.

OpenVINO IR is the proprietary model format of OpenVINO, produced by converting a model with the model conversion API. The conversion API translates commonly used deep learning operations to their corresponding OpenVINO representations and adjusts them with the associated weights and biases from the trained model. The resulting IR consists of two files:

  • .xml - Describes the model topology.

  • .bin - Contains the weights and binary data.
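
For illustration, here is a minimal Python sketch of producing these two files with the model conversion API. It assumes a recent openvino package (where ov.convert_model and ov.save_model are available) and a hypothetical source model saved as model.onnx; any framework supported by the conversion API could be used instead.

import openvino as ov

# Translate the original model into OpenVINO's in-memory representation.
ov_model = ov.convert_model("model.onnx")

# Serialize it as OpenVINO IR: model.xml (topology) and model.bin
# (weights and other binary data) are written next to each other.
ov.save_model(ov_model, "model.xml")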

IR Structure

The OpenVINO toolkit introduces its own graph representation format and its own operation set. A graph is represented by two files: an XML file and a binary file. This representation is commonly referred to as the Intermediate Representation, or IR.

The XML file describes a model topology using a <layer> tag for an operation node and an <edge> tag for a data-flow connection. Each operation has a fixed set of attributes that define the specific flavor of the operation used in a node. For example, the Convolution operation has attributes such as dilations, strides, pads_begin, and pads_end.

The XML file does not store large constant values, such as convolution weights. Instead, it refers to the part of the accompanying binary file that holds such values in binary format.
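
When the IR is read back by OpenVINO Runtime, the .bin file is located automatically from the .xml path; it can also be passed explicitly. A minimal sketch, assuming the file names model.xml and model.bin:

import openvino as ov

core = ov.Core()

# The .bin file with the constant data is found next to the .xml file
# by replacing the extension; an explicit weights path can be given as well.
model = core.read_model("model.xml")
# model = core.read_model(model="model.xml", weights="model.bin")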

Here is an example of a small IR XML file that corresponds to a graph from the previous section:

<?xml version="1.0" ?>
<net name="model_file_name" version="10">
   <layers>
      <layer id="0" name="input" type="Parameter" version="opset1">
            <data element_type="f32" shape="1,3,32,100"/>
            <output>
               <port id="0" precision="FP32">
                  <dim>1</dim>
                  <dim>3</dim>
                  <dim>32</dim>
                  <dim>100</dim>
               </port>
            </output>
      </layer>
      <layer id="1" name="conv1/weights" type="Const" version="opset1">

            <data element_type="f32" offset="0" shape="64,3,3,3" size="6912"/>
            <output>
               <port id="1" precision="FP32">
                  <dim>64</dim>
                  <dim>3</dim>
                  <dim>3</dim>
                  <dim>3</dim>
               </port>
            </output>
      </layer>
      <layer id="2" name="conv1" type="Convolution" version="opset1">
            <data auto_pad="same_upper" dilations="1,1" output_padding="0,0" pads_begin="1,1" pads_end="1,1" strides="1,1"/>
            <input>
               <port id="0">
                  <dim>1</dim>
                  <dim>3</dim>
                  <dim>32</dim>
                  <dim>100</dim>
               </port>
               <port id="1">
                  <dim>64</dim>
                  <dim>3</dim>
                  <dim>3</dim>
                  <dim>3</dim>
               </port>
            </input>
            <output>
               <port id="2" precision="FP32">
                  <dim>1</dim>
                  <dim>64</dim>
                  <dim>32</dim>
                  <dim>100</dim>
               </port>
            </output>
      </layer>
      <layer id="3" name="conv1/activation" type="ReLU" version="opset1">
            <input>
               <port id="0">
                  <dim>1</dim>
                  <dim>64</dim>
                  <dim>32</dim>
                  <dim>100</dim>
               </port>
            </input>
            <output>
               <port id="1" precision="FP32">
                  <dim>1</dim>
                  <dim>64</dim>
                  <dim>32</dim>
                  <dim>100</dim>
               </port>
            </output>
      </layer>
      <layer id="4" name="output" type="Result" version="opset1">
            <input>
               <port id="0">
                  <dim>1</dim>
                  <dim>64</dim>
                  <dim>32</dim>
                  <dim>100</dim>
               </port>
            </input>
      </layer>
   </layers>
   <edges>
      <edge from-layer="0" from-port="0" to-layer="2" to-port="0"/>
      <edge from-layer="1" from-port="1" to-layer="2" to-port="1"/>
      <edge from-layer="2" from-port="2" to-layer="3" to-port="0"/>
      <edge from-layer="3" from-port="1" to-layer="4" to-port="0"/>
   </edges>
   <meta_data>
      <MO_version value="2022.3"/>
      <cli_parameters>
            <blobs_as_inputs value="True"/>
            <caffe_parser_path value="DIR"/>
            <data_type value="float"/>
            ...
      </cli_parameters>
   </meta_data>
</net>

The IR does not use the explicit data nodes described in the previous section. Instead, data properties, such as tensor dimensions and their data types, are described as properties of the input and output ports of operations.
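
As an illustrative sketch (assuming the IR above is saved as model.xml), these port properties can be inspected through the OpenVINO Python API after reading the model:

import openvino as ov

model = ov.Core().read_model("model.xml")

# Tensor shapes and element types live on the operations' ports,
# not on separate data nodes.
for inp in model.inputs:
    print(inp.get_any_name(), inp.get_partial_shape(), inp.get_element_type())
for out in model.outputs:
    print(out.get_any_name(), out.get_partial_shape(), out.get_element_type())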