The Inference Engine lets you deploy a network model trained with any of the supported deep learning frameworks (Caffe*, TensorFlow*, Kaldi*, MXNet*) or converted to the ONNX* format. To perform inference, the Inference Engine does not operate on the original model but on its Intermediate Representation (IR), which is optimized for execution on end-point target devices. To generate an IR for your trained model, use the Model Optimizer tool.
Model Optimizer loads a model into memory, reads it, builds the internal representation of the model, optimizes it, and produces the Intermediate Representation. Intermediate Representation is the only format the Inference Engine accepts.
NOTE: Model Optimizer does not infer models. Model Optimizer is an offline tool that runs before the inference takes place.
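Because conversion is an offline step, it can be scripted separately from any inference code. The sketch below is a minimal illustration, not an official wrapper: it assumes the Model Optimizer entry script is `mo.py` and that the `--input_model` and `--output_dir` options are sufficient for your model. The helper names and all paths are hypothetical and must be adjusted to your installation.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_mo_command(mo_script: Path, input_model: Path, output_dir: Path) -> List[str]:
    """Build the command line for an offline Model Optimizer run."""
    return [
        sys.executable,
        str(mo_script),
        "--input_model", str(input_model),
        "--output_dir", str(output_dir),
    ]


def convert_to_ir(mo_script: Path, input_model: Path, output_dir: Path) -> None:
    """Run the conversion; raises CalledProcessError if the tool exits with an error."""
    subprocess.run(build_mo_command(mo_script, input_model, output_dir), check=True)


# Hypothetical paths, for illustration only; nothing is executed here.
command = build_mo_command(Path("mo.py"), Path("my_model.caffemodel"), Path("ir_output"))
print(" ".join(command[1:]))
```

Building the command separately from running it makes the arguments easy to inspect or log before the conversion starts. Many models need further framework-specific options, which this sketch does not cover.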
Model Optimizer has two main purposes:
- Produce a valid Intermediate Representation. The primary responsibility of the Model Optimizer is to produce the two files (.xml and .bin) that form the Intermediate Representation.
- Produce an optimized Intermediate Representation. Trained models can contain layers that matter only for training, such as the Dropout layer. These layers are useless during inference and might increase the inference time. In many cases, these operations can be automatically removed from the resulting Intermediate Representation. In addition, if a group of operations can be represented as a single mathematical operation, and thus as a single operation node in a model graph, the Model Optimizer recognizes such patterns and replaces the group with that single operation. The result is an Intermediate Representation that has fewer operation nodes than the original model, which decreases the inference time.

To produce a valid Intermediate Representation, the Model Optimizer must be able to read the original model operations, handle their properties, and represent them in the Intermediate Representation format while keeping the result valid. The resulting model consists of operations described in the Operations Specification.
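The two optimizations described above, removing training-only layers and replacing a group of operations with a single node, can be illustrated on a toy model. The sketch below is not the Model Optimizer's implementation: it assumes a simple linear chain of scalar operations, and the `Op` class, the pass functions, and the fused node name `ScaleShift` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Op:
    kind: str                      # e.g. "Mul", "Add", "Dropout", "ScaleShift"
    value: Optional[float] = None  # constant operand for Mul / Add, scale for ScaleShift
    shift: Optional[float] = None  # additive constant, used only by ScaleShift


def remove_training_only(ops: List[Op]) -> List[Op]:
    """Drop nodes that have no effect at inference time."""
    return [op for op in ops if op.kind != "Dropout"]


def fuse_mul_add(ops: List[Op]) -> List[Op]:
    """Replace each adjacent Mul -> Add pair with one ScaleShift node."""
    fused: List[Op] = []
    i = 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i].kind == "Mul" and ops[i + 1].kind == "Add":
            fused.append(Op("ScaleShift", value=ops[i].value, shift=ops[i + 1].value))
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return fused


def run(ops: List[Op], x: float) -> float:
    """Evaluate the chain on a scalar; Dropout acts as identity at inference."""
    for op in ops:
        if op.kind == "Mul":
            x = x * op.value
        elif op.kind == "Add":
            x = x + op.value
        elif op.kind == "ScaleShift":
            x = x * op.value + op.shift
    return x


original = [Op("Mul", 2.0), Op("Dropout"), Op("Add", 1.0)]
optimized = fuse_mul_add(remove_training_only(original))
print(len(original), "->", len(optimized))
```

Removing the Dropout node first exposes the adjacent Mul and Add pair, so the order of the two passes matters in this toy. The optimized chain has fewer nodes but computes the same result for any input.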
Many common layers exist across known frameworks and neural network topologies. Examples of these layers are Convolution, Pooling, and Activation. To read the original model and produce the Intermediate Representation of a model, the Model Optimizer must be able to work with these layers.
The full list of supported layers depends on the framework and can be found in the Supported Framework Layers section. If your topology contains only layers from that list, as is the case for the topologies used by most users, the Model Optimizer easily creates the Intermediate Representation. After that, you can proceed to work with the Inference Engine.
However, if you use a topology with layers that are not recognized by the Model Optimizer out of the box, see Custom Layers in the Model Optimizer to learn how to work with custom layers.
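Before converting, it can help to check which layers of a topology fall outside the supported list. The sketch below shows the idea as a plain set difference. The `SUPPORTED` set is a hypothetical three-item subset used only for illustration (the real list is framework-specific and much longer), and `MyCustomLayer` is an invented layer name.

```python
from typing import Iterable, Set


def unsupported_layers(topology_layers: Iterable[str], supported: Set[str]) -> Set[str]:
    """Return the layer types used by the topology that are not in the supported set."""
    return set(topology_layers) - supported


# Hypothetical subset of supported layer types, for illustration only.
SUPPORTED = {"Convolution", "Pooling", "Activation"}

# Layer types of an imaginary topology; "MyCustomLayer" is an invented name.
topology = ["Convolution", "Activation", "Pooling", "MyCustomLayer"]

missing = unsupported_layers(topology, SUPPORTED)
print(sorted(missing))
```

An empty result means every layer is in the list and conversion should work out of the box. Any name that remains is a candidate for the custom layer workflow.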
After installation with OpenVINO™ toolkit or Intel® Deep Learning Deployment Toolkit, the Model Optimizer folder has the following structure:
The following sections explain how to use the Model Optimizer, from configuring the tool and generating an IR for a given model to customizing the tool for your needs: