OpenVINO Workflow

OpenVINO offers multiple workflows, depending on the use case and personal or project preferences. This section gives you a detailed view of how to go from preparing your model, through optimizing it, to executing inference, and deploying your solution.

Once you obtain a model in one of the supported model formats, you can decide how to proceed:

Workflow for convenience: this approach assumes you run your model directly.

[Figure: OpenVINO workflow diagram for convenience]

Workflow for performance: this approach assumes you convert your model to OpenVINO IR explicitly, which means the conversion stage is not part of the final application.

[Figure: OpenVINO workflow diagram for performance]
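
To make the split concrete, here is a minimal Python sketch of both approaches; the file names (model.onnx, model.xml) and the CPU device are placeholders used only for illustration.

    import openvino as ov

    core = ov.Core()

    # Workflow for convenience: compile the original file directly;
    # non-IR formats are converted automatically at load time.
    compiled = core.compile_model("model.onnx", "CPU")

    # Workflow for performance: convert and save OpenVINO IR ahead of time...
    ov_model = ov.convert_model("model.onnx")
    ov.save_model(ov_model, "model.xml")

    # ...so the deployed application only reads the pre-converted IR.
    compiled = core.compile_model("model.xml", "CPU")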

OpenVINO uses the following functions for reading, converting, and saving models; a short usage sketch follows each one:

read_model()

  • Creates an ov.Model from a file.

  • Supported file formats: OpenVINO IR, ONNX, PaddlePaddle, TensorFlow and TensorFlow Lite. PyTorch files are not directly supported.

  • OpenVINO files are read directly while other formats are converted automatically.
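
A minimal usage sketch, assuming hypothetical file names:

    import openvino as ov

    core = ov.Core()

    # OpenVINO IR is read directly.
    model = core.read_model("model.xml")

    # Other supported formats, such as ONNX, are converted automatically on read.
    onnx_model = core.read_model("model.onnx")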

compile_model()

  • Creates an ov.CompiledModel from a file or ov.Model object.

  • Supported file formats: OpenVINO IR, ONNX, PaddlePaddle, TensorFlow and TensorFlow Lite. PyTorch files are not directly supported.

  • OpenVINO files are read directly while other formats are converted automatically.
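
A minimal usage sketch, assuming a hypothetical ONNX file and a placeholder input shape:

    import numpy as np
    import openvino as ov

    core = ov.Core()

    # Compile straight from a file (the model is read and, if needed,
    # converted automatically), or pass a previously read ov.Model instead.
    compiled = core.compile_model("model.onnx", "CPU")

    # The compiled model can be called directly to run inference.
    input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
    results = compiled(input_data)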

convert_model()

  • Creates an ov.Model from a file or Python memory object.

  • Supported file formats: ONNX, PaddlePaddle, TensorFlow and TensorFlow Lite.

  • Supported framework objects: PaddlePaddle, TensorFlow and PyTorch.

  • This method is only available in the Python API.
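
A minimal usage sketch; the torchvision model and the example input shape are placeholders used only for illustration:

    import openvino as ov
    import torch
    import torchvision

    # Convert a framework object held in memory (here, a PyTorch model).
    torch_model = torchvision.models.resnet18(weights="DEFAULT")
    ov_model = ov.convert_model(torch_model, example_input=torch.randn(1, 3, 224, 224))

    # Convert a supported file format.
    ov_model_from_file = ov.convert_model("model.onnx")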

save_model()

  • Saves an ov.Model to OpenVINO IR format.

  • Compresses weights to FP16 by default.

  • This method is only available in the Python API.
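
A minimal usage sketch, assuming a hypothetical ONNX source file:

    import openvino as ov

    ov_model = ov.convert_model("model.onnx")

    # Writes model.xml and model.bin; weights are compressed to FP16 by default.
    ov.save_model(ov_model, "model.xml", compress_to_fp16=True)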

Learn how to convert pre-trained models to OpenVINO IR.
Find out how to optimize a model to achieve better inference performance, utilizing multiple optimization methods for both in-training compression and post-training quantization.
See how to run inference with OpenVINO, which is the most basic form of deployment, and the quickest way of running a deep learning model.
Deploy a model locally, reading the file directly from your application and utilizing resources available to the system.
Deployment on a local system uses the steps described in the section on running inference.
Deploy a model remotely, connecting your application to an inference server and utilizing external resources, with no impact on the app’s performance.
Deployment on OpenVINO Model Server is quick and does not require any additional steps described in the section on running inference.
Deploy a PyTorch model using OpenVINO in a PyTorch-native application.
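
A minimal sketch of that PyTorch-native path; the torchvision model and input shape are placeholders. Importing openvino.torch registers the OpenVINO backend for torch.compile:

    import torch
    import torchvision
    import openvino.torch  # registers the "openvino" backend for torch.compile

    model = torchvision.models.resnet18(weights="DEFAULT").eval()
    model = torch.compile(model, backend="openvino")

    # The model is compiled lazily on the first call, then runs through OpenVINO.
    with torch.no_grad():
        output = model(torch.randn(1, 3, 224, 224))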