Key Features#

Easy Integration#

Use deep learning models from PyTorch, TensorFlow, TensorFlow Lite, PaddlePaddle, and ONNX directly or convert them to the optimized OpenVINO IR format for improved performance.
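For example, here is a minimal sketch of both approaches with the Python API; the model file names are placeholders:

```python
import openvino as ov

core = ov.Core()

# Use a framework model (here, ONNX) directly...
model = core.read_model("model.onnx")  # placeholder path
compiled_model = core.compile_model(model, "CPU")

# ...or convert it to OpenVINO IR once and reuse the serialized artifacts.
ov_model = ov.convert_model("model.onnx")
ov.save_model(ov_model, "model.xml")  # writes model.xml and model.bin
```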
For PyTorch-based applications, specify OpenVINO as a backend using torch.compile to improve model inference. Apply OpenVINO optimizations to your PyTorch models directly with a single line of code.
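As a sketch, assuming a standard torchvision model and the openvino package installed (which registers the backend):

```python
import torch
import torchvision.models as models

model = models.resnet50(weights="DEFAULT").eval()

# The single line that routes inference through OpenVINO.
model = torch.compile(model, backend="openvino")

with torch.no_grad():
    output = model(torch.randn(1, 3, 224, 224))
```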
With the GenAI flavor of OpenVINO, you can run generative AI with just a couple of lines of code. Check out the GenAI guide for instructions on how to do it.
OpenVINO offers the C++ API as a complete set of available methods. For less resource-critical solutions, the Python API provides almost full coverage, while the C and NodeJS APIs are limited to the methods most essential for their typical environments. The NodeJS API is still in early, active development.
If you need a particular feature or inference accelerator to be supported, you are free to file a feature request or develop the components you need yourself. As open-source software, OpenVINO may be used and modified freely. See the extensibility guide for more information on how to adapt it to your needs.
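A minimal sketch with the openvino-genai package; the model directory is a placeholder for an LLM already exported to OpenVINO IR (for example, with optimum-cli):

```python
import openvino_genai

# "llm_model_dir" is a placeholder for a directory with an IR-format LLM.
pipe = openvino_genai.LLMPipeline("llm_model_dir", "CPU")
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
```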

Deployment#

Integrate the OpenVINO runtime directly with your application to run inference locally, or use OpenVINO Model Server to shift the inference workload to a remote system, such as a separate server or a Kubernetes environment. For serving, OpenVINO also integrates with vLLM and Triton.
Write an application once and deploy it anywhere, always making the most of your hardware setup. The automatic device selection mode gives you the ultimate deployment flexibility on all major operating systems. Check out the system requirements.
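As an illustration of the remote option, here is a sketch of calling a Model Server instance through its KServe-compatible REST API; the host, port, model name, and input name are all placeholders that must match your deployment:

```python
import requests

# Placeholder deployment: a Model Server instance serving "my_model" locally.
payload = {
    "inputs": [{
        "name": "input",            # must match the served model's input name
        "shape": [1, 3, 224, 224],
        "datatype": "FP32",
        "data": [0.0] * (3 * 224 * 224),
    }]
}
response = requests.post(
    "http://localhost:8000/v2/models/my_model/infer", json=payload
)
print(response.json())
```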
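A sketch of automatic device selection with the Python API; "model.xml" is a placeholder IR file:

```python
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder IR file

# AUTO picks the best available device (GPU, NPU, CPU, ...) at load time,
# so the same application code runs unchanged on different machines.
compiled_model = core.compile_model(model, "AUTO")
```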

Light-weight#

Designed with minimal external dependencies, OpenVINO does not bloat your application and simplifies installation and dependency management. Custom compilation for your specific model(s) may further reduce the final binary size.

Performance#

Optimize your deep learning models with NNCF, using various training-time and post-training compression methods, such as pruning, sparsity, quantization, and weight compression. Make your models take up less space, run faster, and use fewer resources.
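A minimal post-training quantization sketch with NNCF; the model path and calibration data are placeholders:

```python
import numpy as np
import nncf
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder: an image model in IR format

# Placeholder calibration data; in practice, use a few hundred real samples.
calibration_data = [
    {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
    for _ in range(10)
]

# transform_fn maps one dataset item to the model's expected input.
def transform_fn(data_item):
    return data_item["input"]

calibration_dataset = nncf.Dataset(calibration_data, transform_fn)

# 8-bit post-training quantization; the result is a regular OpenVINO model.
quantized_model = nncf.quantize(model, calibration_dataset)
ov.save_model(quantized_model, "model_int8.xml")
```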
OpenVINO is optimized to work with Intel hardware, delivering confirmed high performance for hundreds of models. Explore OpenVINO Performance Benchmarks to discover the optimal hardware configurations and plan your AI deployment based on verified data.
If you need your application to launch immediately, OpenVINO will reduce first-inference latency by running inference on the CPU until a better-suited device is ready to take over. Once a model is compiled for inference, it can also be cached, improving start-up time even further.
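A sketch of enabling model caching together with AUTO; the cache directory and model path are arbitrary placeholders:

```python
import openvino as ov

core = ov.Core()
# Cache compiled blobs on disk so subsequent runs skip recompilation.
core.set_property({"CACHE_DIR": "./ov_cache"})

model = core.read_model("model.xml")  # placeholder IR file
# AUTO can begin inference on the CPU while the preferred device compiles.
compiled_model = core.compile_model(model, "AUTO")
```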