OpenVINO 2025.4#

OpenVINO GenAI

Simplify GenAI model deployment!
Check out our guide

OpenVINO models on Hugging Face!

Get pre-optimized OpenVINO models, no need to convert!
Visit Hugging Face

OpenVINO Model Hub

See performance benchmarks for top AI models!
Explore now

OpenVINO via PyTorch 2.0 torch.compile()

Use OpenVINO directly in PyTorch-native applications!
Learn more

OpenVINO is an open-source toolkit for deploying performant AI solutions in the cloud, on-prem, and on the edge alike. Develop your applications with both generative and conventional AI models, coming from the most popular model frameworks. Convert, optimize, and run inference utilizing the full potential of Intel® hardware. There are three main tools in OpenVINO to meet all your deployment needs:

OpenVINO GenAI

Run and deploy generative AI models

./openvino-workflow-generative.html

OpenVINO Base Package

Run and deploy conventional AI models

./openvino-workflow.html

OpenVINO Model Server

Deploy both generative and conventional AI inference on a server

./model-server/ovms_what_is_openvino_model_server.html

For a quick ramp-up, check out the OpenVINO Toolkit Cheat Sheet [PDF] and the OpenVINO GenAI Quick-start Guide [PDF].

Where to Begin#

Installation

This guide introduces installation and learning materials for Intel® Distribution of OpenVINO™ toolkit.

Get Started

Performance Benchmarks

See the latest benchmark numbers for OpenVINO and OpenVINO Model Server.

View data

Framework Compatibility

Load models directly (for TensorFlow, ONNX, PaddlePaddle) or convert to OpenVINO format.

Load your model

Easy Deployment

Get started in just a few lines of code.

Run Inference

Serving at Scale

Cloud-ready deployments for microservice applications.

Check out Model Server

Model Compression

Boost performance with post-training and training-time compression using NNCF.

Optimize now

Key Features#

See all features

Local Inference & Model Serving

You can either link directly with OpenVINO Runtime to run inference locally or use OpenVINO Model Server to serve model inference from a separate server or within a Kubernetes environment.
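As a rough illustration of the local option, the sketch below reads a model, compiles it for one device, and runs a single synchronous inference with the OpenVINO Runtime Python API. The function name `run_local_inference`, the default device, and any model path passed to it are assumptions made for this example; it also assumes a model with a single input.

```python
def run_local_inference(model_path, input_data, device="CPU"):
    """Read a model, compile it for one device, and run one synchronous inference.

    `model_path` is a hypothetical path to a model file (for example an
    OpenVINO IR .xml file); `input_data` is a NumPy array matching the
    model's single input.
    """
    # Imported lazily so the module can be loaded without OpenVINO installed.
    import openvino as ov

    core = ov.Core()
    model = core.read_model(model_path)           # load the model from disk
    compiled = core.compile_model(model, device)  # compile for the chosen device
    results = compiled(input_data)                # run inference on one input
    return results[compiled.output(0)]            # first output as a NumPy array
```

A call would look like `run_local_inference("model.xml", batch)`, where both the file name and `batch` are placeholders. Serving through OpenVINO Model Server moves the same compile-and-infer steps into a separate process that clients reach over the network.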

Fast & Scalable Deployment

Write an application once and deploy it anywhere, getting the most out of the available hardware. Automatic device discovery allows for superior deployment flexibility. OpenVINO Runtime supports Linux, Windows, and macOS and provides Python, C++, and C APIs, so you can use your preferred language and OS.
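Device discovery can be sketched in a few lines of Python. `discover_devices` queries the runtime for the devices it can see, while `choose_device` and its preference order are hypothetical helpers written for this example, not part of OpenVINO.

```python
def discover_devices():
    """Return a mapping of device name to its full, human-readable name."""
    # Imported lazily so choose_device() stays usable without OpenVINO installed.
    import openvino as ov

    core = ov.Core()
    return {
        name: core.get_property(name, "FULL_DEVICE_NAME")
        for name in core.available_devices
    }


def choose_device(available, preferred=("NPU", "GPU", "CPU")):
    """Pick the first available device that matches an assumed preference order."""
    for prefix in preferred:
        for name in available:
            # Device names may carry an index suffix such as "GPU.0".
            if name == prefix or name.startswith(prefix + "."):
                return name
    raise RuntimeError(f"None of {preferred} found in {list(available)}")
```

For example, `choose_device(discover_devices())` returns a device name that can be passed to `compile_model`. The preference order here is only an illustration; the right order depends on the model and the workload.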

Lighter Deployment

OpenVINO is designed with minimal external dependencies, which reduces the application footprint and simplifies installation and dependency management. Popular package managers enable application dependencies to be easily installed and upgraded. Custom compilation for your specific model(s) further reduces final binary size.

Enhanced App Start-Up Time

In applications where fast start-up is required, OpenVINO significantly reduces first-inference latency by using the CPU for initial inference and then switching to another device once the model has been compiled and loaded into memory. Compiled models are cached, which improves start-up time further.
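A minimal sketch of this start-up behavior, assuming the `AUTO` device is used for device selection and a cache directory is configured through the `CACHE_DIR` property. The function name, the default directory `ov_cache`, and the model path are placeholders for this example.

```python
def compile_with_fast_startup(model_path, cache_dir="ov_cache"):
    """Compile a model on the AUTO device with model caching enabled.

    `model_path` and `cache_dir` are hypothetical locations chosen by the caller.
    """
    # Imported lazily so the module can be loaded without OpenVINO installed.
    import openvino as ov

    core = ov.Core()
    # Store compiled blobs on disk so later runs can skip recompilation
    # on devices that support caching.
    core.set_property({"CACHE_DIR": cache_dir})
    # AUTO chooses among the available devices and can serve the first
    # requests on CPU while another device is still compiling the model.
    return core.compile_model(model_path, "AUTO")
```

The first run pays the full compilation cost and fills the cache; subsequent runs with the same model, device, and cache directory can load the cached result instead. How much time this saves depends on the model and the target device.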