OpenVINO Release Notes
2024.6 - 18 December 2024
System Requirements | Release policy | Installation Guides
What’s new
OpenVINO 2024.6 release includes updates for enhanced stability and improved LLM performance.
Introduced support for Intel® Arc™ B-Series Graphics (formerly known as Battlemage).
Implemented optimizations to improve the inference time and LLM performance on NPUs.
Improved LLM performance with GenAI API optimizations and bug fixes.
OpenVINO™ Runtime
CPU Device Plugin
KV cache now uses asymmetric 8-bit unsigned integer (U8) as the default precision, reducing memory footprint for LLMs and increasing their performance. This option can be controlled by model metadata.
Quality and accuracy have been improved for selected models with several bug fixes.
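The asymmetric U8 KV-cache precision mentioned above can be illustrated with a small, self-contained sketch. This is a conceptual model of the arithmetic only, not OpenVINO's kernel; the function names are made up for illustration:

```python
# Illustrative sketch of asymmetric 8-bit (U8) quantization, the scheme the
# KV cache now defaults to: each value x maps to q = round(x/scale) + zp,
# with q clamped to [0, 255]. Not OpenVINO internals.

def quantize_u8_asym(values):
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize_u8_asym(q, scale, zero_point):
    return [(x - zero_point) * scale for x in q]

kv = [-1.5, -0.25, 0.0, 0.75, 2.0]
q, s, zp = quantize_u8_asym(kv)
restored = dequantize_u8_asym(q, s, zp)
# One byte per value instead of two (FP16) halves KV-cache memory,
# at the cost of an error bounded by one quantization step.
max_err = max(abs(a - b) for a, b in zip(kv, restored))
```

The asymmetric zero point keeps the full U8 range usable even when the cached values are not centered around zero.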
GPU Device Plugin
Device memory copy optimizations have been introduced for inference with Intel® Arc™ B-Series Graphics (formerly known as Battlemage). Since this platform does not use the L2 cache for copying memory between the device and host, a dedicated copy operation is used when inputs or results are not expected to be in device memory.
ChatGLM4 inference on GPU has been optimized.
NPU Device Plugin
LLM performance and inference time have been improved with memory optimizations.
OpenVINO.GenAI
The encrypted_model_causal_lm sample is now available, showing how to decrypt a model.
Other Changes and Known Issues
Jupyter Notebooks
Previous 2024 releases
2024.5 - 20 November 2024
What’s new
More GenAI coverage and framework integrations to minimize code changes.
New models supported: Llama 3.2 (1B & 3B), Gemma 2 (2B & 9B), and YOLO11.
LLM support on NPU: Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct and Phi-3 Mini-Instruct.
Noteworthy notebooks added: Sam2, Llama3.2, Llama3.2 - Vision, Wav2Lip, Whisper, and Llava.
Preview: support for Flax, a high-performance Python neural network library based on JAX. Its modular design allows for easy customization and accelerated inference on GPUs.
Broader Large Language Model (LLM) support and more model compression techniques.
Optimizations for built-in GPUs on Intel® Core™ Ultra Processors (Series 1) and Intel® Arc™ Graphics include KV Cache compression for memory reduction along with improved usability, and model load time optimizations to improve first token latency for LLMs.
Dynamic quantization was enabled to improve first token latency for LLMs on built-in Intel® GPUs without impacting accuracy on Intel® Core™ Ultra Processors (Series 1). Second token latency will also improve for large batch inference.
A new method to generate synthetic text data is implemented in the Neural Network Compression Framework (NNCF). This will allow LLMs to be compressed more accurately using data-aware methods without datasets. This feature will soon be accessible via Optimum Intel on Hugging Face.
More portability and performance to run AI at the edge, in the cloud, or locally.
Support for Intel® Xeon® 6 Processors with P-cores (formerly codenamed Granite Rapids) and Intel® Core™ Ultra 200V series processors (formerly codenamed Arrow Lake-S).
Preview: GenAI API enables multimodal AI deployment with support for multimodal pipelines for improved contextual awareness, transcription pipelines for easy audio-to-text conversions, and image generation pipelines for streamlined text-to-visual conversions.
Speculative decoding feature added to the GenAI API for improved performance and efficient text generation using a small draft model that is periodically corrected by the full-size model.
Preview: LoRA adapters are now supported in the GenAI API for developers to quickly and efficiently customize image and text generation models for specialized tasks.
The GenAI API now also supports LLMs on NPU allowing developers to specify NPU as the target device, specifically for WhisperPipeline (for whisper-base, whisper-medium, and whisper-small) and LLMPipeline (for Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct and Phi-3 Mini-instruct). Use driver version 32.0.100.3104 or later for best performance.
Now deprecated
Python 3.8 is no longer supported:
OpenVINO™ Runtime
Common
NumPy 2.x has been adopted for all currently supported components, including NNCF.
A new constant constructor has been added, enabling constants to be created from a data pointer as shared memory. Additionally, it can take ownership of a shared or other object, avoiding a two-step process to wrap memory into ov::Tensor.
Asynchronous file reading with the mmap library has been implemented, reducing loading times for model files, especially for LLMs.
CPU implementation of SliceScatter operator is now available, used for models such as Gemma, supporting increased LLM performance.
CPU Device Plugin
Gold support of the Intel® Xeon® 6 platform with P-cores (formerly codenamed Granite Rapids) has been reached.
Support of Intel® Core™ Ultra 200V series processors (formerly codenamed Arrow Lake-S) has been implemented.
LLM performance has been further improved with Rotary Position Embedding optimization, as well as Query/Key/Value and multi-layer perceptron fusion optimizations.
FP16 support has been extended with SDPA and PagedAttention, improving performance of LLM via both native APIs and the vLLM integration.
Models with LoRA adapters are now supported.
GPU Device Plugin
The KV cache INT8 compression mechanism is now available for all supported GPUs. It enables a significant reduction in memory consumption, increasing performance with a minimal impact to accuracy (it affects systolic devices slightly more than non-systolic ones). The feature is activated by default for non-systolic devices.
LoRA adapters are now functionally supported on GPU.
A new feature of GPU weightless blob caching enables caching model structure only and reusing the weights from the original model file. Use the new OPTIMIZE_SIZE property to activate.
Dynamic quantization with INT4 and INT8 precisions has been implemented and enabled by default on Intel® Core™ Ultra platforms, improving LLM first token latency.
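Group-wise dynamic quantization of the kind enabled above can be illustrated conceptually: values are split into small groups, and each group gets its own scale computed on the fly. A minimal plain-Python sketch, illustrative only (not the GPU kernel, and the group size is an arbitrary choice here):

```python
# Conceptual sketch of group-wise symmetric INT4 quantization: per-group
# scales keep the error low even when magnitudes vary a lot across a tensor.

GROUP_SIZE = 4
QMAX = 7  # signed 4-bit range is [-8, 7]; use +/-7 for a symmetric grid

def quantize_int4_groups(values):
    groups = []
    for i in range(0, len(values), GROUP_SIZE):
        chunk = values[i:i + GROUP_SIZE]
        scale = max(abs(v) for v in chunk) / QMAX or 1.0  # per-group scale
        q = [max(-8, min(7, round(v / scale))) for v in chunk]
        groups.append((q, scale))
    return groups

def dequantize(groups):
    return [x * scale for q, scale in groups for x in q]

acts = [0.1, -0.4, 0.35, 0.05, 3.0, -2.5, 1.0, 0.5]
restored = dequantize(quantize_int4_groups(acts))
```

With one scale per small group, the reconstruction error of each value stays within half a quantization step of its own group, which is why dynamic (per-group, on-the-fly) scaling preserves accuracy better than one scale per tensor.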
NPU Device Plugin
Models retrieved from the OpenVINO cache now have a smaller memory footprint. The plugin releases the cached model (blob) after weights are loaded into NPU regions (model export is not available in this scenario), reducing memory consumption during inference execution by one blob size. This optimization requires the latest NPU driver: 32.0.100.3104.
A driver bug for ov::intel_npu::device_total_mem_size has been fixed. The plugin now reports 2GB as the maximum allocatable memory for any driver that does not support graph extension 1.8; even if older drivers report a larger amount of available memory, allocation would fail beyond 2GB. For any driver that supports graph extension 1.8 (or newer), the plugin reports the number the driver exposes.
A new API is used to initialize the model (available in graph extension 1.8).
Inference request set_tensors is now supported.
ov::device::LUID is now exposed on Windows.
LLM-related improvements have been implemented in terms of both memory usage and performance.
AvgPool and MaxPool operator support has been extended, adding support for more PyTorch models.
NOTE: for systems based on Intel® Core™ Ultra Processors Series 2, more than 16GB of RAM may be required to use larger models, such as Llama-2-7B, Mistral-0.2-7B, and Qwen-2-7B (exceeding 4B parameters) with prompt sizes over 1024 tokens.
OpenVINO Python API
A Constant can now be created from openvino.Tensor.
The “release_memory” method has been added for a compiled model, improving control over memory consumption.
OpenVINO Node.js API
Querying the best device to perform inference of a model with specific operations is now available in JavaScript API.
Contribution guidelines have been improved to make it easier for developers to contribute.
Testing scope has been extended by inference in end-to-end tests.
JavaScript API samples have been improved for readability and ease of running.
TensorFlow Framework Support
TensorFlow 2.18.0, Keras 3.6.0, NumPy 2.0.2 in Python 3.12, and NumPy 1.26.4 in other Python versions have been added to validation.
Out-of-the-box conversion with static ranks has been improved by devising a new shape for Switch-Merge condition sub-graphs.
Complex type for the following operations is now supported: ExpandDims, Pack, Prod, Rsqrt, ScatterNd, Sub.
The following issues have been fixed:
division by zero in the corner case of LinSpace with one element,
FP16 and FP64 input types are now supported for LeakyRelu,
non-i32/i64 output index types are now supported for ArgMin/Max operations.
PyTorch Framework Support
PyTorch version 2.5 is now supported.
OpenVINO Model Converter (OVC) now supports TorchScript and ExportedProgram saved on a drive.
The issue of aten.index.Tensor conversion for indices with “None” values has been fixed, helping to support the HF Stable Diffusion model in ExportedProgram format.
ONNX Framework Support
ONNX version 1.17.0 is now used.
Customers’ models with DequantizeLinear-21, com.microsoft.MatMulNBits, and com.microsoft.QuickGelu operations are now supported.
JAX/Flax Framework Support
JAX 0.4.35 and Flax 0.10.0 have been added to validation.
jax._src.core.ClosedJaxpr object conversion is now supported.
Vision Transformer from google-research/vision_transformer is now supported (with support for 37 new operations).
OpenVINO Model Server
The OpenAI API text embedding endpoint has been added, enabling OVMS to be used as a building block for AI applications like RAG.
The rerank endpoint has been added based on the Cohere API, enabling easy similarity detection between a query and a set of documents. It is one of the building blocks for AI applications like RAG and makes integration with frameworks such as LangChain easy.
The following improvements have been done to LLM text generation:
The echo sampling parameter together with logprobs in the completions endpoint is now supported.
Performance has been increased on both CPU and GPU.
Throughput in high-concurrency scenarios has been increased with dynamic_split_fuse for GPU.
Testing coverage and stability have been improved.
The procedure for service deployment and model repository preparation has been simplified.
An experimental version of a Windows binary package - a native model server for Windows OS - is available. This release comes with a set of limitations and limited test coverage. It is intended for testing, while the production-ready release is expected with 2025.0. All feedback is welcome.
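The dynamic_split_fuse scheduling mentioned above for GPU throughput can be illustrated with a toy scheduler: long prompt prefills are split into chunks and fused with single-token decode work, so every iteration stays within a fixed token budget. This is a conceptual sketch with made-up numbers, not the OVMS scheduler:

```python
# Toy continuous-batching scheduler in the spirit of dynamic split-fuse:
# rather than running one long prefill in a single huge step, the prefill
# is split across iterations and batched with decode tokens.

TOKEN_BUDGET = 8  # arbitrary per-iteration token budget for illustration

def schedule(prefill_lens, decode_count):
    """Return per-iteration batches as (prefill_tokens, decode_tokens)."""
    remaining = list(prefill_lens)
    iterations = []
    while any(remaining):
        decode = min(decode_count, TOKEN_BUDGET)  # decode requests go first
        budget = TOKEN_BUDGET - decode
        prefill = 0
        for i, r in enumerate(remaining):
            take = min(r, budget - prefill)       # split long prefills
            remaining[i] -= take
            prefill += take
            if prefill == budget:
                break
        iterations.append((prefill, decode))
    return iterations

# One 20-token prompt arriving while 3 other requests are decoding:
plan = schedule([20], decode_count=3)
```

Because every iteration carries the same token load, decode requests keep making progress while the long prompt is absorbed, which is what raises throughput in high-concurrency scenarios.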
Neural Network Compression Framework
A new nncf.data.generate_text_data() method has been added for generating a synthetic dataset for LLM compression. This approach helps to compress LLMs more accurately in situations when the dataset is not available or not sufficient. See our example for more information about the usage.
Support of data-free and data-aware weight compression methods - nncf.compress_weights() - has been extended with NF4 per-channel quantization, making compressed LLMs more accurate and faster on NPU.
Caching of computed statistics in nncf.compress_weights() is now available, significantly reducing compression time when compressing the same LLM multiple times with different compression parameters. To enable it, set the advanced statistics_path parameter of nncf.compress_weights() to the desired file path location.
The backup_mode optional parameter has been added to nncf.compress_weights(), for specifying the data type for embeddings, convolutions, and last linear layers during 4-bit weight compression. Available options are INT8_ASYM (default), INT8_SYM, and NONE (retains the original floating-point precision of the model weights). In certain situations, a non-default value might give better accuracy of compressed LLMs.
Preview support is now available for optimizing models in Torch FX format with the nncf.quantize() and nncf.compress_weights() methods. After optimization, such models can be directly executed via torch.compile(compressed_model, backend="openvino"). For more details, see the INT8 quantization example.
Memory consumption of data-aware weight compression methods - nncf.compress_weights() – has been reduced significantly, with some variation depending on the model and method.
Support for the following has changed:
NumPy 2 added
PyTorch upgraded to 2.5.1
ONNX upgraded to 1.17
Python 3.8 discontinued
OpenVINO Tokenizers
Several operations have been introduced and optimized.
Conversion parameters and environment info have been added to rt_info, improving reproducibility and debugging.
OpenVINO.GenAI
The following has been added:
LoRA adapter for the LLMPipeline.
Text2ImagePipeline with LoRA adapter and text2image samples.
VLMPipeline and visual_language_chat sample for text generation models with text and image inputs.
WhisperPipeline and whisper_speech_recognition sample.
speculative_decoding_lm has been moved to LLMPipeline based implementation and is now installed as part of the package.
On NPU, a set of pipelines has been enabled: WhisperPipeline (for whisper-base, whisper-medium, and whisper-small), LLMPipeline (for Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct, and Phi-3 Mini-instruct). Use driver version 32.0.100.3104 or later for best performance.
Other Changes and Known Issues
Jupyter Notebooks
Known Issues
2024.4 - 19 September 2024
What’s new
More Gen AI coverage and framework integrations to minimize code changes.
Support for GLM-4-9B Chat, MiniCPM-1B, Llama 3 and 3.1, Phi-3-Mini, Phi-3-Medium and YOLOX-s models.
Noteworthy notebooks added: Florence-2, NuExtract-tiny Structure Extraction, Flux.1 Image Generation, PixArt-α: Photorealistic Text-to-Image Synthesis, and Phi-3-Vision Visual Language Assistant.
Broader Large Language Model (LLM) support and more model compression techniques.
OpenVINO™ runtime optimized for Intel® Xe Matrix Extensions (Intel® XMX) systolic arrays on built-in GPUs for efficient matrix multiplication resulting in significant LLM performance boost with improved 1st and 2nd token latency, as well as a smaller memory footprint on Intel® Core™ Ultra Processors (Series 2).
Memory sharing enabled for NPUs on Intel® Core™ Ultra Processors (Series 2) for efficient pipeline integration without memory copy overhead.
Addition of the PagedAttention feature for discrete GPUs* enables a significant boost in throughput for parallel inferencing when serving LLMs on Intel® Arc™ Graphics or Intel® Data Center GPU Flex Series.
More portability and performance to run AI at the edge, in the cloud, or locally.
Support for Intel® Core™ Ultra Processors Series 2 (formerly codenamed Lunar Lake) on Windows.
OpenVINO™ Model Server now comes with production-quality support for OpenAI-compatible API which enables significantly higher throughput for parallel inferencing on Intel® Xeon® processors when serving LLMs to many concurrent users.
Improved performance and memory consumption with prefix caching, KV cache compression, and other optimizations for serving LLMs using OpenVINO™ Model Server.
Support for Python 3.12.
Support for Red Hat Enterprise Linux (RHEL) version 9.3 - 9.4.
Now deprecated
The following will not be available beyond the 2024.4 OpenVINO version:
The macOS x86_64 debug bins
Python 3.8
Discrete Keem Bay support
Intel® Streaming SIMD Extensions (Intel® SSE) will be supported in source code form, but not enabled in the binary package by default, starting with OpenVINO 2025.0.
Check the deprecation section for more information.
OpenVINO™ Runtime
Common
Encryption and decryption of topology in model cache is now supported with callback functions provided by the user (CPU only for now; ov::cache_encryption_callbacks).
The Ubuntu20 and Ubuntu22 Docker images now include the tokenizers and GenAI CPP modules, including pre-installed Python modules, in development versions of these images.
Python 3.12 is now supported.
CPU Device Plugin
The following is now supported:
Tensor parallel feature for multi-socket CPU inference, with performance improvement for LLMs with 6B+ parameters (enabled through model_distribution_policy hint configurations).
RMSNorm operator, optimized with JIT kernel to improve both the 1st and 2nd token performance of LLMs.
The following has been improved:
vLLM support, with PagedAttention exposing attention score as the second output. It can now be used in the cache eviction algorithm to improve LLM serving performance.
1st token performance with Llama series of models, with additional CPU operator optimization (such as MLP, SDPA) on BF16 precision.
Default oneTBB version on Linux is now 2021.13.0, improving overall performance on latest Intel® Xeon® platforms.
MXFP4 weight compression models (compressing weights to 4-bit with the e2m1 data type without a zero point and with 8-bit e8m0 scales) have been optimized for Intel® Xeon® platforms thanks to fully-connected compressed weight LLM support.
The following has been fixed:
A memory leak occurring when the ov::num_streams value is 0.
The CPU affinity mask changing after OpenVINO execution when OpenVINO is compiled with -DTHREADING=SEQ.
GPU Device Plugin
Dynamic quantization for LLMs is now supported on discrete GPU platforms.
Stable Diffusion 3 is now supported with good accuracy on Intel GPU platforms.
Both first and second token latency for LLMs have been improved on Intel GPU platforms.
The issue of model cache not regenerating with the value changes of ov::hint::performance_mode or ov::hint::dynamic_quantization_group_size has been fixed.
NPU Device Plugin
Remote Tensor API is now supported.
You can now query the available number of tiles (ov::intel_npu::max_tiles) and force a specific number of tiles to be used by the model, per inference request (ov::intel_npu::tiles). Note: ov::intel_npu::tiles overrides the default number of tiles selected by the compiler based on performance hints (ov::hint::performance_mode). Any tile number other than 1 may be a problem for cross platform compatibility, if not tested explicitly versus the max_tiles value.
You can now bypass the model caching mechanism in the driver (ov::intel_npu::bypass_umd_caching). Read more about driver and OpenVINO caching.
Memory footprint at model execution has been reduced by one blob (compiled model) size. For execution, the plugin no longer retrieves the compiled model from the driver, it uses the level zero graph handle directly, instead. The compiled model is now retrieved from the driver only during the export method.
OpenVINO Python API
openvino.Tensor, when created in the shared memory mode, now prevents “garbage collection” of numpy memory.
The openvino.experimental submodule is now available, providing access to experimental functionalities under development.
New Python-exclusive openvino.Model constructors have been added.
Image padding in PreProcessor is now available.
OpenVINO Runtime is now compatible with numpy 2.0.
OpenVINO Node.js API
The following has been improved:
Unit tests for increased efficiency and stability
Security updates applied to dependencies
Electron compatibility is now confirmed with new end-to-end tests.
New API methods added.
TensorFlow Framework Support
TensorFlow 2.17.0 is now supported.
JAX 0.4.31 is now supported via a path of jax2tf with native_serialization=False.
8 new operations have been added.
Tensor lists with multiple undefined dimensions in element_shape are now supported, enabling support for TF Hub lite0-detection/versions/1 model.
PyTorch Framework Support
Torch 2.4 is now supported.
Inplace ops are now supported automatically if the regular version is supported.
Symmetric GPTQ model from Hugging Face will now be automatically converted to the signed type (INT4) and zero-points will be removed.
ONNX Framework Support
ONNX 1.16.0 is now supported.
Models with constants/inputs of UINT4/INT4 types are now supported.
4 new operations have been added.
OpenVINO Model Server
OpenAI API for text generation is now officially supported and recommended for production usage. It comes with the following new features:
Prefix caching feature, caching the prompt evaluation to speed up text generation.
Ability to compress the KV Cache to a lower precision, reducing memory consumption without a significant loss of accuracy.
The stop sampling parameter, to define a sequence that stops text generation.
The logprobs sampling parameter, returning the probabilities of the returned tokens.
Generic metrics related to execution of the MediaPipe graph that can be used for autoscaling based on the current load and the level of concurrency.
Demo of text generation horizontal scalability using basic docker containers and Kubernetes.
Automatic cancelling of text generation for disconnected clients.
Non-UTF-8 responses from the model can now be automatically replaced with Unicode replacement characters, thanks to their configurable handling.
Intel GPU with paged attention is now supported.
Support for Llama3.1 models.
The following has been improved:
Handling of model templates without bos_token is now fixed.
Performance of the multinomial sampling algorithm.
finish_reason in the response now correctly distinguishes between reaching max_tokens (length) and completing the sequence (stop).
Security and stability.
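The prefix caching feature listed above can be illustrated with a toy cache: the evaluation cost of a previously seen prompt prefix is skipped, and only the new suffix is processed. A conceptual sketch (a hypothetical class, not the OVMS implementation):

```python
# Toy illustration of prefix caching: requests that share a prompt prefix
# (e.g. a common system prompt) reuse its "computed" state, so only the
# per-request suffix costs anything.

class PrefixCache:
    def __init__(self):
        self.cached = []  # token sequences we already hold state for

    def longest_prefix(self, tokens):
        best = 0
        for p in self.cached:
            n = 0
            while n < min(len(p), len(tokens)) and p[n] == tokens[n]:
                n += 1
            best = max(best, n)
        return best

    def evaluate(self, tokens):
        reused = self.longest_prefix(tokens)
        computed = len(tokens) - reused  # only the suffix is processed
        self.cached.append(list(tokens))
        return reused, computed

cache = PrefixCache()
system = [1, 2, 3, 4, 5]               # shared system prompt
first = cache.evaluate(system + [10, 11])   # cold: everything computed
second = cache.evaluate(system + [20, 21])  # warm: the 5-token prefix reused
```

The second request only pays for its 2-token suffix, which is the effect prefix caching has on prompt evaluation latency.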
Neural Network Compression Framework
The LoRA Correction algorithm is now included in the Weight Compression method, improving the accuracy of INT4-compressed models on top of other data-aware algorithms, such as AWQ and Scale Estimation. To enable it, set the lora_correction option to True in nncf.compress_weights().
The GPTQ compression algorithm can now be combined with the Scale Estimation algorithm, making it possible to run GPTQ, AWQ, and Scale Estimation together, for the optimum-accuracy INT4-compressed models.
INT8 quantization of LSTMSequence and Convolution operations for constant inputs is now enabled, resulting in better performance and reduced model size.
OpenVINO Tokenizers
Split and BPE tokenization operations have been reimplemented, resulting in improved tokenization accuracy and performance.
New building options are now available, offering up to a 12x reduction in binary size.
An operation is now available to validate and skip/replace model-generated non-Unicode bytecode sequences during detokenization.
OpenVINO.GenAI
New samples and pipelines are now available:
An example IterableStreamer implementation in multinomial_causal_lm/python sample
GenAI compilation is now available as part of OpenVINO via the -DOPENVINO_EXTRA_MODULES CMake option.
Other Changes and Known Issues
Jupyter Notebooks
The list of supported models in the LLM chatbot notebook now includes Phi-3.5 and Gemma 2.
Known Issues
2024.3 - 31 July 2024
What’s new
More Gen AI coverage and framework integrations to minimize code changes.
OpenVINO pre-optimized models are now available on Hugging Face, making it easier for developers to get started with these models.
Broader Large Language Model (LLM) support and more model compression techniques.
Significant improvement in LLM performance on Intel discrete GPUs with the addition of Multi-Head Attention (MHA) and OneDNN enhancements.
More portability and performance to run AI at the edge, in the cloud, or locally.
Improved CPU performance when serving LLMs with the inclusion of vLLM and continuous batching in the OpenVINO Model Server (OVMS). vLLM is an easy-to-use open-source library that supports efficient LLM inferencing and model serving.
Ubuntu 24.04 is now officially supported.
OpenVINO™ Runtime
Common
OpenVINO may now be used as a backend for vLLM, offering better CPU performance due to fully-connected layer optimization, fusing multiple fully-connected layers (MLP), U8 KV cache, and dynamic split fuse.
Ubuntu 24.04 is now officially supported, which means OpenVINO is now validated on this system (preview support).
The following have been improved:
Increased support for models like YoloV10 or PixArt-XL-2, thanks to enabled Squeeze and Concat layers.
Performance of precision conversion FP16/BF16 -> FP32.
AUTO Inference Mode
Model cache is now disabled for CPU acceleration even when cache_dir is set, because CPU acceleration is skipped when the cached model is ready for the target device in the 2nd run.
Heterogeneous Inference Mode
PIPELINE_PARALLEL policy is now available, to run inference of large models on multiple devices according to available memory size, being especially useful for large language models that don’t fit into one discrete GPU (a preview feature).
CPU Device Plugin
Fully Connected layers have been optimized together with RoPE optimization with JIT kernel to improve performance for LLM serving workloads on Intel AMX platforms.
Dynamic quantization of Fully Connected layers is now enabled by default on Intel AVX2 and AVX512 platforms, improving out-of-the-box performance for 8bit/4bit weight-compressed LLMs.
Performance has been improved for:
ARM server configuration, due to migration to Intel® oneAPI Threading Building Blocks 2021.13.
ARM for FP32 and FP16.
GPU Device Plugin
Performance has been improved for:
LLMs and Stable Diffusion on discrete GPUs, due to latency decrease, through optimizations such as Multi-Head Attention (MHA) and oneDNN improvements.
Whisper models on discrete GPU.
NPU Device Plugin
NPU inference of LLMs is now supported with GenAI API (preview feature). To support LLMs on NPU (requires the most recent version of the NPU driver), additional relevant features are also part of the NPU plugin now.
Models bigger than 2GB are now supported on both NPU driver (Intel® NPU Driver - Windows* 32.0.100.2540) and NPU plugin side (both Linux and Windows).
Memory optimizations have been implemented:
Weights are no longer copied from NPU compiler adapter.
Memory usage and first-inference latency on NPU have been improved.
OpenVINO Python API
visit_attributes is now available for custom operation implemented in Python, enabling serialization of operation attributes.
Python API is now extended with new methods for Model class, e.g. Model.get_sink_index, new overloads for Model.get_result_index.
OpenVINO Node.js API
Tokenizers and StringTensor are now supported for LLM inference.
Compatibility with electron.js is now restored for desktop application developers.
Async version of Core.import_model and enhancements for Core.read_model methods are now available, for more efficient model reading, especially for LLMs.
TensorFlow Framework Support
Models with keras.LSTM operations are now more performant in CPU inference.
The tensor list initialized with an undefined element shape value is now supported.
TensorFlow Lite Framework Support
Constants containing sparse tensors are now supported.
PyTorch Framework Support
Setting types/shapes for nested structures (e.g., dictionaries and tuples) is now supported.
The aten::layer_norm has been updated to support dynamic shape normalization.
Dynamic shapes support in the FX graph has been improved, benefiting torch.compile and torch.export based applications, improving performance for gemma and chatglm model families.
ONNX Framework Support
More models are now supported:
Models using the new version of the ReduceMean operation (introduced in ONNX opset 18).
Models using the Multinomial operation (introduced in ONNX opset 7).
OpenVINO Model Server
The following has been improved in OpenAI API text generation:
Performance results, due to OpenVINO Runtime and sampling algorithms.
Reporting generation engine metrics in the logs.
Extra sampling parameters added.
Request parameters affecting memory consumption now have value restrictions, within a configurable range.
The following has been fixed in OpenAI API text generation:
Generating streamer responses impacting incomplete utf-8 sequences.
A sporadic generation hang.
Incompatibility of the last response from the completions endpoint stream with the vLLM benchmarking script.
Neural Network Compression Framework
The MXFP4 data format is now supported in the Weight Compression method, compressing weights to 4-bit with the e2m1 data type without a zero point and with 8-bit e8m0 scales. This feature is enabled by setting mode=CompressWeightsMode.E2M1 in nncf.compress_weights().
The AWQ algorithm in the Weight Compression method has been extended to the Act->MatMul and Act->Multiply->MatMul patterns to cover the Phi family models.
The representation of symmetrically quantized weights has been updated to a signed data type with no zero point. This allows NPU to support compressed LLMs with the symmetric mode.
BF16 models in Post-Training Quantization are now supported; nncf.quantize().
Activation Sparsity (Contextual Sparsity) algorithm in the Weight Compression method is now supported (preview), speeding up LLM inference. The algorithm is enabled by setting the target_sparsity_by_scope option in nncf.compress_weights() and supports Torch models only.
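The MXFP4 format described above (e2m1 values with a shared e8m0-style scale, no zero point) can be sketched numerically. The helper below is illustrative only, not NNCF code; it snaps magnitudes to the positive E2M1 grid after dividing by a power-of-two group scale:

```python
# Sketch of MXFP4-style quantization: each weight is snapped to the nearest
# E2M1 (4-bit float) grid point after dividing by a shared power-of-two
# scale chosen so the largest magnitude fits the grid. No zero point.

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # positive magnitudes

def quantize_mxfp4(weights):
    amax = max(abs(w) for w in weights) or 1.0
    exp = 0
    while amax / (2.0 ** exp) > 6.0:        # scale up until amax fits
        exp += 1
    while amax / (2.0 ** (exp - 1)) <= 6.0:  # scale down while it still fits
        exp -= 1
    scale = 2.0 ** exp                       # power-of-two shared scale
    q = []
    for w in weights:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(w) / scale - g))
        q.append(mag * scale * (1 if w >= 0 else -1))
    return q, scale

weights = [0.9, -0.1, 2.4, -5.0, 0.0]
q, scale = quantize_mxfp4(weights)
```

Since both the grid values and the scale are tiny to store (4 bits plus a shared exponent), the format compresses weights to roughly 4 bits each while keeping dequantization a cheap multiply.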
OpenVINO Tokenizers
The following is now supported:
Full Regex syntax with the PCRE2 library for text normalization and splitting.
Left padding side for all tokenizer types.
GLM-4 tokenizer support, as well as detokenization support for Phi-3 and Gemma have been improved.
Other Changes and Known Issues
Jupyter Notebooks
OpenVINO.GenAI
Performance counters have been added.
Preview support for NPU is now available.
Hugging Face
OpenVINO pre-optimized models are now available on Hugging Face:
For the full list of models, see Hugging Face.
Known Issues
2024.2 - 17 June 2024
What’s new
More Gen AI coverage and framework integrations to minimize code changes.
Llama 3 optimizations for CPUs, built-in GPUs, and discrete GPUs for improved performance and efficient memory usage.
Support for Phi-3-mini, a family of AI models that leverages the power of small language models for faster, more accurate and cost-effective text processing.
Python Custom Operation is now enabled in OpenVINO making it easier for Python developers to code their custom operations instead of using C++ custom operations (also supported). Python Custom Operation empowers users to implement their own specialized operations into any model.
Notebooks expansion to ensure better coverage for new models. Noteworthy notebooks added: DynamiCrafter, YOLOv10, Chatbot notebook with Phi-3, and QWEN2.
Broader Large Language Model (LLM) support and more model compression techniques.
GPTQ method for 4-bit weight compression added to NNCF for more efficient inference and improved performance of compressed LLMs.
Significant LLM performance improvements and reduced latency for both built-in GPUs and discrete GPUs.
Significant improvement in 2nd token latency and memory footprint of FP16 weight LLMs on AVX2 (13th Gen Intel® Core™ processors) and AVX512 (3rd Gen Intel® Xeon® Scalable Processors) based CPU platforms, particularly for small batch sizes.
More portability and performance to run AI at the edge, in the cloud, or locally.
Model Serving Enhancements:
Preview: OpenVINO Model Server (OVMS) now supports OpenAI-compatible API along with Continuous Batching and PagedAttention, enabling significantly higher throughput for parallel inferencing, especially on Intel® Xeon® processors, when serving LLMs to many concurrent users.
OpenVINO backend for Triton Server now supports dynamic input shapes.
Integration of TorchServe through torch.compile OpenVINO backend for easy model deployment, provisioning to multiple instances, model versioning, and maintenance.
Preview: addition of the Generate API, a simplified API for text generation using large language models with only a few lines of code. The API is available through the newly launched OpenVINO GenAI package.
Support for Intel® Atom® Processor X Series. For more details, see System Requirements.
Preview: Support for Intel® Xeon® 6 processor.
OpenVINO™ Runtime
Common
Operations and data types using UINT2, UINT3, and UINT6 are now supported, to allow for a more efficient LLM weight compression.
Common OV headers have been optimized, improving binary compilation time and reducing binary size.
AUTO Inference Mode
AUTO takes model caching into account when choosing the device for fast first-inference latency. If model cache is already in place, AUTO will directly use the selected device instead of temporarily leveraging CPU as first-inference device.
Dynamic models are now loaded to the selected device, instead of loading to CPU without considering device priority.
Exceptions thrown when using AUTO with stateful models that have dynamic inputs or outputs have been fixed.
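Model caching for AUTO is enabled through the standard cache-directory property. Below is a minimal sketch; the cache directory and model path are placeholders, and the compile call is shown commented out since it requires an installed OpenVINO runtime.

```python
# Sketch: enabling model caching so AUTO can skip the temporary CPU-first
# stage on subsequent runs. The cache directory below is a placeholder.
config = {"CACHE_DIR": "./ov_cache"}
print(config)

# With OpenVINO installed, the property is passed at compile time:
# import openvino as ov
# core = ov.Core()
# compiled = core.compile_model("model.xml", "AUTO", config)
```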
CPU Device Plugin
Performance when using latency mode in FP32 precision has been improved on Intel client platforms, including Intel® Core™ Ultra (formerly codenamed Meteor Lake) and 13th Gen Core processors (formerly codenamed Raptor Lake).
2nd token latency and memory footprint for FP16 LLMs have been improved significantly on AVX2 and AVX512 based CPU platforms, particularly for small batch sizes.
PagedAttention has been optimized on AVX2, AVX512 and AMX platforms together with INT8 KV cache support to improve the performance when serving LLM workloads on Intel CPUs.
LLMs with shared embeddings have been optimized to improve performance and memory consumption on several models including Gemma.
Performance on ARM-based servers is significantly improved with an upgrade to TBB 2021.2.5.
Improved FP32 and FP16 performance on ARM CPU.
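The INT8 KV-cache support mentioned above can be illustrated with a small, stdlib-only sketch of asymmetric 8-bit quantization. The per-row granularity, function names, and sample values are illustrative, not OpenVINO internals.

```python
# Illustrative stdlib sketch of asymmetric 8-bit (U8) quantization, the
# scheme used for the INT8/U8 KV cache described above. The per-row
# granularity and function names are illustrative, not OpenVINO internals.

def quantize_u8(row):
    """Map floats to [0, 255] with a per-row scale and zero-point."""
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 255.0 or 1.0  # avoid div-by-zero for constant rows
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in row]
    return q, scale, zero_point

def dequantize_u8(q, scale, zero_point):
    return [(v - zero_point) * scale for v in q]

kv_row = [-1.5, -0.2, 0.0, 0.7, 3.1]
q, scale, zp = quantize_u8(kv_row)
restored = dequantize_u8(q, scale, zp)
print(q)  # 8-bit codes
print(max(abs(a - b) for a, b in zip(kv_row, restored)))  # small rounding error
```

Storing the cache as U8 codes plus a scale and zero-point halves the memory of an FP16 cache, which is where the reduced memory stress for LLMs comes from.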
GPU Device Plugin
Both first-token and average-token latency of LLMs have been improved on all GPU platforms, most significantly on discrete GPUs. Memory usage of LLMs has been reduced as well.
Stable Diffusion FP16 performance improved on Intel® Core™ Ultra platforms, with significant pipeline improvement for models with dynamic-shaped input. Memory usage of the pipeline has been reduced, as well.
Performance of the permute_f_y kernel has been improved.
NPU Device Plugin
A new set of configuration options is now available.
A performance increase has been unlocked with the new 2408 NPU driver.
OpenVINO Python API
Writing custom Python operators is now supported for basic scenarios (alignment with the OpenVINO C++ API). This empowers users to implement their own specialized operations in any model. Full support with more advanced features is within the scope of upcoming releases.
OpenVINO C API
More element types are now supported to align with the OpenVINO C++ API.
OpenVINO Node.js API
OpenVINO node.js packages now support the electron.js framework.
Extended and improved JS API documentation for more complete usage guidelines.
Better JS API alignment with OpenVINO C++ API, delivering more advanced features to JS users.
TensorFlow Framework Support
3 new operations are now supported. See operations marked as NEW here.
LookupTableImport has received better support, required for 2 models from TF Hub:
mil-nce
openimages-v4-ssd-mobilenet-v2
TensorFlow Lite Framework Support
The GELU operation, required for a customer model, is now supported.
PyTorch Framework Support
9 new operations are now supported.
aten::set_item now supports negative indices.
An issue with adaptive pooling when the shape is a list has been fixed (PR #24586).
ONNX Support
The InputModel interface should be used from now on, replacing a number of deprecated APIs and class symbols.
Translation for the ReduceMin-18 and ReduceSumSquare-18 operators has been added, to address customer model requests.
Behavior of the Gelu-20 operator has been fixed for the case when “none” is set as the default value.
OpenVINO Model Server
OpenVINO Model Server can now be used for text generation use cases through an OpenAI-compatible API.
Support for continuous batching and the PagedAttention algorithm has been added, enabling fast and efficient text generation under high-concurrency loads, especially on Intel® Xeon® processors. Learn more about it.
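The OpenAI-compatible API accepts standard chat-completions requests. Below is a minimal sketch of such a request; the host, port, endpoint path, and served model name are assumptions that depend on the actual OVMS deployment.

```python
# Sketch of an OpenAI-compatible chat-completions request to an OpenVINO
# Model Server deployment. The host, port, endpoint path, and model name
# are assumptions; adjust them to your own OVMS configuration.
import json

payload = {
    "model": "meta-llama/Llama-2-7b-chat-hf",  # hypothetical served model name
    "max_tokens": 64,
    "stream": False,
    "messages": [{"role": "user", "content": "What is OpenVINO?"}],
}
print(json.dumps(payload, indent=2))

# Sending the request requires a running server, e.g.:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v3/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```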
Neural Network Compression Framework
The GPTQ method is now supported in nncf.compress_weights() for data-aware 4-bit weight compression of LLMs, enabled by setting gptq=True in nncf.compress_weights().
The Scale Estimation algorithm produces more accurate 4-bit compressed LLMs, enabled by setting scale_estimation=True in nncf.compress_weights().
Added support for models with BF16 weights in nncf.compress_weights().
nncf.quantize() method is now the recommended path for quantization initialization of PyTorch models in Quantization-Aware Training. See example for more details.
compressed_model.nncf.get_config() and nncf.torch.load_from_config() API have been added to save and restore quantized PyTorch models. See example for more details.
Automatic support for int8 quantization of PyTorch models with custom modules has been added. Registering such modules before quantization is no longer required.
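The data-aware options above plug into nncf.compress_weights(). As a rough illustration of what 4-bit weight compression does, here is a stdlib-only sketch of group-wise symmetric INT4 quantization, the data-free baseline that GPTQ and Scale Estimation refine using calibration data. The group size and helper names are hypothetical, not NNCF internals.

```python
# Illustrative stdlib sketch of group-wise symmetric INT4 weight
# quantization: the data-free baseline that the GPTQ (gptq=True) and
# Scale Estimation (scale_estimation=True) options of
# nncf.compress_weights() refine using calibration data. Group size and
# helper names here are hypothetical, not NNCF internals.

GROUP_SIZE = 4  # real configurations typically use larger groups, e.g. 32-128

def quantize_int4_sym(weights):
    """Quantize a flat weight list group by group; one scale per group."""
    groups = []
    for i in range(0, len(weights), GROUP_SIZE):
        group = weights[i:i + GROUP_SIZE]
        scale = max(abs(w) for w in group) / 7.0 or 1.0  # INT4 sym range [-7, 7]
        q = [max(-7, min(7, round(w / scale))) for w in group]
        groups.append((q, scale))
    return groups

def dequantize(groups):
    out = []
    for q, scale in groups:
        out.extend(v * scale for v in q)
    return out

weights = [0.5, -1.0, 0.25, 0.8, 4.0, -2.0, 1.0, 0.1]
restored = dequantize(quantize_int4_sym(weights))
print([round(v, 3) for v in restored])
```

Smaller groups track outliers more closely at the cost of more per-group scales to store; data-aware methods like GPTQ pick the quantized values so that layer outputs, not just weights, stay close to the original.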
Other Changes and Known Issues
Jupyter Notebooks
Latest notebooks along with the GitHub validation status can be found in the OpenVINO notebook section.
The following notebooks have been updated or newly added:
Known Issues
2024.1 - 24 April 2024
What’s new
More Gen AI coverage and framework integrations to minimize code changes.
Mixtral and URLNet models optimized for performance improvements on Intel® Xeon® processors.
Stable Diffusion 1.5, ChatGLM3-6B, and Qwen-7B models optimized for improved inference speed on Intel® Core™ Ultra processors with integrated GPU.
Support for Falcon-7B-Instruct, a GenAI Large Language Model (LLM) ready-to-use chat/instruct model with superior performance metrics.
New Jupyter Notebooks added: YOLO V9, YOLO V8 Oriented Bounding Boxes Detection (OOB), Stable Diffusion in Keras, MobileCLIP, RMBG-v1.4 Background Removal, Magika, TripoSR, AnimateAnyone, LLaVA-Next, and RAG system with OpenVINO and LangChain.
Broader LLM model support and more model compression techniques.
LLM compilation time reduced through additional optimizations with compressed embedding. Improved 1st token performance of LLMs on 4th and 5th generations of Intel® Xeon® processors with Intel® Advanced Matrix Extensions (Intel® AMX).
Better LLM compression and improved performance with oneDNN, INT4, and INT8 support for Intel® Arc™ GPUs.
Significant memory reduction for select smaller GenAI models on Intel® Core™ Ultra processors with integrated GPU.
More portability and performance to run AI at the edge, in the cloud, or locally.
The preview NPU plugin for Intel® Core™ Ultra processors is now available in the OpenVINO open-source GitHub repository, in addition to the main OpenVINO package on PyPI.
The JavaScript API is now more easily accessible through the npm repository, enabling JavaScript developers’ seamless access to the OpenVINO API.
FP16 inference on ARM processors is now enabled by default for Convolutional Neural Networks (CNNs).
OpenVINO™ Runtime
Common
Unicode file paths for cached models are now supported on Windows.
Pad pre-processing API to extend input tensor on edges with constants.
A fix for inference failures of certain image generation models has been implemented (fused I/O port names after transformation).
Compiler’s warnings-as-errors option is now on, improving the coding criteria and quality. Build warnings will not be allowed for new OpenVINO code and the existing warnings have been fixed.
AUTO Inference Mode
Returning the ov::enable_profiling value from ov::CompiledModel is now supported.
CPU Device Plugin
1st token performance of LLMs has been improved on the 4th and 5th generations of Intel® Xeon® processors with Intel® Advanced Matrix Extensions (Intel® AMX).
LLM compilation time and memory footprint have been improved through additional optimizations with compressed embeddings.
Performance of MoE (e.g. Mixtral), Gemma, and GPT-J has been improved further.
Performance has been improved significantly for a wide set of models on ARM devices.
FP16 inference precision is now the default for all types of models on ARM devices.
CPU architecture-agnostic build has been implemented, to enable unified binary distribution on different ARM devices.
GPU Device Plugin
LLM first token latency has been improved on both integrated and discrete GPU platforms.
For the ChatGLM3-6B model, average token latency has been improved on integrated GPU platforms.
For Stable Diffusion 1.5 FP16 precision, performance has been improved on Intel® Core™ Ultra processors.
NPU Device Plugin
NPU Plugin is now part of the OpenVINO GitHub repository. All the most recent plugin changes will be immediately available in the repo. Note that NPU is part of Intel® Core™ Ultra processors.
New OpenVINO™ notebook “Hello, NPU!” introducing NPU usage with OpenVINO has been added.
Version 22H2 or later is required for Microsoft Windows® 11 64-bit to run inference on NPU.
OpenVINO Python API
RemoteTensors are now created without holding the GIL. Holding the GIL makes a process unsuited for multithreading, so removing the lock increases performance, which is critical for the concept of Remote Tensors.
Packed data type BF16 on the Python API level has been added, opening a new way of supporting data types not handled by numpy.
‘pad’ operator support for ov::preprocess::PrePostProcessorItem has been added.
ov.PartialShape.dynamic(int) definition has been provided.
OpenVINO C API
Two new pre-processing APIs for scale and mean have been added.
OpenVINO Node.js API
New methods to align JavaScript API with CPP API have been added, such as CompiledModel.exportModel(), core.import_model(), Core set/get property and Tensor.get_size(), and Model.is_dynamic().
Documentation has been extended to help developers start integrating JavaScript applications with OpenVINO™.
TensorFlow Framework Support
tf.keras.layers.TextVectorization tokenizer is now supported.
Conversion of models with Variable and HashTable (dictionary) resources has been improved.
8 NEW operations have been added (see the list here, marked as NEW).
10 operations have received complex tensor support.
Input tensor names for TF1 models have been adjusted to have a single name per input.
Hugging Face model support coverage has increased significantly, due to the following fixes:
extraction of the input signature of a model in memory,
reading of variable values for a model in memory.
PyTorch Framework Support
ModuleExtension, a new type of extension for PyTorch models is now supported (PR #23536).
22 NEW operations have been added.
Experimental support for models produced by torch.export (FX graph) has been added (PR #23815).
ONNX Framework Support
8 new operations have been added.
OpenVINO Model Server
OpenVINO™ Runtime backend used is now 2024.1.
OpenVINO™ models with String data type on output are supported. Now, OpenVINO™ Model Server can support models with input and output of the String type, so developers can take advantage of the tokenization built into the model as the first layer. Developers can also rely on any postprocessing embedded into the model which returns text only. Check the demo on string input data with the universal-sentence-encoder model and the String output model demo.
MediaPipe Python calculators have been updated to support relative paths for all related configuration and Python code files. Now, the complete graph configuration folder can be deployed in an arbitrary path without any code changes.
KServe REST API support has been extended to properly handle the string format in JSON body, just like the binary format compatible with NVIDIA Triton™.
A demo showcasing a full RAG algorithm fully delegated to the model server has been added.
Neural Network Compression Framework
Model subgraphs can now be defined in the ignored scope for INT8 Post-training Quantization, nncf.quantize(), which simplifies excluding accuracy-sensitive layers from quantization.
A batch size of more than 1 is now partially supported for INT8 Post-training Quantization, speeding up the process. Note that it is not recommended for transformer-based models as it may impact accuracy. Here is an example demo.
Now it is possible to apply fine-tuning on INT8 models after Post-training Quantization to improve model accuracy and make it easier to move from post-training to training-aware quantization. Here is an example demo.
OpenVINO Tokenizers
TensorFlow support has been extended - TextVectorization layer translation:
Aligned existing ops with TF ops and added a translator for them.
Added new ragged tensor ops and string ops.
A new tokenizer type, RWKV is now supported:
Added Trie tokenizer and Fuse op for ragged tensors.
A new way to get OV Tokenizers: build a vocab from file.
Tokenizer caching has been redesigned to work with the OpenVINO™ model caching mechanism.
Other Changes and Known Issues
Jupyter Notebooks
The default branch for the OpenVINO™ Notebooks repository has been changed from ‘main’ to ‘latest’. The ‘main’ branch of the notebooks repository is now deprecated and will be maintained until September 30, 2024.
The new branch, ‘latest’, offers a better user experience and simplifies maintenance due to significant refactoring and an improved directory naming structure.
Use the local README.md file and OpenVINO™ Notebooks at GitHub Pages to navigate through the content.
The following notebooks have been updated or newly added:
Known Issues
2024.0 - 06 March 2024
What’s new
More Generative AI coverage and framework integrations to minimize code changes.
Improved out-of-the-box experience for TensorFlow sentence encoding models through the installation of OpenVINO™ toolkit Tokenizers.
New and noteworthy models validated: Mistral, StableLM-tuned-alpha-3b, and StableLM-Epoch-3B.
OpenVINO™ toolkit now supports Mixture of Experts (MoE), a new architecture that helps process generative models more efficiently through the pipeline.
JavaScript developers now have seamless access to OpenVINO API. This new binding enables a smooth integration with JavaScript API.
Broader Large Language Model (LLM) support and more model compression techniques.
Improved quality on INT4 weight compression for LLMs by adding the popular technique, Activation-aware Weight Quantization, to the Neural Network Compression Framework (NNCF). This addition reduces memory requirements and helps speed up token generation.
Experience enhanced LLM performance on Intel® CPUs, with internal memory state enhancement and INT8 precision for KV-cache, specifically tailored for multi-query LLMs like ChatGLM.
The OpenVINO™ 2024.0 release makes it easier for developers, by integrating more OpenVINO™ features with the Hugging Face ecosystem. Store quantization configurations for popular models directly in Hugging Face to compress models into INT4 format while preserving accuracy and performance.
More portability and performance to run AI at the edge, in the cloud, or locally.
A preview plugin architecture of the integrated Neural Processor Unit (NPU) as part of Intel® Core™ Ultra processor (formerly codenamed Meteor Lake) is now included in the main OpenVINO™ package on PyPI.
Improved performance on ARM by enabling the ARM threading library. In addition, we now support multi-core ARM processors and have enabled FP16 precision by default on macOS.
New and improved LLM serving samples from OpenVINO Model Server for multi-batch inputs and Retrieval Augmented Generation (RAG).
OpenVINO™ Runtime
Common
The legacy API for CPP and Python bindings has been removed.
StringTensor support has been extended by operators such as Gather, Reshape, and Concat, as a foundation to improve support for tokenizer operators and compliance with the TensorFlow Hub.
oneDNN has been updated to v3.3 (see oneDNN release notes).
CPU Device Plugin
LLM performance on Intel® CPU platforms has been improved for systems based on AVX2 and AVX512, using dynamic quantization and internal memory state optimization, such as INT8 precision for KV-cache. 13th and 14th generations of Intel® Core™ processors and Intel® Core™ Ultra processors use AVX2 for CPU execution, and these platforms will benefit from speedup. Enable these features by setting "DYNAMIC_QUANTIZATION_GROUP_SIZE":"32" and "KV_CACHE_PRECISION":"u8" in the configuration file.
The ov::affinity API configuration is now deprecated and will be removed in release 2025.0.
The following have been improved and optimized:
Multi-query structure LLMs (such as ChatGLM 2/3) for BF16 on the 4th and 5th generation Intel® Xeon® Scalable processors.
Mixtral model performance.
8-bit compressed LLM compilation time and memory usage, valuable for models with large embeddings like Qwen.
Convolutional networks in FP16 precision on ARM processors.
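The two configuration keys mentioned for dynamic quantization and the U8 KV cache can be passed as plugin properties when compiling for CPU. A minimal sketch follows; the model path is a placeholder, and the compile call is shown commented out since it requires an installed OpenVINO runtime.

```python
# Sketch: enabling dynamic quantization and the U8 KV cache on CPU via
# the plugin properties named in the notes. The model path below is a
# placeholder.
config = {
    "DYNAMIC_QUANTIZATION_GROUP_SIZE": "32",
    "KV_CACHE_PRECISION": "u8",
}
print(config)

# With OpenVINO installed, the config is passed at compile time:
# import openvino as ov
# core = ov.Core()
# compiled = core.compile_model("model.xml", "CPU", config)
```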
GPU Device Plugin
The following have been improved and optimized:
Average token latency for LLMs on integrated GPU (iGPU) platforms, using INT4-compressed models with large context size on Intel® Core™ Ultra processors.
LLM beam search performance on iGPU. Both average and first-token latency decrease may be expected for larger context sizes.
Multi-batch performance of YOLOv5 on iGPU platforms.
Memory usage for LLMs has been optimized, enabling ‘7B’ models with larger context on 16Gb platforms.
NPU Device Plugin (preview feature)
The NPU plugin for OpenVINO™ is now available through PyPI (run “pip install openvino”).
OpenVINO Python API
.add_extension method signatures have been aligned, improving API behavior for better user experience.
OpenVINO C API
ov_property_key_cache_mode (C++ ov::cache_mode) now enables the optimize_size and optimize_speed modes to set/get model cache.
The VA surface exception on Windows has been fixed.
OpenVINO Node.js API
OpenVINO - JS bindings are consistent with the OpenVINO C++ API.
A new distribution channel is now available: Node Package Manager (npm) software registry (check the installation guide).
JavaScript API is now available for Windows users, as some limitations for platforms other than Linux have been removed.
TensorFlow Framework Support
String tensors are now natively supported, handled on input, output, and intermediate layers (PR #22024).
TensorFlow Hub universal-sentence-encoder-multilingual is inferred out of the box.
String tensors are supported for Gather, Concat, and Reshape operations.
Integration with the openvino-tokenizers module: importing openvino-tokenizers automatically patches the TensorFlow FE with the required translators for models with tokenization.
Fallback for Model Optimizer by operation to the legacy Frontend is no longer available. Fallback by .json config will remain until Model Optimizer is discontinued (PR #21523).
Support for the following has been added:
Mutable variables and resources such as HashTable*, Variable, VariableV2 (PR #22270).
New tensor types: tf.u16, tf.u32, and tf.u64 (PR #21864).
14 NEW Ops*. Check the list here (marked as NEW).
TensorFlow 2.15 (PR #22180).
The following issues have been fixed:
UpSampling2D conversion crashed when the input type was int16 (PR #20838).
IndexError list index for Squeeze (PR #22326).
Correct FloorDiv computation for signed integers (PR #22684).
Fixed bad cast error for tf.TensorShape to ov.PartialShape (PR #22813).
Fixed reading tf.string attributes for models in memory (PR #22752).
ONNX Framework Support
ONNX Frontend now uses the OpenVINO API 2.0.
PyTorch Framework Support
Names for outputs unpacked from dict or tuple are now clearer (PR #22821).
FX Graph (torch.compile) now supports kwarg inputs, improving data type coverage. (PR #22397).
OpenVINO Model Server
OpenVINO™ Runtime backend used is now 2024.0.
Text generation demo now supports multi batch size, with streaming and unary clients.
The REST client now supports servables based on mediapipe graphs, including python pipeline nodes.
Included dependencies have received security-related updates.
Reshaping a model in runtime based on the incoming requests (auto shape and auto batch size) is deprecated and will be removed in the future. Using OpenVINO’s dynamic shape models is recommended instead.
Neural Network Compression Framework (NNCF)
The Activation-aware Weight Quantization (AWQ) algorithm for data-aware 4-bit weight compression is now available. It facilitates better accuracy for compressed LLMs with a high ratio of 4-bit weights. To enable it, use the dedicated awq optional parameter of the nncf.compress_weights() API.
ONNX models are now supported in Post-training Quantization with Accuracy Control, through the nncf.quantize_with_accuracy_control() method. It may be used for models in the OpenVINO IR and ONNX formats.
A weight compression example tutorial is now available, demonstrating how to find the appropriate hyperparameters for the TinyLlama model from Hugging Face Transformers, as well as for other LLMs, with some modifications.
OpenVINO Tokenizer
Regex support has been improved.
Model coverage has been improved.
Tokenizer metadata has been added to rt_info.
Limited support for TensorFlow Text models has been added: convert MUSE for TF Hub with string inputs.
OpenVINO Tokenizers have their own repository now: /openvino_tokenizers
Other Changes and Known Issues
Jupyter Notebooks
The following notebooks have been updated or newly added:
InstantID: Zero-shot Identity-Preserving Generation using OpenVINO
Tutorial for Big Image Transfer (BIT) model quantization using NNCF
Tutorial for OpenVINO Tokenizers integration into inference pipelines
LLM chatbot and LLM RAG pipeline have received integration with new models: minicpm-2b-dpo, gemma-7b-it, qwen1.5-7b-chat, baichuan2-7b-chat
Known issues
Deprecation And Support#
Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them. For more details, refer to the OpenVINO Legacy Features and Components page.
Discontinued in 2024#
Runtime components:
Intel® Gaussian & Neural Accelerator (Intel® GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel® Core™ Ultra or 14th generation and beyond.
OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
All ONNX Frontend legacy API (known as ONNX_IMPORTER_API).
PerformanceMode.UNDEFINED property as part of the OpenVINO Python API.
Tools:
Deployment Manager. See installation and deployment guides for current distribution options.
Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.
The macOS x86_64 debug bins are no longer provided with the OpenVINO toolkit, starting with OpenVINO 2024.5.
Python 3.8 is no longer supported, starting with OpenVINO 2024.5.
As MXNet does not support Python versions higher than 3.8 (according to the MXNet PyPI project), it is no longer supported by OpenVINO, either.
Support for discrete Keem Bay devices has been discontinued, starting with OpenVINO 2024.5.
Support for discrete devices (formerly codenamed Raptor Lake) is no longer available for NPU.
Deprecated and to be removed in the future#
Intel® Streaming SIMD Extensions (Intel® SSE) will be supported in source code form, but not enabled in the binary package by default, starting with OpenVINO 2025.0.
Ubuntu 20.04 support will be deprecated in future OpenVINO releases due to the end of standard support.
The openvino-nightly PyPI module will soon be discontinued. End-users should proceed with the Simple PyPI nightly repo instead. More information in Release Policy.
The OpenVINO™ Development Tools package (pip install openvino-dev) will be removed from installation options and distribution channels beginning with OpenVINO 2025.0.
Model Optimizer will be discontinued with OpenVINO 2025.0. Consider using the new conversion methods instead. For more details, see the model conversion transition guide.
OpenVINO property Affinity API will be discontinued with OpenVINO 2025.0. It will be replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
OpenVINO Model Server components:
“auto shape” and “auto batch size” (reshaping a model in runtime) will be removed in the future. OpenVINO’s dynamic shape models are recommended instead.
Starting with OpenVINO 2025.0, macOS x86_64 will no longer be recommended for use due to the discontinuation of validation. Full support will be removed later in 2025.
A number of notebooks have been deprecated. For an up-to-date listing of available notebooks, refer to the OpenVINO™ Notebook index (openvinotoolkit.github.io).
See the deprecated notebook list
Handwritten OCR with OpenVINO™
See alternative: Optical Character Recognition (OCR) with OpenVINO™,
See alternative: PaddleOCR with OpenVINO™,
See alternative: Handwritten Text Recognition Demo
Image In-painting with OpenVINO™
See alternative: Image Inpainting Python Demo
Interactive Machine Translation with OpenVINO
See alternative: Machine Translation Python* Demo
No alternatives, demonstrates deprecated tools.
Super Resolution with OpenVINO™
See alternative: Super Resolution with PaddleGAN and OpenVINO
See alternative: Image Processing C++ Demo
Interactive Question Answering with OpenVINO™
See alternative: BERT Question Answering Embedding Python* Demo
See alternative: BERT Question Answering Python* Demo
Vehicle Detection And Recognition with OpenVINO™
See alternative: Security Barrier Camera C++ Demo
Instruction following using Databricks Dolly 2.0 and OpenVINO™
See alternative: LLM Instruction-following pipeline with OpenVINO
Video Subtitle Generation with OpenAI Whisper
See alternative: Automatic speech recognition using Distil-Whisper and OpenVINO
Subject-driven image generation and editing using BLIP Diffusion and OpenVINO
Quantize Data2Vec Speech Recognition Model using NNCF PTQ API
Video Recognition using SlowFast and OpenVINO™
See alternative: Live Action Recognition with OpenVINO™
Text-to-Image Generation with Stable Diffusion v2 and OpenVINO™
Image generation with Segmind Stable Diffusion 1B (SSD-1B) model and OpenVINO
Train a Kidney Segmentation Model with MONAI and PyTorch Lightning
Live Inference and Benchmark CT-scan Data with OpenVINO™
See alternative: Quantize a Segmentation Model and Show Live Inference
Legal Information#
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein.
You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at www.intel.com or from the OEM or retailer.
No computer system can be absolutely secure.
Intel, Atom, Core, Xeon, OpenVINO, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
Other names and brands may be claimed as the property of others.
Copyright © 2024, Intel Corporation. All rights reserved.
For more complete information about compiler optimizations, see our Optimization Notice.
Performance varies by use, configuration and other factors.