OpenVINO™ Integrations

Hugging Face Optimum-Intel


Load and use models through the Hugging Face API while leveraging OpenVINO for inference. The Hugging Face Hub also hosts pre-optimized OpenVINO IR models, so you can use them in your projects without any adjustments.
Benefits:
- Minimize complex coding for Generative AI.
Check example code
-from transformers import AutoModelForCausalLM
+from optimum.intel.openvino import OVModelForCausalLM

from transformers import AutoTokenizer, pipeline
model_id = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"

-model = AutoModelForCausalLM.from_pretrained(model_id)
+model = OVModelForCausalLM.from_pretrained(model_id, export=True)
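
Put together, a minimal runnable sketch could look like the following; the pipeline call and prompt are illustrative assumptions, not part of the original snippet.

from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"

# Export the Hugging Face model to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The standard transformers pipeline now runs on the OpenVINO backend
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("What is OpenVINO?", max_new_tokens=50)[0]["generated_text"])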

OpenVINO Execution Provider for ONNX Runtime


Utilize OpenVINO as a backend with your existing ONNX Runtime code.
Benefits:
- Enhanced inference performance on Intel hardware with minimal code modifications.
Check example code
device = 'CPU_FP32'
# Set OpenVINO as the execution provider for this model
sess.set_providers(['OpenVINOExecutionProvider'], [{'device_type': device}])
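
For context, the execution provider can also be selected when the session is created. Below is a minimal sketch; the model.onnx path and the dummy input shape are assumptions for illustration, and the onnxruntime-openvino package is required.

import numpy as np
import onnxruntime as ort

# Hypothetical model file; replace with your own ONNX model
sess = ort.InferenceSession(
    "model.onnx",
    providers=['OpenVINOExecutionProvider'],
    provider_options=[{'device_type': 'CPU_FP32'}],
)

# Run inference with a dummy input matching the model's first input
input_name = sess.get_inputs()[0].name
outputs = sess.run(None, {input_name: np.zeros((1, 3, 224, 224), dtype=np.float32)})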

Torch.compile with OpenVINO


Use OpenVINO for Python-native applications by JIT-compiling code into optimized kernels.
Benefits:
- Enhanced inference performance on Intel hardware with minimal code modifications.
Check example code
import torch
import openvino.torch

...
model = torch.compile(model, backend='openvino')
...
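
As a fuller illustration, here is a minimal sketch using a torchvision model; the model choice and input shape are assumptions for demonstration only.

import torch
import torchvision.models as models
import openvino.torch  # registers the 'openvino' backend for torch.compile

# Any eager-mode PyTorch model can be compiled; resnet18 is just an example
model = models.resnet18(weights=None).eval()
model = torch.compile(model, backend='openvino')

# The first call triggers compilation; later calls reuse the optimized kernels
with torch.no_grad():
    output = model(torch.randn(1, 3, 224, 224))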

OpenVINO LLMs with LlamaIndex


Build context-augmented GenAI applications with the LlamaIndex framework and enhance runtime performance with OpenVINO.
Benefits:
- Minimize complex coding for Generative AI.
Check example code
from llama_index.llms.openvino import OpenVINOLLM

ov_config = {
    "PERFORMANCE_HINT": "LATENCY",
    "NUM_STREAMS": "1",
    "CACHE_DIR": "",
}

ov_llm = OpenVINOLLM(
    model_id_or_path="HuggingFaceH4/zephyr-7b-beta",
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"ov_config": ov_config},
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    # messages_to_prompt and completion_to_prompt are user-defined prompt formatting helpers
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    device_map="cpu",
)
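
Once constructed, the LLM can be queried like any other LlamaIndex LLM. A brief usage sketch (the prompt text is illustrative):

response = ov_llm.complete("What is OpenVINO?")
print(response)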