OpenVINO™ Integrations
Hugging Face Optimum-Intel
Use models accelerated by OpenVINO directly through the Hugging Face API.
The Hugging Face Hub hosts pre-optimized OpenVINO IR models, so you can use
them in your projects without any additional conversion.
Benefits:
- Minimize complex coding for Generative AI.
- from transformers import AutoModelForCausalLM
+ from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"
- model = AutoModelForCausalLM.from_pretrained(model_id)
+ model = OVModelForCausalLM.from_pretrained(model_id, export=True)
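Once exported, the model plugs into the standard transformers pipeline. Continuing from the snippet above, a minimal sketch of that usage (the prompt is illustrative):

tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
results = pipe("The weather is always wonderful")  # illustrative prompt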
OpenVINO Execution Provider for ONNX Runtime
Use OpenVINO as a backend for your existing ONNX Runtime code.
Benefits:
- Enhanced inference performance on Intel hardware with minimal code modifications.
device = 'CPU_FP32'
# Set OpenVINO as the Execution Provider for this model
sess.set_providers(['OpenVINOExecutionProvider'], [{'device_type': device}])
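The provider can also be selected when the session is created. A minimal sketch, where the model path, input shape, and data are placeholders rather than values from the original:

import numpy as np
import onnxruntime as ort

# "model.onnx" and the input shape are placeholders for illustration
sess = ort.InferenceSession(
    "model.onnx",
    providers=["OpenVINOExecutionProvider"],
    provider_options=[{"device_type": "CPU_FP32"}],
)
input_name = sess.get_inputs()[0].name
outputs = sess.run(None, {input_name: np.random.rand(1, 3, 224, 224).astype(np.float32)})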
Torch.compile with OpenVINO
Use OpenVINO for Python-native applications by JIT-compiling code into optimized kernels.
Benefits:
- Enhanced inference performance on Intel hardware with minimal code modifications.
import openvino.torch
...
model = torch.compile(model, backend='openvino')
...
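To see the whole flow in one place, here is a minimal sketch that compiles a torchvision ResNet-50 with the OpenVINO backend; the model choice and input shape are illustrative assumptions, not part of the original example:

import torch
import openvino.torch  # registers the 'openvino' backend for torch.compile
import torchvision.models as models

# ResNet-50 is used only as an example workload
model = models.resnet50(weights="DEFAULT")
model.eval()
model = torch.compile(model, backend="openvino")

with torch.no_grad():
    output = model(torch.randn(1, 3, 224, 224))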
OpenVINO LLMs with LlamaIndex
Build context-augmented GenAI applications with the LlamaIndex framework and enhance
runtime performance with OpenVINO.
Benefits:
- Minimize complex coding for Generative AI.
from llama_index.llms.openvino import OpenVINOLLM

ov_config = {
    "PERFORMANCE_HINT": "LATENCY",
    "NUM_STREAMS": "1",
    "CACHE_DIR": "",
}

# messages_to_prompt and completion_to_prompt are prompt-formatting helpers defined elsewhere
ov_llm = OpenVINOLLM(
    model_id_or_path="HuggingFaceH4/zephyr-7b-beta",
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"ov_config": ov_config},
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    device_map="cpu",
)
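Once constructed, the LLM is queried through the standard LlamaIndex completion API. A minimal sketch (the prompt is illustrative):

response = ov_llm.complete("What is OpenVINO?")  # illustrative prompt
print(str(response))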