Generative Model Preparation#
Since generative AI models tend to be big and resource-heavy, it is advisable to store them locally and optimize them for efficient inference. This article will show how to prepare generative models for inference with OpenVINO by:

- downloading models from Hugging Face Hub or Model Scope,
- converting and optimizing them to the OpenVINO IR format, including compressing model weights to a lower precision.
Download Generative Models From Hugging Face Hub#
Pre-converted and pre-optimized models are available in the OpenVINO Toolkit organization on Hugging Face Hub, under the model section, or in one of the dedicated model collections.
You can also use the huggingface_hub package to download models:
pip install huggingface_hub
huggingface-cli download "OpenVINO/phi-2-fp16-ov" --local-dir model_path
The models can be used in OpenVINO immediately after download. No dependencies are required except huggingface_hub.
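If you prefer to stay in Python, the same download can be scripted with the huggingface_hub API. The following is a minimal sketch, mirroring the model id and target directory from the CLI example above; the last two lines additionally assume the openvino-genai package is installed, to show that the downloaded IR is ready to use:

import openvino_genai
from huggingface_hub import snapshot_download

# Download the pre-converted OpenVINO model to a local directory.
model_dir = snapshot_download(repo_id="OpenVINO/phi-2-fp16-ov", local_dir="model_path")

# The downloaded IR can be loaded directly, for example with the OpenVINO GenAI LLMPipeline.
pipe = openvino_genai.LLMPipeline(model_dir, "CPU")
print(pipe.generate("What is OpenVINO?", max_new_tokens=50))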
Download Generative Models From Model Scope#
To download models from Model Scope, use the modelscope package:
pip install modelscope
modelscope download --model "Qwen/Qwen2-7b" --local_dir model_path
Models downloaded via Model Scope are available in PyTorch format only, and they must be converted to OpenVINO IR before inference.
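The download can also be scripted with the modelscope Python API. A minimal sketch, assuming the same model id as in the CLI example above:

from modelscope import snapshot_download

# Download the PyTorch model; the function returns the local directory it was saved to.
model_dir = snapshot_download("Qwen/Qwen2-7b", cache_dir="model_path")

# Pass model_dir to optimum-cli (see the next section) to convert the model to OpenVINO IR.
print(model_dir)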
Convert and Optimize Generative Models#
OpenVINO works best with models in the OpenVINO IR format, both in full precision and quantized. If the model you have selected has not been pre-optimized, you can easily convert and optimize it yourself with a single optimum-cli command. First, make sure optimum-intel is installed on your system:
pip install optimum-intel[openvino]
While optimizing models, you can decide to keep the original precision or select a lower one.

To export a model to OpenVINO IR while keeping fp16 weights:

optimum-cli export openvino --model <model_id> --weight-format fp16 <exported_model_name>
Examples:
optimum-cli export openvino --model meta-llama/Llama-2-7b-chat-hf --weight-format fp16 ov_llama_2
optimum-cli export openvino --model stabilityai/stable-diffusion-xl-base-1.0 --weight-format fp16 ov_SDXL
optimum-cli export openvino --model openbmb/MiniCPM-V-2_6 --trust-remote-code --weight-format fp16 ov_MiniCPM-V-2_6
optimum-cli export openvino --trust-remote-code --model openai/whisper-base ov_whisper
To compress weights to 4-bit integer precision (int4) during export:

optimum-cli export openvino --model <model_id> --weight-format int4 <exported_model_name>
Examples:
optimum-cli export openvino --model meta-llama/Llama-2-7b-chat-hf --weight-format int4 ov_llama_2
optimum-cli export openvino --model stabilityai/stable-diffusion-xl-base-1.0 --weight-format int4 ov_SDXL
optimum-cli export openvino -m model_path --task text-generation-with-past --weight-format int4 ov_MiniCPM-V-2_6
Note

Any other model_id, for example openbmb/MiniCPM-V-2_6, or the path to a local model file can be used. Also, you can specify a different data type, such as int8.
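The same export can also be done from Python with the optimum-intel model classes. A minimal sketch for a text-generation model, assuming the Llama 2 model id and output directory from the examples above:

from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"

# Export the PyTorch model to OpenVINO IR with 4-bit weight compression,
# the Python counterpart of --weight-format int4.
model = OVModelForCausalLM.from_pretrained(
    model_id,
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=4),
)
model.save_pretrained("ov_llama_2")

Here, OVWeightQuantizationConfig(bits=4) plays the role of --weight-format int4; the CLI commands above remain the simplest way to control the weight format explicitly.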