OVMS Pull mode

This document describes how to use the OpenVINO Model Server (OVMS) pull feature to automate deployment configuration for Generative AI models. When pulling models from the OpenVINO organization on Hugging Face (HF), no additional steps are required. When pulling models from outside the OpenVINO organization, conversion tooling is needed: on baremetal you have to install additional Python dependencies so that optimum-cli is available to the ovms executable, and for Docker deployments you have to build the OVMS Python container. In summary, you have two options:

  • pulling preconfigured models in IR format from the OpenVINO organization

  • pulling models with automatic conversion and quantization (requires optimum-cli; see the sketch after this list). This option comes with additional considerations: deployment takes longer, the original model data must be pulled from HF, conversion needs extra memory, and more disk space is used - described here
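
For the second option on a baremetal host, the conversion tooling must be installed so that optimum-cli is available to the ovms executable. A minimal sketch, assuming a pip-managed Python environment; the exact dependency set may differ between releases, so check the deployment instructions:

# Install the Hugging Face Optimum tooling with OpenVINO support
pip3 install "optimum[openvino]"
# Verify that optimum-cli is on PATH for the user running ovms
optimum-cli --help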

Pulling the models

There is a special mode to make OVMS pull the model from Hugging Face before starting the service:

Required: Docker Engine installed

docker run --user $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model <model_name_in_HF> --model_repository_path /models --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]

Required: OpenVINO Model Server package - see deployment instructions for details.

ovms --pull --source_model <model_name_in_HF> --model_repository_path <model_repository_path> --model_name <external_model_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_PARAMETERS]

Example for pulling OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov:

Required: Docker Engine installed

docker run --user $(id -u):$(id -g) --rm -v <model_repository_path>:/models:rw openvino/model_server:latest --pull --source_model "OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov" --model_repository_path /models --model_name Phi-3-mini-FastDraft-50M-int8-ov --task text_generation

Required: OpenVINO Model Server package - see deployment instructions for details.

ovms --pull --source_model "OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov" --model_repository_path /models --model_name Phi-3-mini-FastDraft-50M-int8-ov --task text_generation 
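
Pulling a model from outside the OpenVINO organization follows the same pattern. The model name below is only illustrative; conversion requires optimum-cli (see the sketch above) and makes the first pull take considerably longer:

# Sketch: pulling a non-OpenVINO model, converted and quantized during the pull
ovms --pull --source_model "microsoft/Phi-3-mini-4k-instruct" --model_repository_path /models --model_name Phi-3-mini-4k-instruct --task text_generation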

These commands prepare all the configuration files needed to serve LLMs with OVMS in the model repository. Check the parameters page for detailed descriptions of configuration options and parameter usage.
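
To sanity-check the result, you can list the target directory. The exact contents depend on the model and task, but the model files should sit alongside the generated configuration (a sketch, assuming the example above):

# Inspect what the pull produced in the model repository
ls /models/Phi-3-mini-FastDraft-50M-int8-ov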

If you want to set up the model and start the server in one step, follow the instructions on this page.
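
As a sketch of that one-step flow with the example model above (check the linked page for the authoritative syntax): omitting --pull and adding a serving port makes OVMS pull the model when it is missing and then start serving it:

# Sketch: pull (if needed) and serve in a single invocation
ovms --rest_port 8000 --source_model "OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov" --model_repository_path /models --model_name Phi-3-mini-FastDraft-50M-int8-ov --task text_generation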

Note: When using pull mode you need both read and write access rights to the model repository.