Model Caching Overview#
As described in Integrate OpenVINO™ with Your Application, a common workflow consists of the following steps:
- Create a Core object: first step to manage available devices and read model objects.
- Read the Intermediate Representation: read an Intermediate Representation file into the ov::Model object.
- Prepare inputs and outputs: if needed, manipulate precision, memory layout, size, or color format.
- Set configuration: add device-specific loading configurations to the device.
- Compile and Load Network to device: use the ov::Core::compile_model() method with a specific device.
- Set input data: specify an input tensor.
- Execute: carry out inference and process the results.
Step 5 can potentially perform several time-consuming device-specific optimizations and network compilations. To reduce the resulting delays at application startup, you can use Model Caching. It exports the compiled model automatically and reuses it to significantly reduce the model compilation time.
Important
Not all devices support import/export of models. Such devices will still work correctly, but they will not benefit from the compilation-stage speed-up.
Set configuration options#
Use the device_name option to specify the inference device, and set cache_dir to enable model caching.

from utils import get_path_to_model, get_temp_dir
import openvino as ov
import openvino.properties as props
# For example: "CPU", "GPU", "NPU".
device_name = 'CPU'
model_path = get_path_to_model()
path_to_cache_dir = get_temp_dir()
core = ov.Core()
core.set_property({props.cache_dir: path_to_cache_dir})
model = core.read_model(model=model_path)
compiled_model = core.compile_model(model=model, device_name=device_name)
void part0() {
std::string modelPath = "/tmp/myModel.xml";
std::string device = "GPU"; // For example: "CPU", "GPU", "NPU".
ov::AnyMap config;
ov::Core core; // Step 1: create ov::Core object
core.set_property(ov::cache_dir("/path/to/cache/dir")); // Step 1b: Enable caching
auto model = core.read_model(modelPath); // Step 2: Read Model
//... // Step 3: Prepare inputs/outputs
//... // Step 4: Set device configuration
    auto compiled = core.compile_model(model, device, config); // Step 5: LoadNetwork
}
If the specified device supports import/export of models, a cached blob file, .cl_cache (GPU) or .blob (CPU), is automatically created inside the /path/to/cache/dir folder. If the device does not support import/export of models, the cache is not created and no error is thrown. Note that the first compile_model operation takes slightly more time, as the cache needs to be created: the compiled blob is saved into a file.
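One way to confirm that a blob was actually written is to list the cache directory after the first compilation. Below is a minimal sketch; cache_blobs is a hypothetical helper for illustration, not part of the OpenVINO API:

```python
from pathlib import Path

def cache_blobs(cache_dir):
    # Collect the cached blob files (".blob" or ".cl_cache") that
    # OpenVINO wrote into the cache directory.
    root = Path(cache_dir)
    return sorted(p.name for p in root.iterdir()
                  if p.suffix in (".blob", ".cl_cache"))
```

After the first compile_model call, the list should be non-empty for devices that support import/export; on other devices it simply stays empty.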
Use optimized methods#
Applications do not always require an initial customization of inputs and outputs, so the usual sequence of model = core.read_model(...) followed by core.compile_model(model, ...) can be optimized further. The model can be compiled conveniently in a single call, skipping the read step:
core = ov.Core()
compiled_model = core.compile_model(model=model_path, device_name=device_name)
ov::Core core; // Step 1: create ov::Core object
auto compiled = core.compile_model(modelPath, device, config); // Step 2: Compile model by file path
The total load time is even shorter when model caching is enabled, as the read_model step is optimized as well.
core = ov.Core()
core.set_property({props.cache_dir: path_to_cache_dir})
compiled_model = core.compile_model(model=model_path, device_name=device_name)
ov::Core core; // Step 1: create ov::Core object
core.set_property(ov::cache_dir("/path/to/cache/dir")); // Step 1b: Enable caching
auto compiled = core.compile_model(modelPath, device, config); // Step 2: Compile model by file path
Advanced Examples#
Enabling model caching has no effect when the specified device does not support import/export of models. To check in advance if a particular device supports model caching, use the following code in your application:
import openvino.properties.device as device
# Find 'EXPORT_IMPORT' capability in supported capabilities
caching_supported = 'EXPORT_IMPORT' in core.get_property(device_name, device.capabilities)
// Get list of supported device capabilities
std::vector<std::string> caps = core.get_property(deviceName, ov::device::capabilities);
// Find 'EXPORT_IMPORT' capability in supported capabilities
bool cachingSupported = std::find(caps.begin(), caps.end(), ov::device::capability::EXPORT_IMPORT) != caps.end();
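The same check can be factored into a small predicate so application code reads clearly. A sketch, where caps stands for the list returned by core.get_property(device_name, device.capabilities):

```python
def is_caching_supported(caps):
    # A device advertising 'EXPORT_IMPORT' can serialize and re-import
    # compiled models, so the cache blob will actually be reused.
    return "EXPORT_IMPORT" in caps
```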
Enable cache encryption#
If model caching is enabled in the CPU plugin, set the cache_encryption_callbacks config option to encrypt the model while caching it and decrypt it when loading it from the cache. Currently, this property can be set only in compile_model.
import base64
def encrypt_base64(src):
    return base64.b64encode(bytes(src, "utf-8"))
def decrypt_base64(src):
    return base64.b64decode(bytes(src, "utf-8"))
core = ov.Core()
core.set_property({props.cache_dir: path_to_cache_dir})
config_cache = {}
config_cache["CACHE_ENCRYPTION_CALLBACKS"] = [encrypt_base64, decrypt_base64]
model = core.read_model(model=model_path)
compiled_model = core.compile_model(model=model, device_name=device_name, config=config_cache)
ov::Core core; // Step 1: create ov::Core object
core.set_property(ov::cache_dir("/path/to/cache/dir")); // Step 1b: Enable caching
auto model = core.read_model(modelPath); // Step 2: Read model
ov::AnyMap config;
ov::EncryptionCallbacks encryption_callbacks;
static const char codec_key[] = {0x30, 0x60, 0x70, 0x02, 0x04, 0x08, 0x3F, 0x6F, 0x72, 0x74, 0x78, 0x7F};
auto codec_xor = [&](const std::string& source_str) {
    auto key_size = sizeof(codec_key);
    int key_idx = 0;
    std::string dst_str = source_str;
    for (char& c : dst_str) {
        c ^= codec_key[key_idx % key_size];
        key_idx++;
    }
    return dst_str;
};
encryption_callbacks.encrypt = codec_xor;
encryption_callbacks.decrypt = codec_xor;
config.insert(ov::cache_encryption_callbacks(encryption_callbacks)); // Step 4: Set device configuration
auto compiled = core.compile_model(model, device, config); // Step 5: LoadNetwork
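For comparison, the XOR codec from the C++ snippet can be written in Python as well, should you want matching callbacks in both languages. This is a sketch for illustration only: XOR with a fixed key is obfuscation, not real encryption.

```python
# Same key bytes as in the C++ example.
CODEC_KEY = bytes([0x30, 0x60, 0x70, 0x02, 0x04, 0x08,
                   0x3F, 0x6F, 0x72, 0x74, 0x78, 0x7F])

def codec_xor(src: str) -> str:
    # XOR every byte with the repeating key; applying the function twice
    # restores the input, so the same function works for encrypt and decrypt.
    data = src.encode("latin-1")
    mixed = bytes(b ^ CODEC_KEY[i % len(CODEC_KEY)] for i, b in enumerate(data))
    return mixed.decode("latin-1")
```

Passing codec_xor as both entries of CACHE_ENCRYPTION_CALLBACKS would mirror the C++ configuration above.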
Full encryption only works when the CacheMode property is set to OPTIMIZE_SIZE.
import base64
def encrypt_base64(src):
    return base64.b64encode(bytes(src, "utf-8"))
def decrypt_base64(src):
    return base64.b64decode(bytes(src, "utf-8"))
core = ov.Core()
if "GPU" in core.available_devices:
    core.set_property({props.cache_dir: path_to_cache_dir})
    config_cache = {}
    config_cache["CACHE_ENCRYPTION_CALLBACKS"] = [encrypt_base64, decrypt_base64]
    config_cache["CACHE_MODE"] = "OPTIMIZE_SIZE"
    compiled_model = core.compile_model(model=model_path, device_name='GPU', config=config_cache)
static const char codec_key[] = {0x30, 0x60, 0x70, 0x02, 0x04, 0x08, 0x3F, 0x6F, 0x72, 0x74, 0x78, 0x7F};
auto codec_xor = [&](const std::string& source_str) {
    auto key_size = sizeof(codec_key);
    int key_idx = 0;
    std::string dst_str = source_str;
    for (char& c : dst_str) {
        c ^= codec_key[key_idx % key_size];
        key_idx++;
    }
    return dst_str;
};
ov::Core core; // Step 1: create ov::Core object
core.set_property(ov::cache_dir("/path/to/cache/dir")); // Step 1b: Enable caching
auto compiled = core.compile_model(modelPath,
device,
ov::cache_encryption_callbacks(ov::EncryptionCallbacks{codec_xor, codec_xor}),
ov::cache_mode(ov::CacheMode::OPTIMIZE_SIZE)); // Step 5: Compile model
Important
Currently, encryption is supported only by the CPU and GPU plugins. Enabling this feature for other hardware plugins will not encrypt/decrypt the model topology in the cache and will not affect performance.