Model Caching Overview

As described in Integrate OpenVINO™ with Your Application, a common application flow consists of the following steps:

  1. Create a Core object:
    First step to manage available devices and read model objects
  2. Read the Intermediate Representation:
    Read an Intermediate Representation file into an object of the ov::Model
  3. Prepare inputs and outputs:
    If needed, manipulate precision, memory layout, size or color format
  4. Set configuration:
    Pass device-specific loading configurations to the device
  5. Compile and load the model to the device:
    Use the ov::Core::compile_model() method with a specific device
  6. Set input data:
    Specify input tensor
  7. Execute:
    Carry out inference and process results

Step 5 can potentially perform several time-consuming device-specific optimizations and network compilations. To reduce the resulting delays at application startup, you can use Model Caching. It exports the compiled model automatically and reuses it to significantly reduce the model compilation time.

Important

Not all devices support the model import/export feature. Devices without it still run normally, but caching does not speed up their compilation stage.

Set “cache_dir” config option to enable model caching

To enable model caching, the application must specify a folder to store the cached blobs:

from utils import get_path_to_model, get_temp_dir
import openvino as ov

import openvino.properties as props

# For example: "CPU", "GPU", "NPU".
device_name = 'CPU'
model_path = get_path_to_model()
path_to_cache_dir = get_temp_dir()

core = ov.Core()
core.set_property({props.cache_dir: path_to_cache_dir})
model = core.read_model(model=model_path)
compiled_model = core.compile_model(model=model, device_name=device_name)
#include <string>
#include <openvino/openvino.hpp>

void part0() {
    std::string modelPath = "/tmp/myModel.xml";
    std::string device = "GPU";                                 // For example: "CPU", "GPU", "NPU".
    ov::AnyMap config;
    ov::Core core;                                              // Step 1: create ov::Core object
    core.set_property(ov::cache_dir("/path/to/cache/dir"));     // Step 1b: Enable caching
    auto model = core.read_model(modelPath);                    // Step 2: Read Model
    //...                                                       // Step 3: Prepare inputs/outputs
    //...                                                       // Step 4: Set device configuration
    auto compiled = core.compile_model(model, device, config);  // Step 5: Compile model
}

With this code, if the device specified by device_name supports the model import/export capability, a cached blob (a .cl_cache file for GPU, a .blob file for CPU) is automatically created inside the /path/to/cache/dir folder. If the device does not support the import/export capability, no cache is created and no error is thrown.
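
To confirm whether a cache was actually written, an application can list the contents of the cache folder after the first compile_model call. The following is a minimal sketch that uses only the Python standard library. The helper name list_cache_files is hypothetical and not part of OpenVINO, and the file names and extensions it reports depend on the device.

```python
from pathlib import Path


def list_cache_files(cache_dir):
    """Return sorted (file name, size in bytes) pairs for files in cache_dir.

    Hypothetical helper, not part of OpenVINO. Returns an empty list when the
    folder does not exist, for example when no cache was ever created.
    """
    root = Path(cache_dir)
    if not root.is_dir():
        return []
    # Only regular files are reported; sub-folders are ignored.
    return sorted((p.name, p.stat().st_size) for p in root.iterdir() if p.is_file())
```

An empty result after compile_model suggests that the device did not export a blob, which is the expected outcome for devices without the import/export capability.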

Note that the first compile_model operation takes slightly longer, as the cache needs to be created - the compiled blob is saved into a cache file:

[Figure: caching_enabled.svg]
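
One way to observe the difference between the first (cache-creating) call and later calls is to time compile_model directly. The sketch below is illustrative only: timed_compile and compare_first_and_second are hypothetical helper names, and the core argument is assumed to behave like an ov.Core object on which the cache folder has already been set.

```python
import time


def timed_compile(core, model_path, device_name):
    """Compile a model and return (compiled_model, elapsed_seconds).

    `core` is assumed to behave like openvino.Core; only its
    compile_model(model=..., device_name=...) method is used.
    """
    start = time.perf_counter()
    compiled_model = core.compile_model(model=model_path, device_name=device_name)
    return compiled_model, time.perf_counter() - start


def compare_first_and_second(core, model_path, device_name):
    """Compile the same model twice and return both durations in seconds."""
    _, first = timed_compile(core, model_path, device_name)
    _, second = timed_compile(core, model_path, device_name)
    return first, second
```

On a device that supports import/export, the second duration would be expected to be the shorter one when the cache folder starts out empty. Actual numbers depend on the model and the device.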

Make it even faster: use compile_model(modelPath)

In some cases, applications do not need to customize inputs and outputs every time. Such applications always call model = core.read_model(...) followed by core.compile_model(model, ...), a pattern that can be optimized further. For these cases, there is a more convenient API that compiles the model in a single call, skipping the separate read step:

core = ov.Core()
compiled_model = core.compile_model(model=model_path, device_name=device_name)
ov::Core core;                                                  // Step 1: create ov::Core object
auto compiled = core.compile_model(modelPath, device, config);  // Step 2: Compile model by file path

With model caching enabled, the total load time is even shorter, because the read_model step is optimized as well.

core = ov.Core()
core.set_property({props.cache_dir: path_to_cache_dir})
compiled_model = core.compile_model(model=model_path, device_name=device_name)
ov::Core core;                                                  // Step 1: create ov::Core object
core.set_property(ov::cache_dir("/path/to/cache/dir"));         // Step 1b: Enable caching
auto compiled = core.compile_model(modelPath, device, config);  // Step 2: Compile model by file path
[Figure: caching_times.svg]

Advanced Examples

Not every device supports the network import/export capability. For those that don’t, enabling caching has no effect. To check in advance if a particular device supports model caching, your application can use the following code:

import openvino.properties.device as device

# Find 'EXPORT_IMPORT' capability in supported capabilities
caching_supported = 'EXPORT_IMPORT' in core.get_property(device_name, device.capabilities)
// Get list of supported device capabilities
std::vector<std::string> caps = core.get_property(deviceName, ov::device::capabilities);

// Find 'EXPORT_IMPORT' capability in supported capabilities
bool cachingSupported = std::find(caps.begin(), caps.end(), ov::device::capability::EXPORT_IMPORT) != caps.end();
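
The capability check can be combined with the cache_dir setting so that the cache folder is configured only for devices that can use it. The following is a sketch under stated assumptions: enable_cache_if_supported is a hypothetical helper, core is assumed to behave like an ov.Core object, and the string keys "OPTIMIZATION_CAPABILITIES" and "CACHE_DIR" are assumed to be the property names behind device.capabilities and props.cache_dir.

```python
def enable_cache_if_supported(core, device_name, cache_dir):
    """Set the cache folder only if device_name reports EXPORT_IMPORT.

    Hypothetical helper. `core` is assumed to behave like openvino.Core.
    The string keys below are assumed to match device.capabilities and
    props.cache_dir. Returns True if caching was enabled, False otherwise.
    """
    capabilities = core.get_property(device_name, "OPTIMIZATION_CAPABILITIES")
    if "EXPORT_IMPORT" not in capabilities:
        # Enabling the cache would have no effect on this device.
        return False
    core.set_property({"CACHE_DIR": str(cache_dir)})
    return True
```

Because enabling caching on an unsupported device raises no error, this check is optional. It is mainly useful when the application wants to log or report whether the speed-up applies.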