Inference Devices and Modes#
The OpenVINO runtime offers multiple inference modes to enable the best hardware utilization under different conditions.
To learn how to change the device configuration, read the Query device properties article.
Enumerating Available Devices#
The OpenVINO Runtime API features dedicated methods for enumerating devices and their capabilities. Note that beyond the typical “CPU” or “GPU” device names, more qualified names are used when multiple instances of a device are available (the iGPU is always GPU.0). The output may look like this (truncated to device names only; two GPUs are listed as an example):
./hello_query_device
Available devices:
Device: CPU
...
Device: GPU.0
...
Device: GPU.1
The Hello Query Device Sample shows how to obtain this information. Here is an example of a simple programmatic way to enumerate the devices and use them with the multi-device mode:
ov::Core core;
std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
// Enumerate all devices visible to the runtime, e.g. "CPU", "GPU.0", "GPU.1".
std::vector<std::string> availableDevices = core.get_available_devices();
// Join the device names into the comma-separated form expected by
// ov::device::priorities, with no trailing comma after the last device.
std::string all_devices;
for (auto&& device : availableDevices) {
    all_devices += device;
    all_devices += ((device == availableDevices.back()) ? "" : ",");
}
ov::CompiledModel compileModel = core.compile_model(model, "MULTI",
                                                    ov::device::priorities(all_devices));
With two GPU devices used in one setup, the explicit configuration would be “MULTI:GPU.1,GPU.0”. Accordingly, the code that loops over all available devices of the “GPU” type only is as follows:
ov::Core core;
// Query only the "GPU" device type; the property returns bare instance IDs
// such as "0" and "1", without the "GPU." prefix.
std::vector<std::string> GPUDevices = core.get_property("GPU", ov::available_devices);
std::string all_devices;
for (size_t i = 0; i < GPUDevices.size(); ++i) {
    // Re-qualify each ID as "GPU.<id>", separating entries with commas.
    all_devices += std::string("GPU.")
                 + GPUDevices[i]
                 + std::string(i < (GPUDevices.size() - 1) ? "," : "");
}
ov::CompiledModel compileModel = core.compile_model("sample.xml", "MULTI",
                                                    ov::device::priorities(all_devices));