Group Device properties#

group ov_runtime_cpp_prop_api

Enums

enum class SchedulePolicy#

Enum to define the policy for scheduling inference requests to target devices in cumulative throughput mode on AUTO.

Values:

enumerator ROUND_ROBIN#

enumerator DEVICE_PRIORITY#

enumerator DEFAULT#: Default schedule policy is DEVICE_PRIORITY.

enum class Priority#

Enum to define possible priority hints.

Values:

enumerator LOW#: Low priority.

enumerator MEDIUM#: Medium priority.

enumerator HIGH#: High priority.

enumerator DEFAULT#: Default priority is MEDIUM.

enum class PerformanceMode#

Enum to define possible performance mode hints.

Values:

enumerator LATENCY#: Optimize for latency.

enumerator THROUGHPUT#: Optimize for throughput.

enumerator CUMULATIVE_THROUGHPUT#: Optimize for cumulative throughput.

enum class ExecutionMode#

Enum to define possible execution mode hints.

Values:

enumerator PERFORMANCE#: Optimize for max performance, may apply properties which slightly affect accuracy.

enumerator ACCURACY#: Optimize for max accuracy.

enum class Level#

Enum to define possible log levels.

Values:

enumerator NO#: Disable any logging.

enumerator ERR#: Error events that might still allow the application to continue running.

enumerator WARNING#: Potentially harmful situations that may later lead to an error.

enumerator INFO#: Informational messages that show the progress of the application at a coarse-grained level.

enumerator DEBUG#: Fine-grained events that are most useful for debugging an application.

enumerator TRACE#: Finer-grained informational events than DEBUG.

enum class WorkloadType#

Enum to define possible workload types.

Workload type represents the execution priority for an inference.

Values:

enumerator DEFAULT#

enumerator EFFICIENT#

enum class CacheMode#

Enum to define possible cache modes.

Values:

enumerator OPTIMIZE_SIZE#: smaller cache size

enumerator OPTIMIZE_SPEED#: faster loading time

enum class Type#

Enum to define possible device types.

Values:

enumerator INTEGRATED#: Device is integrated into host system.

enumerator DISCRETE#: Device is not integrated into host system.

Variables

static constexpr Property<std::vector<PropertyName>, PropertyMutability::RO> supported_properties{"INTERNAL_SUPPORTED_PROPERTIES"}#: Read-only property to get a std::vector<PropertyName> of supported internal properties.

static constexpr Property<SchedulePolicy> schedule_policy = {"SCHEDULE_POLICY"}#: High-level OpenVINO model policy hint. Defines what scheduling policy should be used in the AUTO CUMULATIVE_THROUGHPUT or MULTI case.

static constexpr Property<std::vector<PropertyName>, PropertyMutability::RO> supported_properties{"SUPPORTED_PROPERTIES"}: Read-only property to get a std::vector<PropertyName> of supported read-only properties. This can be used as a compiled model property as well.

static constexpr Property<std::vector<std::string>, PropertyMutability::RO> available_devices = {"AVAILABLE_DEVICES"}#: Read-only property to get a std::vector<std::string> of available device IDs.

static constexpr Property<std::string, PropertyMutability::RO> model_name = {"NETWORK_NAME"}#: Read-only property to get the name of a model.

static constexpr Property<uint32_t, PropertyMutability::RO> optimal_number_of_infer_requests{"OPTIMAL_NUMBER_OF_INFER_REQUESTS"}#: Read-only property to get an unsigned integer value of optimal number of compiled model infer requests.

static constexpr Property<element::Type, PropertyMutability::RW> inference_precision = {"INFERENCE_PRECISION_HINT"}#: Hint for device to use specified precision for inference.

static constexpr Property<Priority> model_priority = {"MODEL_PRIORITY"}#: High-level OpenVINO model priority hint. Defines which model should be provided with the more performant bounded resource first.

static constexpr Property<PerformanceMode> performance_mode = {"PERFORMANCE_HINT"}#: High-level OpenVINO Performance Hints. Unlike low-level properties, which are individual (per-device), the hints are something that every device accepts and turns into device-specific settings.

static constexpr Property<SchedulingCoreType> scheduling_core_type = {"SCHEDULING_CORE_TYPE"}#

This property defines CPU core type which can be used during inference.

Developers can use this property to select specific CPU cores for inference. Refer to SchedulingCoreType for the definitions of all core types.

The following code is an example that uses only efficient-cores for inference on a hybrid CPU. If the user sets this configuration on a platform with only performance-cores, CPU inference will still run on the performance-cores.

ie.set_property(ov::hint::scheduling_core_type(ov::hint::SchedulingCoreType::ECORE_ONLY));

static constexpr Property<std::set<ModelDistributionPolicy>> model_distribution_policy = {"MODEL_DISTRIBUTION_POLICY"}#

This property defines model distribution policy for inference with multiple sockets/devices.

This property can be used to select the model distribution policy between execution units (e.g. between CPU sockets/NUMA nodes or between different GPUs).

TENSOR_PARALLEL: Distribute tensors to multiple sockets/devices during model compilation. At inference time, the sockets/devices process an individual tensor in parallel.
PIPELINE_PARALLEL: Distribute tensors to multiple sockets/devices during model compilation. At inference time, the sockets/devices process an individual tensor one by one, and each socket/device processes a portion of a different tensor in parallel.

The following code is an example of how the TENSOR_PARALLEL or PIPELINE_PARALLEL model distribution policy might be enabled.

ie.set_property(ov::hint::model_distribution_policy({ov::hint::ModelDistributionPolicy::TENSOR_PARALLEL}));
ie.set_property(ov::hint::model_distribution_policy({ov::hint::ModelDistributionPolicy::PIPELINE_PARALLEL}));

static constexpr Property<bool> enable_cpu_pinning = {"ENABLE_CPU_PINNING"}#

This property allows CPU pinning during inference.

Developers can use this property to enable or disable CPU pinning during inference on Windows and Linux. macOS does not support CPU pinning, so this property is always disabled there. If the user does not explicitly set a value for this property, OpenVINO may choose a value based on internal logic.

The following is an example of CPU pinning behavior on a hybrid CPU (8 performance cores and 16 efficiency cores). For a stream with 4 threads on performance cores: if CPU pinning is enabled, each thread is bound to a specific performance core; if it is disabled, the OS schedules the 4 threads on performance cores only. For a stream with 24 threads on all cores: if CPU pinning is enabled, each thread is bound to a specific core (performance or efficiency); if it is disabled, the OS schedules the 24 threads across both performance cores and efficiency cores.

The following code is an example of how to use this property.

ie.set_property(ov::hint::enable_cpu_pinning(true));
ie.set_property(ov::hint::enable_cpu_pinning(false));

static constexpr Property<bool> enable_cpu_reservation = {"ENABLE_CPU_RESERVATION"}#

This property allows CPU reservation during inference.

CPU reservation means reserving CPUs so that they will not be used by another plugin or compiled model. Developers can use this property to enable or disable CPU reservation during inference on Windows and Linux. macOS does not support CPU reservation, so this property is always disabled there. This property defaults to false.

The following code is an example of how to use this property.

ie.set_property(ov::hint::enable_cpu_reservation(true));
ie.set_property(ov::hint::enable_cpu_reservation(false));

static constexpr Property<bool> enable_hyper_threading = {"ENABLE_HYPER_THREADING"}#

This property defines whether hyper-threading is used during inference.

Developers can use this property to enable or disable hyper-threading during inference. If the user does not explicitly set a value for this property, OpenVINO may choose a value based on internal logic.

The following code is an example of how to use this property.

ie.set_property(ov::hint::enable_hyper_threading(true));
ie.set_property(ov::hint::enable_hyper_threading(false));

static constexpr Property<uint32_t> num_requests = {"PERFORMANCE_HINT_NUM_REQUESTS"}#: (Optional) property that backs the Performance Hints above by giving additional information on how many inference requests the application will keep in flight. Usually this value comes from the actual use case (e.g. the number of video cameras or other input sources).

static constexpr Property<std::shared_ptr<const ov::Model>> model = {"MODEL_PTR"}#: This key identifies a shared pointer to the ov::Model, which is required for some properties (ov::max_batch_size and ov::optimal_batch_size).

static constexpr Property<bool, PropertyMutability::RW> allow_auto_batching = {"ALLOW_AUTO_BATCHING"}#: Special key for auto batching feature configuration. Enabled by default.

static constexpr Property<ExecutionMode> execution_mode = {"EXECUTION_MODE_HINT"}#: High-level OpenVINO Execution hint. Unlike low-level properties, which are individual (per-device), the hints are something that every device accepts and turns into device-specific settings. The execution mode hint controls the preferred optimization target (performance or accuracy) for a given model.

static constexpr Property<uint64_t, PropertyMutability::RW> dynamic_quantization_group_size{"DYNAMIC_QUANTIZATION_GROUP_SIZE"}#

This property defines group size for dynamic quantization optimization.

Dynamic quantization optimization makes it possible to benefit from int8 compute performance. In contrast with static quantization, the dynamic approach assumes activations are quantized during inference. Although dynamic quantization has some runtime overhead, it might provide better accuracy metrics. This property defines the granularity (aka block size) for dynamic quantization algorithms. Lower group size values might result in better accuracy, at the cost of worse performance. A group size of 0 means dynamic quantization optimization is disabled.
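To make the notion of group size more concrete, the following is a minimal, self-contained sketch of symmetric per-group int8 quantization. It is an illustration only: `QuantizedActivations` and `quantize_per_group` are hypothetical names, and the scheme shown (one scale per block, derived from the block's largest absolute value) is an assumption, not a description of what any OpenVINO plugin does internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container, not an OpenVINO type.
struct QuantizedActivations {
    std::vector<int8_t> values;  // one int8 value per activation
    std::vector<float> scales;   // one scale per group of activations
};

// Quantize `x` in blocks of `group_size` elements; each block gets its own scale.
QuantizedActivations quantize_per_group(const std::vector<float>& x, std::size_t group_size) {
    assert(group_size > 0);  // for the real property, 0 means "disabled"
    QuantizedActivations q;
    q.values.resize(x.size());
    for (std::size_t begin = 0; begin < x.size(); begin += group_size) {
        const std::size_t end = std::min(begin + group_size, x.size());
        // Find the largest magnitude in this group.
        float max_abs = 0.0f;
        for (std::size_t i = begin; i < end; ++i)
            max_abs = std::max(max_abs, std::fabs(x[i]));
        // Map [-max_abs, max_abs] onto [-127, 127]; an all-zero group uses scale 1.
        const float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
        q.scales.push_back(scale);
        for (std::size_t i = begin; i < end; ++i)
            q.values[i] = static_cast<int8_t>(std::lround(x[i] / scale));
    }
    return q;
}
```

With a small group, a large activation only affects the scale of its own block, so small values elsewhere keep more resolution. The price is one scale per block to compute and store, which matches the accuracy versus performance trade-off described above.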

static constexpr Property<element::Type, PropertyMutability::RW> kv_cache_precision = {"KV_CACHE_PRECISION"}#: Hint for device to use specified precision for kv cache compression.

static constexpr Property<float, PropertyMutability::RW> activations_scale_factor = {"ACTIVATIONS_SCALE_FACTOR"}#: This property scales down activations to prevent overflows when inference precision is f16.
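The effect of scaling down activations can be sketched with plain arithmetic. The largest finite value of an IEEE 754 half-precision number is 65504, so a larger activation overflows in f16 unless it is divided by a suitable factor first. The helper names below are hypothetical, and how a plugin actually applies and compensates for the factor is not described here.

```cpp
#include <cassert>
#include <cmath>

// Largest finite value representable in IEEE 754 half precision (f16).
constexpr float kF16Max = 65504.0f;

// Hypothetical helper: true if `v` is within the finite f16 range.
bool fits_in_f16(float v) {
    return std::fabs(v) <= kF16Max;
}

// Hypothetical helper: scale an activation down by `factor`.
float scale_down(float v, float factor) {
    assert(factor > 0.0f);
    return v / factor;
}
```

For example, an activation of 100000 does not fit in f16, while the same activation scaled down by a factor of 8 (12500) does.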

constexpr Property<Tensor, PropertyMutability::RW> compiled_blob = {"COMPILED_BLOB"}#

Hint for device to use model compiled blob.

The property is used to pass a compiled blob as an ov::Tensor. The blob can be a regular or a weightless model. The weights_path property is a hint for where to look for the weights.

static constexpr Property<bool> enable_profiling = {"PERF_COUNT"}#: The name for setting performance counters option.

static constexpr Property<Level> level = {"LOG_LEVEL"}#: The property for setting the desired log level.

static constexpr Property<std::string> cache_dir = {"CACHE_DIR"}#

This property defines the directory which will be used to store any data cached by plugins.

The underlying cache structure is not defined and might differ between OpenVINO releases. Cached data might be platform- or device-specific and might become invalid after an OpenVINO version change. If this property is not specified or its value is an empty string, caching is disabled. The property might enable caching for the plugin using the following code:

ie.set_property("GPU", ov::cache_dir("cache/")); // enables cache for GPU plugin

The following code enables caching of compiled network blobs for devices where import/export is supported:

ie.set_property(ov::cache_dir("cache/")); // enables models cache

static constexpr Property<bool, PropertyMutability::RO> loaded_from_cache = {"LOADED_FROM_CACHE"}#: Read-only property to notify user that compiled model was loaded from the cache.

static constexpr Property<WorkloadType, PropertyMutability::RW> workload_type = {"WORKLOAD_TYPE"}#: Read-write property to select the mode in which the workload will be executed. This is only supported by NPU.

static constexpr Property<CacheMode, PropertyMutability::RW> cache_mode = {"CACHE_MODE"}#: Read-write property to select the cache mode between optimize_size and optimize_speed. If optimize_speed is selected (default), loading time will decrease but the cache file size will increase. If optimize_size is selected, smaller cache files will be created. This is only supported by GPU.

static constexpr Property<EncryptionCallbacks, PropertyMutability::WO> cache_encryption_callbacks{"CACHE_ENCRYPTION_CALLBACKS"}#

Write-only property to set encryption/decryption function for saving/loading model cache. If cache_encryption_callbacks is set, the model topology will be encrypted when saving to the cache and decrypted when loading from the cache. This property is set in core.compile_model only.

First value of the struct is encryption function.
Second value of the struct is decryption function.

Note

GPU Plugin: encrypts whole blob, not only model structure. Only used when ov::cache_mode property is set to “OPTIMIZE_SIZE”.
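The pairing of an encryption and a decryption function can be sketched as follows. `CallbacksSketch` is a hypothetical stand-in that mirrors the two-member layout described above; the string-to-string callback signature is an assumption, and the XOR codec is a toy that provides no real security.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in, not the OpenVINO definition:
// first member encrypts, second member decrypts.
struct CallbacksSketch {
    std::function<std::string(const std::string&)> encrypt;
    std::function<std::string(const std::string&)> decrypt;
};

// Toy XOR codec for illustration only. XOR with a fixed byte is its own
// inverse, so the same function serves for both directions.
std::string xor_codec(const std::string& in) {
    std::string out = in;
    for (char& c : out)
        c = static_cast<char>(c ^ 0x5A);
    return out;
}

CallbacksSketch make_xor_callbacks() {
    return {xor_codec, xor_codec};
}
```

Whatever pair is supplied, the decryption function must exactly invert the encryption function; otherwise a cached model cannot be loaded back.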

static constexpr Property<std::tuple<unsigned int, unsigned int>, PropertyMutability::RO> range_for_streams{"RANGE_FOR_STREAMS"}#

Read-only property to provide information about a range for streams on platforms where streams are supported.

Property returns a value of std::tuple<unsigned int, unsigned int> type, where:

First value is lower bound.
Second value is upper bound.
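One plausible use of this range is to clamp a requested stream count before setting it. The sketch below works on a plain std::tuple of the same shape; `clamp_streams` is a hypothetical helper, and it assumes both bounds are inclusive, which the description above does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>

// Hypothetical helper: clamp `requested` into the (lower, upper) range
// reported by the device. Both bounds are assumed to be inclusive.
unsigned int clamp_streams(unsigned int requested,
                           const std::tuple<unsigned int, unsigned int>& range) {
    const unsigned int lower = std::get<0>(range);
    const unsigned int upper = std::get<1>(range);
    assert(lower <= upper);
    return std::min(std::max(requested, lower), upper);
}
```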

static constexpr Property<unsigned int, PropertyMutability::RO> optimal_batch_size = {"OPTIMAL_BATCH_SIZE"}#

Read-only property to query the optimal batch size for the given device and network.

The property returns a value of unsigned int type: the optimal batch size for a given network on the given device. The returned value is aligned to a power of 2. ov::hint::model is a required option for this metric because the optimal batch size depends on the model; if ov::hint::model is not given, the result of the metric is always 1. For the GPU, the metric is queried automatically whenever the OpenVINO performance hint for throughput is used, so that the result (>1) governs the automatic batching (transparently to the application). Automatic batching can be disabled by setting ALLOW_AUTO_BATCHING to NO.

static constexpr Property<uint32_t, PropertyMutability::RO> max_batch_size = {"MAX_BATCH_SIZE"}#: Read-only property to get maximum batch size which does not cause performance degradation due to memory swap impact.

static constexpr Property<uint32_t, PropertyMutability::RW> auto_batch_timeout = {"AUTO_BATCH_TIMEOUT"}#: Read-write property to set the timeout used to collect the inputs for auto-batching.

static constexpr Property<std::tuple<unsigned int, unsigned int, unsigned int>, PropertyMutability::RO> range_for_async_infer_requests = {"RANGE_FOR_ASYNC_INFER_REQUESTS"}#

Read-only property to provide a hint for the range of the number of async infer requests. If the device supports streams, the metric provides the range for the number of infer requests per stream.

Property returns a value of std::tuple<unsigned int, unsigned int, unsigned int> type, where:

First value is lower bound.
Second value is upper bound.
Third value is step inside this range.
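The three values can be expanded into the list of candidate request counts they describe. The sketch below operates on a plain std::tuple of the same shape; `candidate_request_counts` is a hypothetical helper, and it assumes the bounds are inclusive and the step is non-zero.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical helper: expand (lower, upper, step) into lower, lower + step, ...
// without exceeding upper. Bounds are assumed to be inclusive.
std::vector<unsigned int> candidate_request_counts(
        const std::tuple<unsigned int, unsigned int, unsigned int>& range) {
    const unsigned int lower = std::get<0>(range);
    const unsigned int upper = std::get<1>(range);
    const unsigned int step = std::get<2>(range);
    assert(step > 0 && lower <= upper);
    std::vector<unsigned int> counts;
    for (unsigned int v = lower;; v += step) {
        counts.push_back(v);
        if (upper - v < step)  // the next value would exceed the upper bound
            break;
    }
    return counts;
}
```

An application could benchmark each candidate and keep the count that gives the best throughput.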

static constexpr Property<bool, PropertyMutability::RW> force_tbb_terminate = {"FORCE_TBB_TERMINATE"}#

Read-write property to set whether TBB is forcibly terminated when the ov core is destroyed. Value type: boolean.

True: explicitly terminate TBB when the ov core is destroyed.
False: do not perform any additional TBB operations when the core is destroyed.

static constexpr Property<bool, PropertyMutability::RW> enable_mmap = {"ENABLE_MMAP"}#

Read-write property to configure mmap() use for model read. Enabled by default. For the moment only IR Frontend supports the property.

value type: boolean

True enable mmap() use and map model
False disable mmap() use and read model

static constexpr Property<std::string> id = {"DEVICE_ID"}#: The property for setting the required device to execute on. Values: device IDs start from “0” (first device), “1” (second device), etc.

static constexpr Priorities priorities = {"MULTI_DEVICE_PRIORITIES"}#: Device Priorities config option, with comma-separated devices listed in the desired priority.
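As an illustration of the comma-separated format, the sketch below splits a priorities string into individual device names, highest priority first. `split_priorities` is a hypothetical helper, not part of OpenVINO, and it assumes that names carry no surrounding whitespace.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split "GPU,CPU" into {"GPU", "CPU"}.
// Empty entries (for example from a trailing comma) are skipped.
std::vector<std::string> split_priorities(const std::string& priorities) {
    std::vector<std::string> devices;
    std::size_t start = 0;
    while (start <= priorities.size()) {
        const std::size_t comma = priorities.find(',', start);
        const std::size_t end = (comma == std::string::npos) ? priorities.size() : comma;
        if (end > start)
            devices.push_back(priorities.substr(start, end - start));
        if (comma == std::string::npos)
            break;
        start = comma + 1;
    }
    return devices;
}
```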

static constexpr Properties properties = {"DEVICE_PROPERTIES"}#

Property to pass a set of property values to the specified device.

Usage Example:

core.compile_model(model, "HETERO",  // model: an ov::Model read earlier (assumed to exist)
    ov::device::priorities("GPU", "CPU"),
    ov::device::properties("CPU", ov::enable_profiling(true)),
    ov::device::properties("GPU", ov::enable_profiling(false)));

static constexpr Property<std::string, PropertyMutability::RO> full_name = {"FULL_DEVICE_NAME"}#: Read-only property to get a std::string value representing a full device name.

static constexpr Property<std::string, PropertyMutability::RO> architecture = {"DEVICE_ARCHITECTURE"}#: Read-only property which defines the device architecture.

static constexpr Property<UUID, PropertyMutability::RO> uuid = {"DEVICE_UUID"}#: Read-only property which defines the UUID of the device.

static constexpr Property<LUID, PropertyMutability::RO> luid = {"DEVICE_LUID"}#: Read-only property which defines the LUID of the device.

static constexpr Property<Type, PropertyMutability::RO> type = {"DEVICE_TYPE"}#: Read-only property to get a type of device. See Type enum definition for possible return values.

static constexpr Property<std::map<element::Type, float>, PropertyMutability::RO> gops = {"DEVICE_GOPS"}#: Read-only property which defines Giga OPS per second count (GFLOPS or GIOPS) for a set of precisions supported by specified device.

static constexpr Property<PCIInfo, PropertyMutability::RO> pci_info = {"DEVICE_PCI_INFO"}#: Read-only property to get PCI bus information of device. See PCIInfo struct definition for details.

static constexpr Property<float, PropertyMutability::RO> thermal = {"DEVICE_THERMAL"}#: Read-only property to get the device thermal reading as a float.

static constexpr Property<std::vector<std::string>, PropertyMutability::RO> capabilities = {"OPTIMIZATION_CAPABILITIES"}#: Read-only property to get a std::vector<std::string> of capabilities options per device.

static constexpr const auto FP32 = "FP32"#: Device supports fp32 inference.

static constexpr const auto BF16 = "BF16"#: Device supports bf16 inference.

static constexpr const auto FP16 = "FP16"#: Device supports fp16 inference.

static constexpr const auto INT8 = "INT8"#: Device supports int8 inference.

static constexpr const auto INT16 = "INT16"#: Device supports int16 inference.

static constexpr const auto BIN = "BIN"#: Device supports binary inference.

static constexpr const auto WINOGRAD = "WINOGRAD"#: Device supports winograd optimization.

static constexpr const auto EXPORT_IMPORT = "EXPORT_IMPORT"#: Device supports compiled model export and import.

static constexpr Property<Num, PropertyMutability::RW> num = {"NUM_STREAMS"}#: The number of executor logical partitions.

static constexpr Num AUTO = {-1}#: Creates bare minimum of streams to improve the performance.

static constexpr Num NUMA = {-2}#: Creates as many streams as needed to accommodate NUMA and avoid associated penalties.

static constexpr Property<streams::Num, PropertyMutability::RW> num_streams = {"NUM_STREAMS"}#: The number of executor logical partitions.

static constexpr Property<int32_t, PropertyMutability::RW> inference_num_threads = {"INFERENCE_NUM_THREADS"}#: Maximum number of threads that can be used for inference tasks.

static constexpr Property<int32_t, PropertyMutability::RW> compilation_num_threads = {"COMPILATION_NUM_THREADS"}#: Maximum number of threads that can be used for compilation tasks.

static constexpr Property<std::vector<std::string>, PropertyMutability::RO> execution_devices = {"EXECUTION_DEVICES"}#: The devices on which the inference task is executed.

static constexpr Property<element::Type, PropertyMutability::RW> key_cache_precision = {"KEY_CACHE_PRECISION"}#: The precision of key cache compression.

static constexpr Property<element::Type, PropertyMutability::RW> value_cache_precision = {"VALUE_CACHE_PRECISION"}#: The precision of value cache compression.

static constexpr Property<uint64_t, PropertyMutability::RW> key_cache_group_size = {"KEY_CACHE_GROUP_SIZE"}#: The group_size of key cache compression.

static constexpr Property<uint64_t, PropertyMutability::RW> value_cache_group_size = {"VALUE_CACHE_GROUP_SIZE"}#: The group_size of value cache compression.

struct Priorities : public ov::Property<std::string>#

#include <properties.hpp>

Type for device Priorities config option, with comma-separated devices listed in the desired priority.

Public Functions

template<typename ...Args> inline std::pair<std::string, Any> operator()(Args&&... args) const#

Constructs device priorities.

Template Parameters: Args – property constructor argument types
Parameters: args – property constructor arguments
Returns: Pair of name and type-erased value.

struct Properties : public ov::Property<std::map<std::string, std::map<std::string, Any>>>#

#include <properties.hpp>

Type for property to pass set of properties to specified device.

Public Functions

inline std::pair<std::string, Any> operator()(const AnyMap &config) const#

Constructs property.

Parameters: config – set of property values with names
Returns: Pair of string key representation and type-erased property value.

inline std::pair<std::string, Any> operator()(const std::string &device_name, const AnyMap &config) const#

Constructs property.

Parameters:

device_name – device plugin alias
config – set of property values with names

Returns:

Pair of string key representation and type erased property value.

template<typename ...Properties> inline util::EnableIfAllStringAny<std::pair<std::string, Any>, Properties...> operator()(const std::string &device_name, Properties&&... configs) const#

Constructs property.

Template Parameters:

Properties – Should be the pack of std::pair<std::string, ov::Any> types

Parameters:

device_name – device plugin alias
configs – Optional pack of pairs: (config parameter name, config parameter value)

Returns:

Pair of string key representation and type erased property value.

struct UUID#

#include <properties.hpp>

Structure which defines format of UUID.

Public Members

std::array<uint8_t, MAX_UUID_SIZE> uuid#: Array with uuid for a device.

Public Static Attributes

static const uint64_t MAX_UUID_SIZE = 16#: Max size of uuid array (128 bits)
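For logging or comparison, the 16-byte array can be rendered as a hexadecimal string. The sketch below takes a plain std::array of the same size; `uuid_to_hex` is a hypothetical helper, and printing the bytes in array order without separators is an assumption, since the byte layout is not specified here.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical helper: render 16 bytes as 32 lowercase hex digits,
// in array order and without separators.
std::string uuid_to_hex(const std::array<uint8_t, 16>& uuid) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(32);
    for (uint8_t byte : uuid) {
        out.push_back(digits[byte >> 4]);    // high nibble
        out.push_back(digits[byte & 0x0F]);  // low nibble
    }
    return out;
}
```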

struct LUID#

#include <properties.hpp>

Structure which defines format of LUID.

Public Members

std::array<uint8_t, MAX_LUID_SIZE> luid#: Array with luid for a device.

Public Static Attributes

static const uint64_t MAX_LUID_SIZE = 8#: Max size of luid array (64 bits)

struct PCIInfo#

#include <properties.hpp>

Structure to store PCI bus information of device (Domain/Bus/Device/Function)

Public Members

uint32_t domain#: PCI domain ID.

uint32_t bus#: PCI bus ID.

uint32_t device#: PCI device ID.

uint32_t function#: PCI function ID.
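The four fields map naturally onto the conventional domain:bus:device.function notation used by tools such as lspci. The sketch below uses a hypothetical `PciInfoSketch` struct that mirrors the members listed above; it is not the OpenVINO type.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical mirror of the four members listed above.
struct PciInfoSketch {
    uint32_t domain;
    uint32_t bus;
    uint32_t device;
    uint32_t function;
};

// Format as "dddd:bb:dd.f" with hexadecimal fields.
std::string format_pci(const PciInfoSketch& p) {
    char buf[48];  // large enough even if every field uses all 8 hex digits
    std::snprintf(buf, sizeof(buf), "%04x:%02x:%02x.%x",
                  static_cast<unsigned int>(p.domain),
                  static_cast<unsigned int>(p.bus),
                  static_cast<unsigned int>(p.device),
                  static_cast<unsigned int>(p.function));
    return std::string(buf);
}
```

For instance, domain 0, bus 0, device 2, function 0 is formatted as 0000:00:02.0.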

struct Num#

#include <properties.hpp>

Class to represent number of streams in streams executor.

Public Types

using Base = std::tuple<int32_t>#: NumStreams is representable as int32_t.