GPU Kernels Tuning allows you to tune models so that the heavy computational layers are configured to better fit the hardware on which the tuning was done. Tuning is required to achieve the best performance on GPU. DLDT releases include pretuned data (cache.json, located in IE's binaries folder) for current state-of-the-art models. It is highly recommended to run the tuning for new kinds of models, hardware, or drivers.
GPU tuning data is saved in JSON format. The file's content is composed of two types of attributes and one type of value:
You can activate the Kernels Tuning process by setting the KEY_TUNING_MODE flag to TUNING_CREATE and KEY_TUNING_FILE to <"filename"> in a configuration map that is passed to the plugin while loading a network. This configuration modifies the behavior of the ExecutableNetwork object: instead of the standard network compilation, it runs the tuning process. Keep in mind that tuning can be very time consuming; the bigger the network, the longer it takes. The result of this step is a file with tuned data*.
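As a minimal sketch of the configuration described above, the map for a tuning run could be built as follows. The string values given to the three constants are assumptions about what the Inference Engine plugin config header defines; real code should use the constants from that header (for example, the ones in the InferenceEngine::PluginConfigParams namespace) rather than redefine them. The helper name make_tuning_create_config is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// ASSUMPTION: string values of the config keys named in the text.
// Prefer the constants shipped in the Inference Engine plugin config header.
const char* const KEY_TUNING_MODE = "TUNING_MODE";
const char* const KEY_TUNING_FILE = "TUNING_FILE";
const char* const TUNING_CREATE   = "TUNING_CREATE";

// Hypothetical helper: builds the configuration map that switches
// network loading from standard compilation to the tuning process.
std::map<std::string, std::string> make_tuning_create_config(const std::string& tuning_file) {
    assert(!tuning_file.empty());  // a file name is needed to store the tuned data
    return {
        {KEY_TUNING_MODE, TUNING_CREATE},
        {KEY_TUNING_FILE, tuning_file},
    };
}
```

The returned map is what would be handed to the plugin together with the network when it is loaded, for example `make_tuning_create_config("my_cache.json")`, where the file name is only a placeholder.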
* If KEY_TUNING_FILE points to existing tuned data and you are tuning a new model, the file is extended with the new data. This allows you to extend the existing cache.json provided in the DLDT release package.
You can activate inference with tuned data by setting the KEY_TUNING_MODE flag to TUNING_USE_EXISTING and the KEY_TUNING_FILE flag to <"filename">. The GPU backend processes the content of the file during network compilation to configure the OpenCL kernels for the best performance.
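For inference with previously tuned data, the configuration differs only in the mode value. The sketch below assumes the same string values for the config keys as the Inference Engine plugin config header defines; treat the literals as assumptions and use the header's own constants in real code. The helper name make_tuning_use_config is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// ASSUMPTION: string values of the config keys named in the text.
// Prefer the constants shipped in the Inference Engine plugin config header.
const char* const KEY_TUNING_MODE     = "TUNING_MODE";
const char* const KEY_TUNING_FILE     = "TUNING_FILE";
const char* const TUNING_USE_EXISTING = "TUNING_USE_EXISTING";

// Hypothetical helper: builds the configuration map that makes the GPU
// backend read an existing tuning file during network compilation.
std::map<std::string, std::string> make_tuning_use_config(const std::string& tuned_file) {
    assert(!tuned_file.empty());  // the file with tuned data must be named
    return {
        {KEY_TUNING_MODE, TUNING_USE_EXISTING},
        {KEY_TUNING_FILE, tuned_file},
    };
}
```

Passing `make_tuning_use_config("cache.json")` when loading the network would point the plugin at the pretuned file mentioned earlier, assuming that file is reachable at the given path.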