The GPU codepath abstracts many details about OpenCL™. You need to provide the kernel code in OpenCL C and the configuration file that connects the kernel and its parameters to the parameters of the layer.
There are two options for using the custom layer configuration file:

* Include a section with your kernels into the global automatically-loaded `cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml` file, which is hosted in the `<INSTALL_DIR>/deployment_tools/inference_engine/bin/intel64/{Debug/Release}` folder
* Call the `InferenceEngine::Core::SetConfig()` method from your application with the `InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE` key and the configuration file name as a value before loading the network that uses custom layers to the plugin (see the C++ sketch after this list)

All Inference Engine samples, except the trivial `hello_classification`, feature a dedicated command-line option `-c` to load custom kernels. For example, to load custom layers for the classification sample, run a command like the one below:
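The exact sample binary varies by release; the model, image, and configuration paths below are placeholders:

```sh
./classification_sample -m <path_to_model>/model.xml -i <path_to_image>/image.bmp -d GPU -c <absolute_path_to_config>/custom_layer_example.xml
```

For the second option, here is a minimal C++ sketch, assuming the `InferenceEngine::Core` API; the configuration and model paths are placeholders:

```cpp
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;

    // Register the custom-kernel configuration file with the GPU plugin
    // before reading and loading the network.
    core.SetConfig({{InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE,
                     "<path_to_config>/custom_layer_example.xml"}}, "GPU");

    auto network    = core.ReadNetwork("<path_to_model>/model.xml");
    auto executable = core.LoadNetwork(network, "GPU");
    return 0;
}
```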
The configuration file is expected to follow the `.xml` file structure with a node of the type `CustomLayer` for every custom layer you provide.
The definitions described in the sections below use the following notations:
Notation | Description |
---|---|
(0/1) | Can have 0 or 1 instances of this node/attribute |
(1) | Must have only 1 instance of this node/attribute |
(0+) | Can have any number of instances of this node/attribute |
(1+) | Can have 1 or more instances of this node/attribute |
The `CustomLayer` node contains the entire configuration for a single custom layer.
Attribute Name | # | Description |
---|---|---|
name | (1) | The name of the layer type to be used. This name should be identical to the type used in the IR. |
type | (1) | Must be SimpleGPU . |
version | (1) | Must be 1 . |
Sub-nodes: `Kernel` (1), `Buffers` (1), `CompilerOptions` (0+), `WorkSizes` (0/1)
The `Kernel` node contains all kernel source code configuration; no additional node structure exists.
Sub-nodes: `Source` (1+), `Define` (0+)
The `Source` node points to a single OpenCL source file.

Attribute Name | # | Description |
---|---|---|
filename | (1) | Name of the file containing the OpenCL source code. Note that the path is relative to your executable. Multiple `Source` nodes have their sources concatenated in order. |

Sub-nodes: None
The `Define` node configures a single `#define` instruction to be added to the sources during compilation (JIT).
Attribute Name | # | Description |
---|---|---|
name | (1) | The name of the defined JIT. For static constants, this can include the value as well (taken as a string). |
param | (0/1) | This parameter value is used as the value of this JIT definition. |
type | (0/1) | The parameter type. Accepted values: int , float , and int[] , float[] for arrays. |
default | (0/1) | The default value to be used if the specified parameter is missing from the layer in the IR. |
Sub-nodes: None
The resulting JIT has the following form: `#define [name] [type] [value/default]`.
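For instance, a slope parameter for a leaky ReLU could be declared as follows; assuming the layer in the IR carries `negative_slope="0.1"`, the generated line would be roughly `#define neg_slope (float) 0.1`:

```xml
<Define name="neg_slope" type="float" param="negative_slope" default="0.0"/>
```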
The `Buffers` node configures all input/output buffers for the OpenCL entry function; no additional node structure exists.
Sub-nodes: `Data` (0+), `Tensor` (1+)
The `Data` node configures a single input with static data (for example, weights or biases).
Attribute Name | # | Description |
---|---|---|
name | (1) | Name of a blob attached to a layer in the IR |
arg-index | (1) | 0-based index in the entry function arguments to be bound to |
Sub-nodes: None
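A hypothetical example that binds a blob named `weights` from the IR to the third kernel argument:

```xml
<Data name="weights" arg-index="2"/>
```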
The `Tensor` node configures a single input or output tensor.
Attribute Name | # | Description |
---|---|---|
arg-index | (1) | 0-based index in the entry function arguments to be bound to. |
type | (1) | input or output |
port-index | (1) | 0-based index in the layer’s input/output ports in the IR |
format | (0/1) | Data layout declaration for the tensor. Accepted values: `BFYX`, `BYXF`, `YXFB`, `FYXB` (also in all lowercase). Default value: `BFYX` |
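For example, binding the layer's single input and its output, both in the `BFYX` layout, to the first two kernel arguments could look like this (a sketch consistent with the attribute table above):

```xml
<Tensor arg-index="0" type="input"  port-index="0" format="BFYX"/>
<Tensor arg-index="1" type="output" port-index="0" format="BFYX"/>
```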
The `CompilerOptions` node configures the compilation flags for the OpenCL sources.
Attribute Name | # | Description |
---|---|---|
options | (1) | Options string to be passed to the OpenCL compiler |
Sub-nodes: None
The `WorkSizes` node configures the global/local work sizes to be used when queuing the OpenCL program for execution.
Attribute Name | # | Description |
---|---|---|
global | (0/1) | An array of up to three integers (or formulas) defining the global OpenCL work size to be used during execution. The formulas can use the values of the B, F, Y, X dimensions and contain the operators +, -, /, *, % (all evaluated in integer arithmetic). Default value: global="B*F*Y*X" |
local | (0/1) | Same syntax as global, defining the local OpenCL work size. Default value: local="" |
dim | (0/1) | A tensor to take the work size from. Accepted values: `inputN`, `output`, where `N` is an index of an input tensor starting with 0. Default value: `output` |
Sub-nodes: None
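As an illustration, the following declaration (using the work-size formulas described above) enqueues one work-item per output element, fusing the batch and feature dimensions into the third global dimension:

```xml
<WorkSizes global="X,Y,B*F" dim="output"/>
```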
The following code sample provides an example configuration file in the `.xml` format. For information on the configuration file structure, see Configuration File Format.
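Below is a representative configuration for a (leaky) ReLU layer, modeled on the example shipped with the OpenVINO documentation; the kernel entry point, source file name, and parameter names are illustrative. The `entry` attribute on `Kernel` names the OpenCL kernel function to invoke:

```xml
<!-- A config file introducing a custom "ReLU" layer -->
<CustomLayer name="ReLU" type="SimpleGPU" version="1">
  <!-- The OpenCL file with the kernel; entry names the kernel function -->
  <Kernel entry="example_relu_kernel">
    <Source filename="custom_layer_kernel.cl"/>
    <!-- The only ReLU-specific parameter (for the "leaky" variant) -->
    <Define name="neg_slope" type="float" param="negative_slope" default="0.0"/>
  </Kernel>
  <!-- Inputs and outputs of the kernel -->
  <Buffers>
    <Tensor arg-index="0" type="input"  port-index="0" format="BFYX"/>
    <Tensor arg-index="1" type="output" port-index="0" format="BFYX"/>
  </Buffers>
  <!-- Additional OpenCL compiler options -->
  <CompilerOptions options="-cl-mad-enable"/>
  <!-- Global work size: one work-item per output element -->
  <WorkSizes global="X,Y,B*F"/>
</CustomLayer>
```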
The following table includes definitions that are attached before the user sources, where `<TENSOR>` is the actual input or output, for example, `INPUT0` or `OUTPUT0`.
For an example, see Example Kernel.
Name | Value |
---|---|
NUM_INPUTS | Number of the input tensors bound to this kernel |
GLOBAL_WORKSIZE | An array of global work sizes used to execute this kernel |
GLOBAL_WORKSIZE_SIZE | The size of the GLOBAL_WORKSIZE array |
LOCAL_WORKSIZE | An array of local work sizes used to execute this kernel |
LOCAL_WORKSIZE_SIZE | The size of the LOCAL_WORKSIZE array |
<TENSOR>_DIMS | An array of the tensor dimension sizes. Always ordered as BFYX |
<TENSOR>_DIMS_SIZE | The size of the <TENSOR>_DIMS array. |
<TENSOR>_TYPE | The datatype of the tensor: float , half , or char |
<TENSOR>_FORMAT_<FORMAT> | Defined for the format of the tensor, where <FORMAT> is one of `BFYX`, `BYXF`, `YXFB`, `FYXB`, or `ANY`; the format is concatenated to the defined name (for example, `INPUT0_FORMAT_BFYX`). You can use the tensor format to define codepaths in your code with `#ifdef/#endif`. |
<TENSOR>_LOWER_PADDING | An array of padding elements used for the tensor dimensions before they start. Always ordered as BFYX. |
<TENSOR>_LOWER_PADDING_SIZE | The size of the <TENSOR>_LOWER_PADDING array |
<TENSOR>_UPPER_PADDING | An array of padding elements used for the tensor dimensions after they end. Always ordered as BFYX. |
<TENSOR>_UPPER_PADDING_SIZE | The size of the <TENSOR>_UPPER_PADDING array |
<TENSOR>_PITCHES | The number of elements between adjacent elements in each dimension. Always ordered as BFYX. |
<TENSOR>_PITCHES_SIZE | The size of the <TENSOR>_PITCHES array |
<TENSOR>_OFFSET | The number of elements from the start of the tensor to the first valid element (bypassing the lower padding) |
All `<TENSOR>` values are automatically defined for every tensor bound to this layer (`INPUT0`, `INPUT1`, `OUTPUT0`, and so on), as shown in the following example:
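For instance, for a tensor of the illustrative shape 1x96x55x55, the automatically generated definitions might look like:

```c
#define INPUT0_DIMS_SIZE 4
#define INPUT0_DIMS (int []){ 1, 96, 55, 55, }
```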
NOTE: As described in the previous section, all definitions like `INPUT0_TYPE` are actually defined as OpenCL (pre-)compiler inputs by the Inference Engine for efficiency reasons. See Debugging Tips for information on debugging the results.
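To show how these definitions are typically consumed, here is a sketch of a leaky-ReLU kernel consistent with the example configuration above; the indexing assumes the BFYX ordering of `<TENSOR>_DIMS` and `<TENSOR>_PITCHES` and the X,Y,B*F global work size:

```c
__kernel void example_relu_kernel(
    const __global INPUT0_TYPE*  input0,
          __global OUTPUT0_TYPE* output)
{
    const uint idx  = get_global_id(0);          // X dimension
    const uint idy  = get_global_id(1);          // Y dimension
    const uint idbf = get_global_id(2);          // fused B*F dimension
    const uint feature = idbf % OUTPUT0_DIMS[1]; // F
    const uint batch   = idbf / OUTPUT0_DIMS[1]; // B

    // Convert logical BFYX coordinates into linear offsets via the pitches,
    // skipping any lower padding through the tensor offset.
    const uint in_id  = batch   * INPUT0_PITCHES[0]
                      + feature * INPUT0_PITCHES[1]
                      + idy     * INPUT0_PITCHES[2]
                      + idx     * INPUT0_PITCHES[3]
                      + INPUT0_OFFSET;
    const uint out_id = batch   * OUTPUT0_PITCHES[0]
                      + feature * OUTPUT0_PITCHES[1]
                      + idy     * OUTPUT0_PITCHES[2]
                      + idx     * OUTPUT0_PITCHES[3]
                      + OUTPUT0_OFFSET;

    const INPUT0_TYPE value = input0[in_id];
    // neg_slope comes from the Define node in the configuration file.
    output[out_id] = value < 0 ? value * neg_slope : value;
}
```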
NOTE: Several GPU-targeted kernels are also added to the binaries upon samples compilation so that the sample application can easily load them. Refer to the `cldnn_global_custom_kernels` folder in the GPU plugin installation directory.
When the Inference Engine compiles the kernels for a specific network, it also outputs the resulting code for the custom kernels. In the directory of your executable, look for files like `clDNN_program0.cl`, `clDNN_program1.cl`, and so on. There are as many files as there are distinct sets of parameters for your custom kernel: different input tensor sizes and kernel parameter values.
Using `printf` in the OpenCL™ kernels: to debug specific values, you can use `printf` in your kernels. However, be careful not to output excessively, as that would generate too much data. The `printf` output buffer is typically limited in size, so your output can be truncated to fit it. Also, because of buffering, you actually get the entire buffer of output only when the execution ends.
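A minimal sketch of guarded debug output; restricting the print to a single work-item is an illustrative way to keep the buffer small:

```c
// Print from one work-item only, so the limited printf buffer is not
// flooded with one line per work-item.
if (get_global_id(0) == 0 && get_global_id(1) == 0 && get_global_id(2) == 0) {
    printf("INPUT0 dims: %d %d %d %d\n",
           INPUT0_DIMS[0], INPUT0_DIMS[1], INPUT0_DIMS[2], INPUT0_DIMS[3]);
}
```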