Intermediate Representation Notation Reference Catalog

Activation Layer

Name: Activation

Category: Activation

Short description: Activation layer represents an activation function of each neuron in a layer, which is used to add non-linearity to the computational flow.

Detailed description: Reference

Parameters: Activation layer parameters are specified in the data node, which is a child of the layer node.

Mathematical Formulation

Inputs:

Example

<layer ... type="Activation" ... >
<data type="sigmoid" />
<input> ... </input>
<output> ... </output>
</layer>

ArgMax Layer

Name: ArgMax

Category: Layer

Short description: ArgMax layer computes indexes and values of the top_k maximum values for each datum across all dimensions CxHxW.

Detailed description: ArgMax layer is used after a classification layer to produce a prediction. If the parameter out_max_val is 1, output is a vector of pairs (max_ind, max_val) for each batch. The axis parameter specifies an axis along which to maximize.

Parameters: ArgMax layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

ArgMax generally does the following with the input blobs:

\[ o_{i} = \left\{ x| x \in S \wedge \forall y \in S : f(y) \leq f(x) \right\} \]

Example

<layer ... type="ArgMax" ... >
<data top_k="10" out_max_val="1" axis="-1"/>
<input> ... </input>
<output> ... </output>
</layer>

BatchNormalization Layer

Name: BatchNormalization

Category: Normalization

Short description: Reference

Detailed description: Reference

Parameters: BatchNormalization layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

BatchNormalization normalizes the output in each hidden layer.

Example

<layer ... type="BatchNormalization" ... >
<data epsilon="9.99e-06" />
<input> ... </input>
<output> ... </output>
</layer>

BinaryConvolution Layer

Name: BinaryConvolution

Category: Layer

Short description: BinaryConvolution layer performs convolution with binary weights.

Parameters: BinaryConvolution layer parameters are specified in the data node, which is a child of the layer node. The layer has the same parameters as a regular Convolution layer and several unique parameters.

Inputs:


Clamp Layer

Name: Clamp

Category: Layer

Short description: Clamp layer represents clipping activation operation.

Detailed description: Reference

Parameters: Clamp layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

Clamp generally does the following with the input blobs:

\[ out_i=\left\{\begin{array}{ll} max\_value \quad \mbox{if } \quad input_i>max\_value \\ min\_value \quad \mbox{if } \quad input_i<min\_value \\ input_i \quad \mbox{otherwise} \end{array}\right. \]
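
The clipping rule can be sketched in a few lines of Python. This is an illustration only: the function name clamp and the flat-list input are assumptions, not part of the Intermediate Representation.

```python
def clamp(values, min_value, max_value):
    """Clip every element of a flat list into [min_value, max_value]."""
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    # Elements above max_value become max_value, elements below
    # min_value become min_value, everything else passes through.
    return [min(max(v, min_value), max_value) for v in values]


# With the bounds used in the example below (min="10", max="50"):
print(clamp([5, 20, 60], 10, 50))  # [10, 20, 50]
```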

Example

<layer ... type="Clamp" ... >
<data min="10" max="50" />
<input> ... </input>
<output> ... </output>
</layer>

Concat Layer

Name: Concat

Category: Layer

Short description: Reference

Parameters: Concat layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

The axis parameter specifies the blob dimension along which values are concatenated. For example, for two input blobs B1xC1xH1xW1 and B2xC2xH2xW2, if axis="1", the output blob is B1x(C1+C2)xH1xW1. This is only possible if B1=B2, H1=H2, W1=W2.
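
The shape rule for concatenation can be checked with a short Python sketch. The helper name concat_shape is hypothetical and only models shapes, not data.

```python
def concat_shape(shapes, axis):
    """Return the output shape of concatenating blobs along `axis`.

    All dimensions except `axis` must match across the inputs.
    """
    first = tuple(shapes[0])
    total = 0
    for shape in shapes:
        if len(shape) != len(first):
            raise ValueError("all inputs must have the same rank")
        for d, (a, b) in enumerate(zip(first, shape)):
            if d != axis and a != b:
                raise ValueError(f"dimension {d} differs: {a} vs {b}")
        total += shape[axis]
    return first[:axis] + (total,) + first[axis + 1:]


# Two blobs that differ only in the channel dimension (axis 1).
print(concat_shape([(1, 3, 4, 4), (1, 5, 4, 4)], axis=1))  # (1, 8, 4, 4)
```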

Example

<layer ... type="Concat" ... >
<data axis="1"/>
<input> ... </input>
<output> ... </output>
</layer>

Const Layer

Name: Const

Category: Layer

Short description: Const layer produces a blob with a constant value specified in the blobs section.

Parameters: Const layer does not have parameters.

Example

<layer ... type="Const" ...>
<output>
<port id="1">
<dim>3</dim>
<dim>100</dim>
</port>
</output>
<blobs>
<custom offset="..." size="..."/>
</blobs>
</layer>

Convolution Layer

Name: Convolution

Category: Layer

Short description: Reference

Detailed description: Reference

Parameters: Convolution layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Weights Layout

Weights layout is GOIYX (GOIZYX for 3D convolution), which means that X changes the fastest, followed by Y, Input, Output, and then Group.

Mathematical Formulation

Example

<layer ... type="Convolution" ... >
<data auto_pad="same_upper" dilations="1,1" group="3" kernel="7,7" output="24" pads_begin="2,2" pads_end="3,3" strides="2,2"/>
<input> ... </input>
<output> ... </output>
<weights ... />
<biases ... />
</layer>

Crop (Type 1) Layer

Name: Crop

Category: Layer

Short description: Crop layer changes selected dimensions of the input blob according to the specified parameters.

Parameters: Crop layer parameters are specified in the data section, which is a child of the layer node. Crop Type 1 layer takes two input blobs, and the shape of the second blob specifies the Crop size. The Crop layer of this type supports shape inference.

Inputs

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
<data axis="2,3" offset="0,0"/>
<input>
<port id="0">
<dim>1</dim>
<dim>21</dim>
<dim>44</dim>
<dim>44</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</input>
<output>
<port id="2">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</output>
</layer>

Crop (Type 2) Layer

Name: Crop

Category: Layer

Short description: Crop layer changes selected dimensions of the input blob according to the specified parameters.

Parameters: Specify parameters for the Crop layer in the data section, which is a child of the layer node. Crop Type 2 layer takes one input blob to crop. The Crop layer of this type supports shape inference only when shape propagation is applied to dimensions not specified in the axis attribute.

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
<data axis="2,3" offset="0,0" dim="34,34"/>
<input>
<port id="0">
<dim>1</dim>
<dim>21</dim>
<dim>44</dim>
<dim>44</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</output>
</layer>

Crop (Type 3) Layer

Name: Crop

Category: Layer

Short description: Crop layer changes selected dimensions of the input blob according to the specified parameters.

Parameters: Crop layer parameters are specified in the data section, which is a child of the layer node. Crop Type 3 layer takes one input blob to crop. The Crop layer of this type supports shape inference.

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
<data axis="2,3" crop_begin="4,4" crop_end="6,6"/>
<input>
<port id="0">
<dim>1</dim>
<dim>21</dim>
<dim>44</dim>
<dim>44</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</output>
</layer>

CTCGreedyDecoder Layer

Name: CTCGreedyDecoder

Category: Layer

Short description: CTCGreedyDecoder performs greedy decoding on the logits given in input (best path).

Detailed description: Reference

Parameters: CTCGreedyDecoder layer parameters are specified in the data node, which is a child of the layer node.

Mathematical Formulation

Given an input sequence $X$ of length $T$, CTCGreedyDecoder assumes the probability of a length $T$ character sequence $C$ is given by

\[ p(C|X) = \prod_{t=1}^{T} p(c_{t}|X) \]

Example

<layer ... type="CTCGreedyDecoder" ... >
<data stride="1"/>
<input> ... </input>
<output> ... </output>
</layer>

Deconvolution Layer

Name: Deconvolution

Category: Layer

Short description: Deconvolution layer is applied for upsampling the output to the higher image resolution.

Detailed description: Reference

Parameters: Deconvolution layer parameters should be specified in the data node, which is a child of the layer node.

Inputs:

Weights Layout

Weights layout is GOIYX, which means that X changes the fastest, followed by Y, Input, Output, and then Group.

Mathematical Formulation

Deconvolution is also called transposed convolution and performs the operation that is the reverse of convolution. The number of output features for each dimension is calculated as:

\[S_{o}=stride(S_{i} - 1 ) + S_{f} - 2pad \]

Where $S_{o}$, $S_{i}$, and $S_{f}$ are the sizes of the output, input, and filter, respectively. The output is calculated in the same way as for the Convolution layer:

\[out = \sum_{i = 0}^{n}w_{i}x_{i} + b\]
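
The output-size formula above can be evaluated directly. The following Python sketch assumes symmetric padding, as the formula does; the helper name deconv_output_size is illustrative.

```python
def deconv_output_size(input_size, kernel, stride, pad):
    """S_o = stride * (S_i - 1) + S_f - 2 * pad for one spatial dimension."""
    return stride * (input_size - 1) + kernel - 2 * pad


def deconv_output_shape(spatial, kernel, strides, pads):
    """Apply the per-dimension formula to every spatial dimension."""
    return tuple(
        deconv_output_size(s, k, st, p)
        for s, k, st, p in zip(spatial, kernel, strides, pads)
    )


# Values from the example below: 8x8x8 input, kernel 2, stride 2, no padding.
print(deconv_output_shape((8, 8, 8), (2, 2, 2), (2, 2, 2), (0, 0, 0)))  # (16, 16, 16)
```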

Example

<layer ... type="Deconvolution" ...>
<data auto_pad="valid" kernel="2,2,2" output="512" pads_begin="0,0,0" pads_end="0,0,0" strides="2,2,2"/>
<input>
<port id="0">
<dim>1</dim>
<dim>512</dim>
<dim>8</dim>
<dim>8</dim>
<dim>8</dim>
</port>
</input>
<output>
<port id="3">
<dim>1</dim>
<dim>512</dim>
<dim>16</dim>
<dim>16</dim>
<dim>16</dim>
</port>
</output>
<blobs>
<weights offset="..." size="..."/>
<biases offset="..." size="..."/>
</blobs>
</layer>

DepthToSpace Layer

Name: DepthToSpace

Category: Layer

Short description: DepthToSpace layer rearranges data from the depth dimension of the input blob into spatial dimensions.

Detailed description: DepthToSpace layer outputs a copy of the input blob, where values from the depth dimension (features) are moved to spatial blocks. Refer to the ONNX* specification for an example of the 4D input blob case.

Parameters: DepthToSpace layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

The operation is equivalent to the following transformation of the input blob x with K spatial dimensions of shape [N, C, D1, D2, D3, ..., DK]:

x' = reshape(x, [N, block_size, block_size, ..., block_size, C / block_size ^ K, D1, D2, ..., DK])
x'' = transpose(x', [0, K + 1, K + 2, 1, K + 3, 2, K + 4, 3, ..., K + K + 1, K])
y = reshape(x'', [N, C / block_size ^ K, D1 * block_size, D2 * block_size, D3 * block_size, ..., DK * block_size])
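
The resulting output shape can be derived with a small Python sketch. It models shapes only and does not move data; the helper name depth_to_space_shape is an assumption made for illustration.

```python
def depth_to_space_shape(shape, block_size):
    """Output shape of DepthToSpace for an input of shape [N, C, D1, ..., DK]."""
    n, c, *spatial = shape
    k = len(spatial)
    factor = block_size ** k
    if c % factor != 0:
        raise ValueError("C must be divisible by block_size ** K")
    # Depth shrinks by block_size ** K, each spatial dimension grows by block_size.
    return (n, c // factor) + tuple(d * block_size for d in spatial)


# Shapes from the example below: [5, 4, 2, 3] with block_size=2.
print(depth_to_space_shape((5, 4, 2, 3), 2))  # (5, 1, 4, 6)
```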

Example

<layer ... type="DepthToSpace">
<data block_size="2"/>
<input>
<port id="0">
<dim>5</dim>
<dim>4</dim>
<dim>2</dim>
<dim>3</dim>
</port>
</input>
<output>
<port id="1">
<dim>5</dim>
<dim>1</dim>
<dim>4</dim>
<dim>6</dim>
</port>
</output>
</layer>

DetectionOutput Layer

Name: DetectionOutput

Category: Layer

Short description: DetectionOutput layer performs non-maximum suppression to generate the detection output using information on location and confidence predictions.

Detailed description: Reference. The layer has three required inputs: a blob with box logits, a blob with confidence predictions, and a blob with box coordinates (proposals). It can have two additional inputs with additional confidence predictions and box coordinates described in the article. The five-input version of the layer is supported by the MYRIAD plugin only. The output blob contains information about filtered detections described with seven-element tuples: [batch_id, class_id, confidence, x_1, y_1, x_2, y_2]. The first tuple with batch_id equal to -1 marks the end of the output.

Parameters: DetectionOutput layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

At each feature map cell, DetectionOutput predicts the offsets relative to the default box shapes in the cell, as well as the per-class scores that indicate the presence of a class instance in each of those boxes. Specifically, for each box out of $k$ at a given location, DetectionOutput computes $c$ class scores and the four offsets relative to the original default box shape. This results in a total of $(c + 4)k$ filters that are applied around each location in the feature map, yielding $(c + 4)kmn$ outputs for an $m \times n$ feature map.

Example

<layer ... type="DetectionOutput" ... >
<data num_classes="21" share_location="1" background_label_id="0" nms_threshold="0.450000" top_k="400" input_height="1" input_width="1" code_type="caffe.PriorBoxParameter.CENTER_SIZE" variance_encoded_in_target="0" keep_top_k="200" confidence_threshold="0.010000"/>
<input> ... </input>
<output> ... </output>
</layer>

Eltwise Layer

Name: Eltwise

Category: Layer

Short description: Eltwise layer performs element-wise operation specified in parameters, over given inputs.

Parameters: Eltwise layer parameters are specified in the data node, which is a child of the layer node. Eltwise accepts two inputs of arbitrary number of dimensions. The operation supports broadcasting input blobs according to the NumPy specification.

Inputs

Mathematical Formulation

Eltwise does the following with the input blobs:

\[ o_{i} = f(b_{i}^{1}, b_{i}^{2}) \]

where $b_{i}^{1}$ is the $i$-th element of the first blob, $b_{i}^{2}$ is the $i$-th element of the second blob, $o_{i}$ is the $i$-th element of the output blob, and $f(a, b)$ is a function that performs an operation over its two arguments $a$ and $b$.

Example

<layer ... type="Eltwise" ... >
<data operation="sum"/>
<input> ... </input>
<output> ... </output>
</layer>

Fill Layer

Name: Fill

Category: Layer

Short description: Fill layer generates a blob of the specified shape filled with the specified value.

Parameters: Fill layer does not have parameters.

Inputs:

Example

<layer ... type="Fill">
<input>
<port id="0">
<dim>2</dim>
</port>
<port id="1"/>
</input>
<output>
<port id="2">
<dim>3</dim>
<dim>4</dim>
</port>
</output>
</layer>

Flatten Layer

Name: Flatten

Category: Layer

Short description: Flatten layer performs flattening of specific dimensions of the input blob.

Parameters: Flatten layer parameters are specified in the data node, which is a child of the layer node.

Inputs

Example

<layer ... type="Flatten" ...>
<data axis="1" end_axis="-1"/>
<input>
<port id="0">
<dim>7</dim>
<dim>19</dim>
<dim>19</dim>
<dim>12</dim>
</port>
</input>
<output>
<port id="1">
<dim>7</dim>
<dim>4332</dim>
</port>
</output>
</layer>

FullyConnected Layer

Name: FullyConnected

Category: Layer

Short description: Reference

Detailed description: Reference

Parameters: FullyConnected layer parameters are specified in the data node, which is a child of the layer node.

Inputs

Weights Layout

OI, which means that Input changes the fastest, then Output.

Mathematical Formulation

Example

<layer ... type="FullyConnected" ... >
<data out-size="4096"/>
<input> ... </input>
<output> ... </output>
</layer>

Gather Layer

Name: Gather

Category: Layer

Short description: Gather layer takes slices of data in the second input blob according to the indexes specified in the first input blob. The output blob shape is input2.shape[:axis] + input1.shape + input2.shape[axis + 1:].

Parameters: Gather layer parameters are specified in the data node, which is a child of the layer node.

Mathematical Formulation

\[ output[:, ... ,:, i, ... , j,:, ... ,:] = input2[:, ... ,:, input1[i, ... ,j],:, ... ,:] \]
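
The output shape rule from the short description can be sketched in Python as follows. The helper name gather_output_shape is hypothetical; input1 is the blob with indexes and input2 is the blob with data, as described above.

```python
def gather_output_shape(indexes_shape, data_shape, axis):
    """input2.shape[:axis] + input1.shape + input2.shape[axis + 1:]"""
    if not 0 <= axis < len(data_shape):
        raise ValueError("axis is out of range for the data blob")
    return (
        tuple(data_shape[:axis])
        + tuple(indexes_shape)
        + tuple(data_shape[axis + 1:])
    )


# Shapes from the example below: indexes [15, 4, 20, 28], data [6, 12, 10, 24].
print(gather_output_shape((15, 4, 20, 28), (6, 12, 10, 24), axis=1))
# (6, 15, 4, 20, 28, 10, 24)
```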

Inputs

Example

<layer id="1" name="gather_node" precision="FP32" type="Gather">
<data axis="1"/>
<input>
<port id="0">
<dim>15</dim>
<dim>4</dim>
<dim>20</dim>
<dim>28</dim>
</port>
<port id="1">
<dim>6</dim>
<dim>12</dim>
<dim>10</dim>
<dim>24</dim>
</port>
</input>
<output>
<port id="2">
<dim>6</dim>
<dim>15</dim>
<dim>4</dim>
<dim>20</dim>
<dim>28</dim>
<dim>10</dim>
<dim>24</dim>
</port>
</output>
</layer>

GRN Layer

Name: GRN

Category: Normalization

Short description: GRN is the Global Response Normalization with L2 norm (across channels only).

Parameters: GRN layer parameters are specified in the data node, which is a child of the layer node.

Inputs

Mathematical Formulation

GRN computes the L2 norm across channels for the input blob. GRN generally does the following with the input blob:

\[ output_{i} = \frac{input_{i}}{\sqrt{\sum_{c=1}^{C} input_{c}^{2} + bias}} \]

Example

<layer ... type="GRN" ... >
<data bias="1.0"/>
<input> ... </input>
<output> ... </output>
</layer>

GRUCell Layer

Name: GRUCell

Category: Layer

Short description: GRUCell layer computes the output using the formula described in the paper.

Parameters: GRUCell layer parameters are specified in the data node, which is a child of the layer node.

Inputs

Outputs

Example

<layer type="GRUCell">
<data hidden_size="128" linear_before_reset="1"/>
<input>
<port id="0">
<dim>1</dim>
<dim>16</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>128</dim>
</port>
</input>
<output>
<port id="4">
<dim>1</dim>
<dim>128</dim>
</port>
</output>
<blobs>
<weights offset="…" size="…"/>
<biases offset="…" size="…"/>
</blobs>
</layer>

Input Layer

Name: Input

Category: Layer

Short description: Input layer specifies input to the model.

Parameters: Input layer does not have parameters.

Example

<layer ... type="Input" ...>
<output>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>224</dim>
<dim>224</dim>
</port>
</output>
</layer>

Interp Layer

Name: Interp

Category: Layer

Short description: Interp layer performs bilinear interpolation of the input blob by the specified parameters.

Parameters: Interp layer parameters are specified in the data node, which is a child of the layer node.

Inputs

Example

<layer ... type="Interp" ...>
<data align_corners="0" pad_beg="0" pad_end="0"/>
<input>
<port id="0">
<dim>1</dim>
<dim>2</dim>
<dim>48</dim>
<dim>80</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>2</dim>
<dim>96</dim>
<dim>160</dim>
</port>
</output>
</layer>

LSTMCell Layer

Name: LSTMCell

Category: Layer

Short description: LSTMCell layer computes the output using the formula described in the original paper Long Short-Term Memory.

Parameters: LSTMCell layer parameters are specified in the data node, which is a child of the layer node.

Inputs

Outputs

Mathematical Formulation

Formula:
* - matrix mult
(.) - eltwise mult
[,] - concatenation
sigm - 1/(1 + e^{-x})
tanh - (e^{2x} - 1)/(e^{2x} + 1)
f = sigm(Wf*[Hi, X] + Bf)
i = sigm(Wi*[Hi, X] + Bi)
c = tanh(Wc*[Hi, X] + Bc)
o = sigm(Wo*[Hi, X] + Bo)
Co = f (.) Ci + i (.) c
Ho = o (.) tanh(Co)
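
The formula above can be traced with a pure-Python sketch of a single cell step. The params dictionary with separate Wf, Wi, Wc, Wo matrices and Bf, Bi, Bc, Bo vectors is an assumption made for readability; it does not describe how the weights blob is laid out in the Intermediate Representation.

```python
import math


def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gate(weights, bias, hx, activation):
    """activation(W * [Hi, X] + B), applied element-wise."""
    return [activation(z + b) for z, b in zip(matvec(weights, hx), bias)]


def lstm_cell(x, h_in, c_in, params):
    hx = list(h_in) + list(x)                            # [Hi, X]
    f = gate(params["Wf"], params["Bf"], hx, sigm)       # forget gate
    i = gate(params["Wi"], params["Bi"], hx, sigm)       # input gate
    c = gate(params["Wc"], params["Bc"], hx, math.tanh)  # candidate state
    o = gate(params["Wo"], params["Bo"], hx, sigm)       # output gate
    # Co = f (.) Ci + i (.) c
    c_out = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_in, i, c)]
    # Ho = o (.) tanh(Co)
    h_out = [ov * math.tanh(cv) for ov, cv in zip(o, c_out)]
    return h_out, c_out
```

Each weight matrix has hidden_size rows and hidden_size + input_size columns, because it multiplies the concatenation [Hi, X].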

Example

<layer ... type="LSTMCell" ... >
<input> ... </input>
<output> ... </output>
</layer>

Memory Layer

Name: Memory

Category: Layer

Short description: Memory layer represents the delay layer in terms of LSTM terminology. For more information about LSTM topologies, please refer to this article.

Detailed description: Memory layer saves the state between two infer requests. In the topology, it is a single layer; in the Intermediate Representation, however, it is always represented as a pair of Memory layers. One of these layers does not have outputs and the other does not have inputs (in terms of the Intermediate Representation).

Parameters: Memory layer parameters are specified in the data node, which is a child of the layer node.

Mathematical Formulation

Memory saves data from the input blob.

Example

<layer ... type="Memory" ... >
<data id="r_27-28" index="0" size="2" />
<input> ... </input>
<output> ... </output>
</layer>

MVN Layer

Name: MVN

Category: Normalization

Short description: Reference

Parameters: MVN layer parameters are specified in the data node, which is a child of the layer node.

Inputs

Mathematical Formulation

MVN subtracts mean value from the input blob:

\[ o_{i} = i_{i} - \frac{\sum{i_{k}}}{C * H * W} \]

If normalize_variance is set to 1, the output blob is divided by the standard deviation:

\[ o_{i}=\frac{o_{i}}{\sqrt{\frac{\sum o_{k}^2}{C * H * W}}+\epsilon} \]

Example

<layer ... type="MVN">
<data across_channels="1" eps="9.999999717180685e-10" normalize_variance="1"/>
<input>
...
</input>
<output>
...
</output>
</layer>

Norm Layer

Name: Norm

Category: Normalization

Short description: Reference

Detailed description: Reference

Parameters: Norm layer parameters are specified in the data node, which is a child of the layer node.

Inputs

Mathematical Formulation

\[o_{i} = \left( 1 + \left( \frac{\alpha}{n} \right)\sum_{i}x_{i}^{2} \right)^{\beta}\]

Where $n$ is the size of each local region.

Example

<layer ... type="Norm" ... >
<data alpha="9.9999997e-05" beta="0.75" local-size="5" region="across"/>
<input> ... </input>
<output> ... </output>
</layer>

Normalize Layer

Name: Normalize

Category: Normalization

Short description: Normalize layer performs l-p normalization of the input blob.

Parameters: Normalize layer parameters are specified in the data node, which is a child of the layer node.

Inputs

Mathematical Formulation

\[ o_{i} = \sum_{i}^{H*W}\frac{\left ( n*C*H*W \right ) * scale}{\sqrt{\sum_{i=0}^{C*H*W}\left ( n*C*H*W \right )^{2}}} \]

Example

<layer ... type="Normalize" ... >
<data across_spatial="0" channel_shared="0" eps="0.000000"/>
<input> ... </input>
<output> ... </output>
</layer>

Pad Layer

Name: Pad

Category: Layer

Short description: Pad layer extends an input blob on edges. New element values are generated based on the Pad layer parameters described below.

Parameters: Pad layer parameters are specified in the data section, which is a child of the layer node. The parameters specify a number of elements to add along each axis and a rule by which new element values are generated: for example, whether they are filled with a given constant or generated based on the input blob content.

Inputs

Outputs

pad_mode Examples

The following examples illustrate how output blob is generated for the Pad layer for a given input blob:

INPUT =
[[ 1 2 3 4 ]
[ 5 6 7 8 ]
[ 9 10 11 12 ]]

with the following parameters:

pads_begin = [0, 1]
pads_end = [2, 3]

depending on the pad_mode.
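
For pad_mode="constant", the effect of these parameters on the 2D INPUT above can be sketched in Python. The helper name pad_constant_2d and the pad value 0 are assumptions chosen for illustration; other pad modes are not covered.

```python
def pad_constant_2d(rows, pads_begin, pads_end, pad_value):
    """Pad a 2D list with a constant value.

    pads_begin = [rows added on top, columns added on the left]
    pads_end   = [rows added at the bottom, columns added on the right]
    """
    top, left = pads_begin
    bottom, right = pads_end
    width = left + len(rows[0]) + right
    padded = [[pad_value] * width for _ in range(top)]
    for row in rows:
        padded.append([pad_value] * left + list(row) + [pad_value] * right)
    padded.extend([pad_value] * width for _ in range(bottom))
    return padded


INPUT = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]

for row in pad_constant_2d(INPUT, pads_begin=[0, 1], pads_end=[2, 3], pad_value=0):
    print(row)
```

The result has 3 + 0 + 2 = 5 rows and 4 + 1 + 3 = 8 columns: each output dimension equals the input dimension plus the corresponding pads_begin and pads_end values.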

Example

<layer ... type="Pad" ...>
<data pads_begin="0,5,2,1" pads_end="1,0,3,7" pad_mode="constant" pad_value="666.0"/>
<input>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>32</dim>
<dim>40</dim>
</port>
</input>
<output>
<port id="2">
<dim>2</dim>
<dim>8</dim>
<dim>37</dim>
<dim>48</dim>
</port>
</output>
</layer>

Permute Layer

Name: Permute

Category: Layer

Short description: Permute layer reorders input blob dimensions.

Detailed description: Reference

Parameters: Permute layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

Permute layer reorders input blob dimensions. Source indexes and destination indexes are bound by the formula:

\[ src\_ind_{offset} = n * ordered[1] * ordered[2] * ordered[3] + (h * ordered[3] + w) \]

\[ n \in ( 0, order[0] ) \]

\[ h \in ( 0, order[2] ) \]

\[ w \in ( 0, order[3] ) \]

Example

<layer ... type="Permute" ... >
<data order="0,2,3,1"/>
<input> ... </input>
<output> ... </output>
</layer>

Pooling Layer

Name: Pooling

Category: Pool

Short description: Reference

Detailed description: Reference

Parameters: Pooling layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

Example

<layer ... type="Pooling" ... >
<data auto_pad="same_upper" exclude-pad="true" kernel="3,3" pads_begin="0,0" pads_end="1,1" pool-method="max" strides="2,2"/>
<input> ... </input>
<output> ... </output>
</layer>

Power Layer

Name: Power

Category: Layer

Short description: Power layer computes the output as (shift + scale * x) ^ power for each input element x.

Parameters: Power layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

\[ p = (shift + scale * x)^{power} \]

Example

<layer ... type="Power" ... >
<data power="2" scale="0.1" shift="5"/>
<input> ... </input>
<output> ... </output>
</layer>

PReLU Layer

Name: PReLU

Category: Activation

Short description: PReLU is the Parametric Rectified Linear Unit. The difference from ReLU is that negative slopes can vary across channels.

Parameters: PReLU layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

PReLU accepts one input with four dimensions. The produced blob has the same dimensions as input. PReLU does the following with the input blob:

\[ o_{i} = max(0, x_{i}) + w_{i} * min(0,x_{i}) \]

where $w_{i}$ is from weights blob.
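
A minimal Python sketch of this formula follows. It assumes the blob is given as a list of channels, each a flat list of values, and that channel_shared selects a single slope for all channels; the helper name prelu is illustrative.

```python
def prelu(channels, weights, channel_shared=False):
    """o = max(0, x) + w * min(0, x), with one slope per channel.

    When channel_shared is True, weights[0] is used for every channel.
    """
    out = []
    for c, values in enumerate(channels):
        w = weights[0] if channel_shared else weights[c]
        out.append([max(0.0, x) + w * min(0.0, x) for x in values])
    return out


# Two channels with slopes 0.5 and 0.25 for negative inputs.
print(prelu([[1.0, -2.0], [-4.0, 3.0]], [0.5, 0.25]))
# [[1.0, -1.0], [-1.0, 3.0]]
```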

Example

<layer ... type="PReLU" ... >
<data channel_shared="1"/>
<input> ... </input>
<output> ... </output>
</layer>

PriorBox Layer

Name: PriorBox

Category: Layer

Short description: PriorBox layer generates prior boxes of specified sizes and aspect ratios across all dimensions.

Parameters: PriorBox layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation:

PriorBox computes coordinates of prior boxes as follows:

  1. Calculates center_x and center_y of prior box:

    \[ W \equiv Width \quad Of \quad Image \]

    \[ H \equiv Height \quad Of \quad Image \]

    • If step equals 0:

      \[ center_x=(w+0.5) \]

      \[ center_y=(h+0.5) \]

    • else:

      \[ center_x=(w+offset)*step \]

      \[ center_y=(h+offset)*step \]

      \[ w \in \left( 0, W \right ) \]

      \[ h \in \left( 0, H \right ) \]

  2. For each $ s \in \left( 0, min\_sizes \right ) $, calculates coordinates of prior boxes:

    \[ xmin = \frac{center_x - \frac{s}{2}}{W} \]

    \[ ymin = \frac{center_y - \frac{s}{2}}{H} \]

    \[ xmax = \frac{center_x + \frac{s}{2}}{W} \]

    \[ ymax = \frac{center_y + \frac{s}{2}}{H} \]

Example

<layer ... type="PriorBox" ... >
<data step="64.000000" min_size="162.000000" max_size="213.000000" offset="0.500000" flip="1" clip="0" aspect_ratio="2.000000,3.000000" variance="0.100000,0.100000,0.200000,0.200000" />
<input> ... </input>
<output> ... </output>
</layer>

PriorBoxClustered Layer

Name: PriorBoxClustered

Category: Layer

Short description: PriorBoxClustered layer generates prior boxes of specified sizes normalized to the input image size.

Parameters: PriorBoxClustered layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

PriorBoxClustered computes coordinates of prior boxes as follows:

  1. Calculates the center_x and center_y of prior box:

    \[ W \equiv Width \quad Of \quad Image \]

    \[ H \equiv Height \quad Of \quad Image \]

    \[ center_x=(w+offset)*step \]

    \[ center_y=(h+offset)*step \]

    \[ w \in \left( 0, W \right ) \]

    \[ h \in \left( 0, H \right ) \]

  2. For each $s \subset \left( 0, W \right )$, calculates the prior boxes coordinates:

    \[ xmin = \frac{center_x - \frac{width_s}{2}}{W} \]

    \[ ymin = \frac{center_y - \frac{height_s}{2}}{H} \]

    \[ xmax = \frac{center_x + \frac{width_s}{2}}{W} \]

    \[ ymax = \frac{center_y + \frac{height_s}{2}}{H} \]

    If clip is defined, the coordinates of prior boxes are recalculated with the formula: $coordinate = \min(\max(coordinate,0), 1)$
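
The two steps can be combined into a short Python sketch for one grid cell and one width/height pair. The sketch takes xmax and ymax as the center plus half the box size, so that each box is centered on (center_x, center_y). The helper name clustered_prior_box and the 300x300 image size are assumptions used only for illustration.

```python
def clustered_prior_box(w, h, width_s, height_s, image_w, image_h,
                        step, offset, clip=False):
    """Return [xmin, ymin, xmax, ymax] normalized to the image size."""
    center_x = (w + offset) * step
    center_y = (h + offset) * step
    box = [
        (center_x - width_s / 2) / image_w,
        (center_y - height_s / 2) / image_h,
        (center_x + width_s / 2) / image_w,
        (center_y + height_s / 2) / image_h,
    ]
    if clip:
        # coordinate = min(max(coordinate, 0), 1)
        box = [min(max(v, 0.0), 1.0) for v in box]
    return box


# First width/height pair from the example below, top-left grid cell,
# on a hypothetical 300x300 input image.
print(clustered_prior_box(0, 0, 86.0, 44.0, 300, 300,
                          step=16.0, offset=0.5, clip=True))
```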

Example

<layer ... type="PriorBoxClustered">
<data clip="0" flip="0" height="44.0,10.0,30.0,19.0,94.0,32.0,61.0,53.0,17.0" offset="0.5" step="16.0" variance="0.1,0.1,0.2,0.2"
width="86.0,13.0,57.0,39.0,68.0,34.0,142.0,50.0,23.0"/>
<input>
...
</input>
<output>
...
</output>
</layer>

Proposal Layer

Name: Proposal

Category: Layer

Short description: Proposal layer filters bounding boxes and outputs only those with the highest prediction confidence.

Parameters: Proposal layer parameters are specified in the data node, which is a child of the layer node. The layer has three inputs: a blob with the probabilities that a particular bounding box corresponds to background or foreground, a blob with logits for each of the bounding boxes, and a blob with the input image size in the [image_height, image_width, scale_height_and_width] or [image_height, image_width, scale_height, scale_width] format.

Mathematical Formulation

Proposal layer accepts three inputs with four dimensions. The produced blob has two dimensions: the first one equals batch_size * post_nms_topn. Proposal layer does the following with the input blob:

  1. Generates initial anchor boxes. Left top corner of all boxes is at (0, 0). Width and height of boxes are calculated from base_size with scale and ratio parameters.
  2. For each point in the first input blob:
    • pins anchor boxes to the image according to the second input blob that contains four deltas for each box: for x and y of center, for width and for height
    • finds out score in the first input blob
  3. Filters out boxes with size less than min_size
  4. Sorts all proposals (box, score) by score from highest to lowest
  5. Takes top pre_nms_topn proposals
  6. Calculates intersections for boxes and filters out all boxes with $intersection/union > nms\_thresh$
  7. Takes top post_nms_topn proposals
  8. Returns top proposals
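
Steps 4 to 7 above can be sketched in Python as a greedy non-maximum suppression. This is an illustration under stated assumptions, not the layer's actual implementation: boxes are (x1, y1, x2, y2) tuples with continuous coordinates, and the helper names iou and select_proposals are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_proposals(proposals, pre_nms_topn, nms_thresh, post_nms_topn):
    """proposals is a list of (box, score) pairs."""
    # Steps 4 and 5: sort by score and keep the top pre_nms_topn.
    ranked = sorted(proposals, key=lambda p: p[1], reverse=True)[:pre_nms_topn]
    kept = []
    for box, score in ranked:
        # Step 6: drop a box that overlaps an already kept box too much.
        if all(iou(box, kept_box) <= nms_thresh for kept_box, _ in kept):
            kept.append((box, score))
    # Step 7: keep the top post_nms_topn survivors.
    return kept[:post_nms_topn]
```

Because the candidates are visited in descending score order, a box is always compared against higher-scoring boxes that have already been accepted.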

Inputs:

Example

<layer ... type="Proposal" ... >
<data base_size="16" feat_stride="16" min_size="16" nms_thresh="0.6" post_nms_topn="200" pre_nms_topn="6000"
ratio="2.67" scale="4.0,6.0,9.0,16.0,24.0,32.0"/>
<input> ... </input>
<output> ... </output>
</layer>

PSROIPooling Layer

Name: PSROIPooling

Category: Pool

Short description: PSROIPooling layer computes position-sensitive pooling on regions of interest specified by input.

Detailed description: Reference

Parameters: PSROIPooling layer parameters are specified in the data node, which is a child of the layer node. PSROIPooling layer takes two input blobs: one with feature maps and one with regions of interest (box coordinates). The latter is specified as five-element tuples: [batch_id, x_1, y_1, x_2, y_2]. ROI coordinates are specified in absolute values for the average mode and in values normalized to the [0,1] interval for bilinear interpolation.

Inputs:

Example

<layer ... type="PSROIPooling" ... >
<data group_size="6" mode="bilinear" output_dim="360" spatial_bins_x="3" spatial_bins_y="3" spatial_scale="1"/>
<input>
<port id="0">
<dim>1</dim>
<dim>3240</dim>
<dim>38</dim>
<dim>38</dim>
</port>
<port id="1">
<dim>100</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2">
<dim>100</dim>
<dim>360</dim>
<dim>6</dim>
<dim>6</dim>
</port>
</output>
</layer>

Quantize Layer

Name: Quantize

Category: Layer

Short description: Quantize layer is element-wise linear quantization of floating-point input values into a discrete set of floating-point values.

Detailed description: Input and output ranges as well as the number of quantization levels are specified by dedicated inputs and attributes. Limits can differ for each element or group of elements (channels) of the input blobs; otherwise, one limit applies to all elements. Which case applies depends on the shapes of the inputs that specify the limits and on the regular broadcasting rules applied to the input blobs. The output of the operator is a floating-point number of the same type as the input blob. In general, four values specify quantization for each element: input_low, input_high, output_low, and output_high. The input_low and input_high parameters specify the input range of quantization. All input values outside this range are clipped to the range before the actual quantization. The output_low and output_high parameters specify the minimum and maximum quantized values at the output.

Parameters: Quantize layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

Each element of the output is defined as the result of the following expression:

if x <= input_low:
    output = output_low
elif x > input_high:
    output = output_high
else:
    # input_low < x <= input_high
    output = round((x - input_low) / (input_high - input_low) * (levels-1)) / (levels-1) * (output_high - output_low) + output_low
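
The expression above can be wrapped into a runnable Python function for a single element. Note one assumption: Python's built-in round rounds halves to the nearest even integer, which may differ from the rounding mode of an actual inference plugin. The function name quantize is illustrative.

```python
def quantize(x, input_low, input_high, output_low, output_high, levels):
    """Quantize one value into `levels` evenly spaced output values."""
    if x <= input_low:
        return output_low
    if x > input_high:
        return output_high
    # input_low < x <= input_high: snap to the nearest of the `levels` steps.
    step_index = round((x - input_low) / (input_high - input_low) * (levels - 1))
    return step_index / (levels - 1) * (output_high - output_low) + output_low


# levels=2 (as in the example below) maps every value to one of two outputs.
print(quantize(0.3, 0.0, 1.0, -1.0, 1.0, levels=2))  # -1.0
print(quantize(0.7, 0.0, 1.0, -1.0, 1.0, levels=2))  # 1.0
```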

Example

<layer type="Quantize">
<data levels="2"/>
<input>
<port id="0">
<dim>1</dim>
<dim>64</dim>
<dim>56</dim>
<dim>56</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>64</dim>
<dim>1</dim>
<dim>1</dim>
</port>
<port id="2">
<dim>1</dim>
<dim>64</dim>
<dim>1</dim>
<dim>1</dim>
</port>
<port id="3">
<dim>1</dim>
<dim>1</dim>
<dim>1</dim>
<dim>1</dim>
</port>
<port id="4">
<dim>1</dim>
<dim>1</dim>
<dim>1</dim>
<dim>1</dim>
</port>
</input>
<output>
<port id="5">
<dim>1</dim>
<dim>64</dim>
<dim>56</dim>
<dim>56</dim>
</port>
</output>
</layer>

Range Layer

Name: Range

Category: Layer

Short description: Range layer generates a sequence of numbers according to the input values.

Detailed description: Range layer generates a sequence of numbers starting from the value in the first input, up to but not including the value in the second input, with a step equal to the value in the third input.
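
As a rough illustration of this rule, the sketch below builds the sequence in plain Python. The function and argument names (`range_layer`, `start`, `limit`, `delta`) are hypothetical, and the handling of a negative step is an assumption that the text does not confirm.

```python
import math


def range_layer(start, limit, delta):
    """Return start, start + delta, ... stopping before `limit` is reached."""
    if delta == 0:
        raise ValueError("delta must be non-zero")
    # Number of steps that fit before reaching `limit` (never negative).
    count = max(0, math.ceil((limit - start) / delta))
    return [start + i * delta for i in range(count)]


# Inputs 0, 10, 1 give ten values, matching the <dim>10</dim> output in the example.
print(range_layer(0, 10, 1))
```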

Parameters: Range layer does not have parameters.

Inputs:

Example

<layer ... type="Range">
<input>
<port id="0"/>
<port id="1"/>
<port id="2"/>
</input>
<output>
<port id="3">
<dim>10</dim>
</port>
</output>
</layer>

RegionYolo Layer

Name: RegionYolo

Category: Layer

Short description: RegionYolo computes the coordinates of regions with probability for each class.

Detailed description: Reference

Parameters: RegionYolo layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Example

<layer ... type="RegionYolo" ... >
<data axis="1" classes="80" coords="4" do_softmax="0" end_axis="3" mask="0,1,2" num="9"/>
<input> ... </input>
<output> ... </output>
<weights .../>
</layer>

ReLU Layer

Name: ReLU

Category: Activation

Short description: Reference

Detailed description: Reference

Parameters: ReLU layer parameters are specified in the data node, which is a child of the layer node.

Mathematical Formulation

\[ Y_{i}^{( l )} = max(0, Y_{i}^{( l - 1 )}) \]
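
The formula above covers the case without a negative slope. The example in this section also passes a negative_slope attribute; the sketch below assumes, as in a leaky ReLU, that this attribute multiplies negative inputs. That interpretation and the function name `relu` are assumptions, since the parameter description is not reproduced here.

```python
def relu(values, negative_slope=0.0):
    """Keep positive values; scale the rest by `negative_slope` (assumed meaning)."""
    return [v if v > 0 else negative_slope * v for v in values]


print(relu([-2.0, 0.0, 3.0]))                       # plain max(0, x) behaviour
print(relu([-2.0, 0.0, 3.0], negative_slope=0.1))   # negative inputs are scaled
```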

Inputs:

Example

<layer ... type="ReLU" ... >
<data negative_slope="0.100000"/>
<input> ... </input>
<output> ... </output>
</layer>

ReorgYolo Layer

Name: ReorgYolo

Category: Layer

Short description: ReorgYolo reorganizes the input blob, taking strides into account.

Detailed description: Reference

Parameters: ReorgYolo layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Example

<layer ... type="ReorgYolo" ... >
<data stride="1"/>
<input> ... </input>
<output> ... </output>
</layer>

Resample (Type 1) Layer

Name: Resample

Category: Layer

Short description: Resample layer scales the input blob by the specified parameters.

Parameters: Resample layer parameters are specified in the data node, which is a child of the layer node. Resample Type 1 layer has one input blob, which contains the image to resample.

Inputs:

Example

<layer type="Resample">
<data antialias="0" factor="2" type="caffe.ResampleParameter.LINEAR"/>
<input>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>25</dim>
<dim>30</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>3</dim>
<dim>50</dim>
<dim>60</dim>
</port>
</output>
</layer>

Resample (Type 2) Layer

Name: Resample

Category: Layer

Short description: Resample layer scales the input blob by the specified parameters.

Parameters: Resample layer parameters are specified in the data node, which is a child of the layer node. Resample Type 2 layer has two input blobs: the image to resample and the output dimensions.

Inputs:

Example

<layer type="Resample">
<data antialias="0" factor="1" type="caffe.ResampleParameter.LINEAR"/>
<input>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>25</dim>
<dim>30</dim>
</port>
<port id="1">
<dim>4</dim>
</port>
</input>
<output>
<port id="2">
<dim>1</dim>
<dim>3</dim>
<dim>50</dim>
<dim>60</dim>
</port>
</output>
</layer>

Reshape Layer

Name: Reshape

Category: Layer

Short description: Reshape layer changes dimensions of the input blob according to the specified order. Input blob volume is equal to output blob volume, where volume is the product of dimensions.

Detailed description: Reference

Parameters: Reshape layer does not have parameters. Reshape layer takes two input blobs: the blob to be reshaped and the output blob shape. The values in the second blob can be -1, 0, or any positive integer. The two special values -1 and 0:

Inputs:

Example

<layer ... type="Reshape" ...>
<input>
<port id="0">
<dim>2</dim>
<dim>5</dim>
<dim>5</dim>
<dim>24</dim>
</port>
<port id="1">
<dim>3</dim>
</port>
</input>
<output>
<port id="2">
<dim>2</dim>
<dim>150</dim>
<dim>4</dim>
</port>
</output>
</layer>
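
One way the second input could resolve to the output shape in the example above is sketched below. The sketch assumes that 0 copies the corresponding input dimension and that -1 is inferred so that the volume is preserved. These meanings, the function name `resolve_shape`, and the concrete shape values `[0, -1, 4]` are assumptions; the example itself only states that the shape blob has three elements.

```python
from math import prod


def resolve_shape(input_shape, requested):
    """Resolve 0 and -1 entries of `requested` against `input_shape` (assumed rules)."""
    # Assumption: 0 means "copy the input dimension at the same position".
    out = [input_shape[i] if d == 0 else d for i, d in enumerate(requested)]
    if out.count(-1) > 1:
        raise ValueError("at most one dimension can be -1")
    if -1 in out:
        # Assumption: -1 is whatever keeps the total volume unchanged.
        known = prod(d for d in out if d != -1)
        out[out.index(-1)] = prod(input_shape) // known
    if prod(out) != prod(input_shape):
        raise ValueError("input and output volumes differ")
    return out


print(resolve_shape([2, 5, 5, 24], [0, -1, 4]))
```

The final volume check mirrors the statement that the input blob volume equals the output blob volume.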

ReverseSequence Layer

Name: ReverseSequence

Category: Layer

Short description: ReverseSequence reverses variable length slices of data.

Detailed description: ReverseSequence slices input along the dimension specified in the batch_axis, and for each slice i, reverses the first lengths[i] (the second input) elements along the dimension specified in the seq_axis.
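
For a two-dimensional input with batch_axis equal to 0 and seq_axis equal to 1, the behaviour described above can be sketched with nested lists. The function name `reverse_sequence` is hypothetical and the sketch covers only this 2-D case.

```python
def reverse_sequence(data, lengths):
    """Reverse the first lengths[i] items of row i; leave the rest untouched."""
    return [row[:n][::-1] + row[n:] for row, n in zip(data, lengths)]


batch = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
print(reverse_sequence(batch, [2, 3]))
```

Each row plays the role of one slice along batch_axis, so `lengths` needs one entry per row.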

Parameters: ReverseSequence layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Example

<layer ... type="ReverseSequence">
<data batch_axis="0" seq_axis="1"/>
<input>
<port id="0">
<dim>3</dim>
<dim>10</dim>
<dim>100</dim>
<dim>200</dim>
</port>
<port id="1">
<dim>3</dim>
</port>
</input>
<output>
<port id="2">
<dim>3</dim>
<dim>10</dim>
<dim>100</dim>
<dim>200</dim>
</port>
</output>
</layer>

RNNCell Layer

Name: RNNCell

Category: Layer

Short description: RNNCell layer computes the output using the formula described in the article.

Parameters: RNNCell layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Outputs:


ROIPooling Layer

Name: ROIPooling

Category: Pool

Short description: ROIPooling is a pooling layer used over feature maps of non-uniform input sizes and outputs a feature map of a fixed size.

Detailed description: deepsense.io reference

Parameters: ROIPooling layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

\[ output_{j} = MAX\{ x_{0}, ... x_{i}\} \]

Example

<layer ... type="ROIPooling" ... >
<data pooled_h="6" pooled_w="6" spatial_scale="0.062500"/>
<input> ... </input>
<output> ... </output>
</layer>

ScaleShift Layer

Name: ScaleShift

Category: Layer

Short description: ScaleShift layer performs linear transformation of the input blobs. Weights denote a scaling parameter, biases denote a shift.

Parameters: ScaleShift layer does not have parameters.

Inputs:

Mathematical Formulation

\[ o_{i} =\gamma b_{i} + \beta \]
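
A minimal sketch of this formula is shown below, assuming that the weights and biases hold one scaling value and one shift value per channel. That layout and the function name `scale_shift` are assumptions made for illustration.

```python
def scale_shift(channels, weights, biases):
    """Apply o = gamma * b + beta to every value, with one (gamma, beta) per channel."""
    return [[gamma * value + beta for value in channel]
            for channel, gamma, beta in zip(channels, weights, biases)]


data = [[1.0, 2.0],   # channel 0
        [3.0, 4.0]]   # channel 1
print(scale_shift(data, weights=[2.0, 0.5], biases=[1.0, -1.0]))
```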

Example

<layer ... type="ScaleShift" ... >
<input> ... </input>
<output> ... </output>
</layer>

Shape Layer

Name: Shape

Category: Layer

Short description: Shape produces a blob with the input blob shape.

Parameters: Shape layer does not have parameters.

Inputs:

Example

<layer ... type="Shape">
<input>
<port id="0">
<dim>2</dim>
<dim>3</dim>
<dim>224</dim>
<dim>224</dim>
</port>
</input>
<output>
<port id="1">
<dim>4</dim>
</port>
</output>
</layer>

ShuffleChannels Layer

Name: ShuffleChannels

Category: Layer

Short description: ShuffleChannels permutes data in the channel dimension of the input blob.

Parameters: ShuffleChannels layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

The operation is equivalent to the following transformation of the input blob x of shape [N, C, H, W]:

x' = reshape(x, [N, group, C / group, H * W])
x'' = transpose(x', [0, 2, 1, 3])
y = reshape(x'', [N, C, H, W])

where group is the layer parameter described above.
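
The effect of this reshape and transpose on the channel order can be sketched on a flat list of channels. The function name `shuffle_channels` is hypothetical; each list item stands for one whole H x W channel.

```python
def shuffle_channels(channels, group):
    """Reorder channels as reshape [group, C/group] -> transpose -> flatten."""
    count = len(channels)
    if count % group != 0:
        raise ValueError("number of channels must be divisible by group")
    per_group = count // group
    # reshape: split the channel axis into `group` consecutive blocks
    grouped = [channels[g * per_group:(g + 1) * per_group] for g in range(group)]
    # transpose + reshape back: take the k-th channel of every block in turn
    return [grouped[g][k] for k in range(per_group) for g in range(group)]


print(shuffle_channels(list(range(6)), group=3))
```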

Example

<layer ... type="ShuffleChannels" ...>
<data group="3" axis="1"/>
<input>
<port id="0">
<dim>3</dim>
<dim>12</dim>
<dim>200</dim>
<dim>400</dim>
</port>
</input>
<output>
<port id="1">
<dim>3</dim>
<dim>12</dim>
<dim>200</dim>
<dim>400</dim>
</port>
</output>
</layer>

SimplerNMS Layer

Name: SimplerNMS

Category: Layer

Short description: SimplerNMS layer filters bounding boxes and outputs only those with the highest confidence of prediction.

Parameters: SimplerNMS layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

SimplerNMS accepts three inputs with four dimensions. Produced blob has two dimensions, the first one equals post_nms_topn. SimplerNMS does the following with the input blob:

  1. Generates initial anchor boxes. The top-left corner of all boxes is (0, 0). The widths and heights of the boxes are calculated from the default widths and heights, scaled according to the scale parameter.
  2. For each point in the first input blob:
    • pins anchor boxes to the picture according to the second input blob, which contains four deltas for each box: for x and y of the center, for width, and for height
    • finds the score in the first input blob
  3. Filters out boxes with size less than min_bbox_size.
  4. Sorts all proposals (box, score) by score from highest to lowest.
  5. Takes the top pre_nms_topn proposals.
  6. Calculates intersections for boxes and filters out all with $intersection/union > iou\_threshold$.
  7. Takes the top post_nms_topn proposals.
  8. Returns the top proposals.
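
Steps 3 to 8 above can be sketched as follows, starting from proposals that are already pinned to the picture. This is an illustration under assumptions, not the layer's implementation: boxes are taken to be (x0, y0, x1, y1) corner tuples, the size filter is applied to both width and height, and the names `iou` and `simpler_nms` are hypothetical. Anchor generation (steps 1 and 2) is omitted.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    width = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    height = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = width * height
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - intersection)
    return intersection / union if union > 0 else 0.0


def simpler_nms(proposals, min_bbox_size, pre_nms_topn, post_nms_topn, iou_threshold):
    """Filter (box, score) proposals following steps 3 to 8 of the list above."""
    # Step 3: drop boxes that are too small (assumed: checked on both sides).
    kept = [(box, score) for box, score in proposals
            if box[2] - box[0] >= min_bbox_size and box[3] - box[1] >= min_bbox_size]
    # Steps 4 and 5: sort by score, highest first, and keep the top pre_nms_topn.
    kept = sorted(kept, key=lambda p: p[1], reverse=True)[:pre_nms_topn]
    # Step 6: drop any box that overlaps an already selected box too much.
    selected = []
    for box, score in kept:
        if all(iou(box, other) <= iou_threshold for other, _ in selected):
            selected.append((box, score))
    # Steps 7 and 8: return the top post_nms_topn survivors.
    return selected[:post_nms_topn]
```

With the attribute values from the example in this section, a call would look like `simpler_nms(proposals, 16, 6000, 150, 0.7)`.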

Example

<layer ... type="SimplerNMS" ... >
<data iou_threshold="0.700000" min_bbox_size="16" feat_stride="16" pre_nms_topn="6000" post_nms_topn="150"/>
<input> ... </input>
<output> ... </output>
</layer>

Slice Layer

Name: Slice

Category: Layer

Short description: Slice layer splits the input blob into several pieces over the specified axis.

Parameters: Slice layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Example

<layer ... type="Slice" ...>
<data axis="1"/>
<input>
<port id="0">
<dim>1</dim>
<dim>1048</dim>
<dim>14</dim>
<dim>14</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>1024</dim>
<dim>14</dim>
<dim>14</dim>
</port>
<port id="2">
<dim>1</dim>
<dim>24</dim>
<dim>14</dim>
<dim>14</dim>
</port>
</output>
</layer>

SoftMax Layer

Name: SoftMax

Category: Activation

Short description: Reference

Detailed description: Reference

Parameters: SoftMax layer parameters are specified in the data node, which is a child of the layer node.

Mathematical Formulation

\[ y_{c} = \frac{e^{Z_{c}}}{\sum_{d=1}^{C}e^{Z_{d}}} \]

where $C$ is the number of classes.
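
The formula can be sketched for one vector of class scores as follows. The function name `softmax` is chosen for illustration, and subtracting the maximum before exponentiation is a common numerical safeguard that is not part of the formula; it does not change the result.

```python
import math


def softmax(scores):
    """Return e^{z_c} / sum_d e^{z_d} for each score z_c."""
    shift = max(scores)  # cancels out in the ratio, avoids overflow in exp
    exps = [math.exp(z - shift) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
print(probabilities, sum(probabilities))
```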

Example

<layer ... type="SoftMax" ... >
<data axis="1" />
<input> ... </input>
<output> ... </output>
</layer>

Inputs:


Split Layer

Name: Split

Category: Layer

Short description: Split layer splits the input along the specified axis into several output pieces.

Detailed description: Reference

Parameters: Split layer parameters are specified in the data node, which is a child of the layer node.

Mathematical Formulation

For example, if the input blob has shape Bx(C+C)xHxW, axis="1", and num_split="2", each of the two output blobs has shape BxCxHxW.
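
The shape arithmetic in this example can be sketched as below. The function name `split_shapes` is hypothetical, and the sketch assumes the dimension along the axis must divide evenly by num_split.

```python
def split_shapes(shape, axis, num_split):
    """Return the shapes of the pieces when `shape` is split evenly along `axis`."""
    if shape[axis] % num_split != 0:
        raise ValueError("dimension along axis is not divisible by num_split")
    piece = list(shape)
    piece[axis] = shape[axis] // num_split
    return [list(piece) for _ in range(num_split)]


# B=8, C+C=6, H=W=14 split along axis 1 into two pieces of C=3 channels each.
print(split_shapes([8, 6, 14, 14], axis=1, num_split=2))
```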

Inputs:

Example

<layer ... type="Split" ... >
<data axis="0" num_split="2"/>
<input> ... </input>
<output> ... </output>
</layer>

StridedSlice Layer

Name: StridedSlice

Short description: StridedSlice layer extracts a strided slice of a blob. It is similar to generalized array indexing in Python*.

Parameters: StridedSlice layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Example

<layer ... type="StridedSlice" ...>
<data begin_mask="0,1,0,0,0" ellipsis_mask="0,0,0,0,0" end_mask="0,1,0,0,0" new_axis_mask="0,0,0,0,0" shrink_axis_mask="0,1,0,0,0"/>
<input>
<port id="0">
<dim>1</dim>
<dim>2</dim>
<dim>384</dim>
<dim>640</dim>
<dim>8</dim>
</port>
<port id="1">
<dim>5</dim>
</port>
<port id="2">
<dim>5</dim>
</port>
<port id="3">
<dim>5</dim>
</port>
</input>
<output>
<port id="4">
<dim>1</dim>
<dim>384</dim>
<dim>640</dim>
<dim>8</dim>
</port>
</output>
</layer>

TensorIterator Layer

Name: TensorIterator

Category: Layer

Short description: TensorIterator (TI) layer performs recurrent sub-graph execution iterating through the data.

Parameters: The parameters are specified in the child nodes of the port_map and back_edges sections, which are child nodes of the layer node. The port_map and back_edges sections specify data mapping rules.

Example

<layer ... type="TensorIterator" ... >
<input> ... </input>
<output> ... </output>
<port_map>
<input external_port_id="0" internal_layer_id="0" internal_port_id="0" axis="1" start="-1" end="0" stride="-1"/>
<input external_port_id="1" internal_layer_id="1" internal_port_id="1"/>
...
<output external_port_id="3" internal_layer_id="2" internal_port_id="1" axis="1" start="-1" end="0" stride="-1"/>
...
</port_map>
<back_edges>
<edge from-layer="1" from-port="1" to-layer="1" to-port="1"/>
...
</back_edges>
<body>
<layers> ... </layers>
<edges> ... </edges>
</body>
</layer>

Tile Layer

Name: Tile

Category: Layer

Short description: Tile layer extends the input blob with copies of its data along a specified axis.

Detailed description: Reference

Parameters: Tile layer parameters are specified in the data node, which is a child of the layer node.

Mathematical Formulation

Tile extends input blobs and fills in output blobs according to the following rules:

\[ out_i=input_i[inner\_dim*t] \]

\[ t \in \left ( 0, \quad tiles \right ) \]
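
A small sketch of tiling on nested lists is given below. It assumes the copies are concatenated whole along the chosen axis; the function name `tile` and the nested-list representation are illustrative choices.

```python
def tile(data, axis, tiles):
    """Repeat nested-list `data` `tiles` times along `axis`."""
    if axis == 0:
        # Concatenate `tiles` copies of the items along this axis.
        return [item for _ in range(tiles) for item in data]
    # Otherwise descend one level and tile each sub-list.
    return [tile(sub, axis - 1, tiles) for sub in data]


blob = [[1, 2],
        [3, 4]]
print(tile(blob, axis=1, tiles=2))
print(tile(blob, axis=0, tiles=2))
```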

Inputs:

Example

<layer ... type="Tile" ... >
<data axis="3" tiles="88"/>
<input> ... </input>
<output> ... </output>
</layer>