Intermediate Representation Notation Reference Catalog 

Convolution Layer

Name: Convolution

Short description: Reference

Detailed description: Reference

Parameters: Convolution layer parameters should be specified in the convolution_data node, which is a child of the layer node.

Weights Layout: GOIYX, which means that X changes the fastest, then Y, then Input, then Output, then Group.

Mathematical Formulation

Example

<layer ... type="Convolution" ... >
<convolution_data stride-x="4" stride-y="4" pad-x="0" pad-y="0" kernel-x="11" kernel-y="11" output="96" group="1" dilation-x="2" dilation-y="2"/>
<input> ... </input>
<output> ... </output>
<weights ... />
<biases ... />
</layer>

Gather Layer

Name: Gather

Short description: Gather layer takes slices of data in the second input blob according to the indices specified in the first input blob. The output blob shape is input2.shape[:axis] + input1.shape + input2.shape[axis + 1:].

Parameters: Gather layer parameters should be specified in the data section, which is placed as a child of the layer node.

Inputs

Mathematical Formulation

\[ output[:, ... ,:, i, ... , j,:, ... ,:] = input2[:, ... ,:, input1[i, ... ,j],:, ... ,:] \]
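The shape rule can be checked with a small NumPy sketch (the shapes are hypothetical, chosen to match the example below):

```python
import numpy as np

# Hypothetical blobs matching the example below: input1 holds indices,
# input2 holds the data to gather from, and axis = 1.
indices = np.zeros((15, 4, 20, 28), dtype=np.int64)  # input1
data = np.random.rand(6, 12, 10, 24)                 # input2
axis = 1

output = np.take(data, indices, axis=axis)
# input2.shape[:axis] + input1.shape + input2.shape[axis + 1:]
print(output.shape)  # (6, 15, 4, 20, 28, 10, 24)
```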

Example

<layer id="1" name="gather_node" precision="FP32" type="Gather">
<data axis="1"/>
<input>
<port id="0">
<dim>15</dim>
<dim>4</dim>
<dim>20</dim>
<dim>28</dim>
</port>
<port id="1">
<dim>6</dim>
<dim>12</dim>
<dim>10</dim>
<dim>24</dim>
</port>
</input>
<output>
<port id="2">
<dim>6</dim>
<dim>15</dim>
<dim>4</dim>
<dim>20</dim>
<dim>28</dim>
<dim>10</dim>
<dim>24</dim>
</port>
</output>
</layer>

Pooling Layer

Name: Pooling

Short description: Reference

Detailed description: Reference

Parameters: Specify pooling layer parameters in the pooling_data node, which is a child of the layer node.

Mathematical Formulation

Example

<layer ... type="Pooling" ... >
<pooling_data kernel-x="3" kernel-y="3" pad-x="0" pad-y="0" stride-x="2" stride-y="2" pool-method="max" exclude-pad="true" rounding_type="floor"/>
<input> ... </input>
<output> ... </output>
</layer>

ROIPooling Layer

Name: ROIPooling

Short description: It is a pooling layer with max pooling strategy (see max option in the *Pooling layer* parameters description). It is used over feature maps of non-uniform sizes and outputs another feature map of a fixed size.

Detailed description: deepsense.io reference

Parameters: Specify ROIPooling layer parameters in the data node, which is a child of the layer node.

Mathematical Formulation

\[ output_{j} = MAX\{ x_{0}, ... x_{i}\} \]

Example

<layer ... type="ROIPooling" ... >
<data pooled_h="6" pooled_w="6" spatial_scale="0.062500"/>
<input> ... </input>
<output> ... </output>
</layer>

FullyConnected Layer

Name: FullyConnected

Short description: Reference

Detailed description: Reference

Parameters: Specify FullyConnected layer parameters in the fc_data node, which is a child of the layer node.

Weights Layout: OI, which means that Input changes the fastest, then Output.

Mathematical Formulation

Example

<layer ... type="FullyConnected" ... >
<fc_data out-size="4096"/>
<input> ... </input>
<output> ... </output>
</layer>

ReLU Layer

Name: ReLU

Short description: Reference

Detailed description: Reference

Parameters: ReLU layer parameters can optionally be specified in the data node, which is a child of the layer node.

Mathematical Formulation

\[ Y_{i}^{( l )} = max(0, Y_{i}^{( l - 1 )}) \]
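As a NumPy sketch, the formula can be extended with the optional negative_slope attribute shown in the example below; the leaky extension is an assumption based on that attribute:

```python
import numpy as np

def relu(x, negative_slope=0.0):
    # negative_slope = 0 gives the standard ReLU; a non-zero slope
    # (as in the example below) gives the leaky variant.
    return np.maximum(0, x) + negative_slope * np.minimum(0, x)

y = relu(np.array([-10.0, 3.0]), negative_slope=0.1)  # close to [-1.0, 3.0]
```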

Example

<layer ... type="ReLU" ... >
<data negative_slope="0.100000"/>
<input> ... </input>
<output> ... </output>
</layer>

Activation Layer

Name: Activation

Short description: Activation layer represents an activation function of each neuron in a layer, which is used to add non-linearity to the computational flow.

Detailed description: Reference

Parameters: Activation layer parameters should be specified in the data node, which is a child of the layer node.

Mathematical Formulation

Example

<layer ... type="Activation" ... >
<data type="sigmoid" />
<input> ... </input>
<output> ... </output>
</layer>

SoftMax layer

Name: SoftMax

Short description: Reference

Detailed description: Reference

Parameters: SoftMax layer parameters can optionally be specified in the data node, which is a child of the layer node.

Mathematical Formulation

\[ y_{c} = \frac{e^{Z_{c}}}{\sum_{d=1}^{C}e^{Z_{d}}} \]

where $C$ is the number of classes
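A minimal NumPy sketch of this formula (the max-subtraction is a standard numerical-stability trick, not part of the IR definition):

```python
import numpy as np

def softmax(z, axis=1):
    # Subtracting the max does not change the result mathematically,
    # but avoids overflow in exp.
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

z = np.array([[1.0, 2.0, 3.0]])
y = softmax(z)  # each row sums to 1 and preserves the ordering of the logits
```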

Example

<layer ... type="SoftMax" ... >
<data axis="1" />
<input> ... </input>
<output> ... </output>
</layer>

Deconvolution Layer

Name: Deconvolution

Short description: Deconvolution layer is applied for upsampling the output to the higher image resolution.

Detailed description: Reference

Parameters: Deconvolution layer parameters should be specified in the deconvolution_data node, which is a child of the layer node.

Weights Layout: GOIYX, which means that X changes the fastest, then Y, then Input, then Output, then Group.

Mathematical Formulation

Deconvolution is also called transpose convolution and performs an operation reverse to convolution.

The output size for each spatial dimension is calculated as:

\[S_{o}=stride(S_{i} - 1 ) + S_{f} - 2pad \]

where $S_{o}$, $S_{i}$, and $S_{f}$ are the sizes of the output, input, and filter, respectively.

Output is calculated in the same way as for convolution layer:

\[out = \sum_{i = 0}^{n}w_{i}x_{i} + b\]
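The output-size formula can be sketched in Python and checked against the stride/kernel/pad values from the example below:

```python
def deconv_output_size(s_in, stride, kernel, pad):
    # S_o = stride * (S_i - 1) + S_f - 2 * pad
    return stride * (s_in - 1) + kernel - 2 * pad

# With stride 2, kernel 4, pad 1 (the example below), every spatial
# dimension is exactly doubled:
print(deconv_output_size(32, stride=2, kernel=4, pad=1))  # 64
```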

Example

<layer ... type="Deconvolution" ... >
<deconvolution_data stride-x="2" stride-y="2" pad-x="1" pad-y="1" kernel-x="4" kernel-y="4" output="19" group="1"/>
<input> ... </input>
<output> ... </output>
</layer>

Local Response Normalization (LRN) layer

Name: Norm

Short description: Reference

Detailed description: Reference

Parameters: Norm layer parameters should be specified in the norm_data node, which is a child of the layer node.

Mathematical Formulation

\[o_{i} = \frac{x_{i}}{\left( 1 + \left( \frac{\alpha}{n} \right)\sum_{j}x_{j}^{2} \right)^{\beta}}\]

Where $n$ is the size of each local region.

Example

<layer ... type="Norm" ... >
<norm_data alpha="9.9999997e-05" beta="0.75" local-size="5" region="across"/>
<input> ... </input>
<output> ... </output>
</layer>

Concat Layer

Name: Concat

Short description: Reference

Parameters: Concat layer parameters should be specified in the concat_data node, which is a child of the layer node.

Mathematical Formulation

The axis parameter specifies the blob dimension along which values are concatenated. For example, for two input blobs B1xC1xH1xW1 and B2xC2xH2xW2 with axis: 1, the output blob is B1x(C1+C2)xH1xW1. This is only possible if B1=B2, H1=H2, W1=W2.
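The shape arithmetic can be illustrated with NumPy (hypothetical blob sizes):

```python
import numpy as np

# Hypothetical blobs that agree on every dimension except axis 1.
b1 = np.zeros((2, 3, 5, 7))  # B x C1 x H x W
b2 = np.zeros((2, 4, 5, 7))  # B x C2 x H x W

out = np.concatenate([b1, b2], axis=1)
print(out.shape)  # (2, 7, 5, 7), i.e. B x (C1 + C2) x H x W
```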

Example

<layer ... type="Concat" ... >
<concat_data axis="1"/>
<input> ... </input>
<output> ... </output>
</layer>

Split Layer

Name: Split

Short description: Split layer splits the input into several output groups. Group sizes are denoted by the number and the size of output ports.

Detailed description: Reference

Parameters: None

Mathematical Formulation

Splits the input blob among children. For example, if the input blob is Bx(C+C)xHxW and there are two children, then each output blob is BxCxHxW.

Example

<layer ... type="Split" ... >
<input> ... </input>
<output> ... </output>
</layer>

Reshape Layer

Name: Reshape

Short description: Reshape layer changes dimensions of the input blob according to the specified order. Input blob volume is equal to output blob volume, where volume is the product of dimensions.

Detailed description: Reference

Parameters: Reshape layer parameters should be specified in the data node, which is a child of the layer node.

Mathematical Formulation

If you want to reshape input blob BxCxHxW into Bx1x(C*H)xW, the dim parameters of your layer should be:

layer {
name: "reshape"
type: "Reshape"
bottom: "input"
top: "output"
reshape_param {
shape {
dim: 0 # copy the dimension from below
dim: 1
dim: -1 # infer it from the other dimensions
dim: 0
}
}
}
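The same reshape expressed in NumPy (hypothetical dimensions; -1 plays the role of the inferred dim):

```python
import numpy as np

b, c, h, w = 2, 3, 4, 5  # hypothetical B x C x H x W
x = np.arange(b * c * h * w).reshape(b, c, h, w)

# B x C x H x W -> B x 1 x (C*H) x W; -1 infers C*H from the volume.
y = x.reshape(b, 1, -1, w)
print(y.shape)  # (2, 1, 12, 5)
assert x.size == y.size  # the volume is unchanged
```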

Example

<layer ... type="Reshape" ... >
<data axis="0" dim="1, 1001" num_axes="-1"/>
<input> ... </input>
<output> ... </output>
</layer>

Eltwise Layer

Name: Eltwise

Short description: Eltwise layer performs element-wise operation, which is specified in parameters, over given inputs.

Parameters: Eltwise layer parameters should be specified in the elementwise_data node, which is placed as a child of the layer node.

Mathematical Formulation

Eltwise accepts two inputs of any number of dimensions, from 1 to 4; however, both inputs must have exactly the same dimensions. The produced blob also has the same dimensions as each of its parents.

Eltwise does the following with the input blobs:

\[ o_{i} = f(b_{i}^{1}, b_{i}^{2}) \]

where $b_{i}^{1}$ - first blob $i$-th element, $b_{i}^{2}$ - second blob $i$-th element, $o_{i}$ - output blob $i$-th element, $f(a, b)$ - is a function that performs an operation over its two arguments $a, b$.

Example

<layer ... type="Eltwise" ... >
<elementwise_data operation="sum"/>
<input> ... </input>
<output> ... </output>
</layer>

ScaleShift Layer

Name: ScaleShift

Short description: ScaleShift layer performs linear transformation of the input blobs. Weights denote scaling parameter, biases - a shift.

Parameters: ScaleShift layer does not have additional parameters.

Mathematical Formulation

\[ o_{i} =\gamma b_{i} + \beta \]

Example

<layer ... type="ScaleShift" ... >
<input> ... </input>
<output> ... </output>
</layer>

Crop (Type 1) Layer

Name: Crop

Short description: Crop layer changes selected dimensions of the input blob according to the specified parameters.

Parameters: Crop layer parameters should be specified in the data section, which is placed as a child of the layer node. Due to the various representations of Crop attributes in existing frameworks, this layer can be described in three independent ways. Crop Type 1 takes two input blobs, and the shape of the second blob specifies the crop size. The layer has two attributes: axis and offset. The Crop layer of this type supports shape inference.

Inputs

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
<data axis="2,3" offset="0,0"/>
<input>
<port id="0">
<dim>1</dim>
<dim>21</dim>
<dim>44</dim>
<dim>44</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</input>
<output>
<port id="2">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</output>
</layer>

Crop (Type 2) Layer

Name: Crop

Short description: Crop layer changes selected dimensions of the input blob according to the specified parameters.

Parameters: Crop layer parameters should be specified in the data section, which is placed as a child of the layer node. Due to the various representations of Crop attributes in existing frameworks, this layer can be described in three independent ways. Crop Type 2 takes one input blob to crop and has three attributes: axis, offset, and dim. The Crop layer of this type supports shape inference only when shape propagation is applied to dimensions that are not specified in the axis attribute.

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
<data axis="2,3" offset="0,0" dim="34,34"/>
<input>
<port id="0">
<dim>1</dim>
<dim>21</dim>
<dim>44</dim>
<dim>44</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</output>
</layer>

Crop (Type 3) Layer

Name: Crop

Short description: Crop layer changes selected dimensions of the input blob according to the specified parameters.

Parameters: Crop layer parameters should be specified in the data section, which is placed as a child of the layer node. Due to the various representations of Crop attributes in existing frameworks, this layer can be described in three independent ways. Crop Type 3 takes one input blob to crop and has three attributes: axis, crop_begin, and crop_end. The Crop layer of this type supports shape inference.

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
<data axis="2,3" crop_begin="4,4" crop_end="6,6"/>
<input>
<port id="0">
<dim>1</dim>
<dim>21</dim>
<dim>44</dim>
<dim>44</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</output>
</layer>

Batch Normalization Layer

Name: BatchNormalization

Short description: Reference

Detailed description: Reference

Parameters: BatchNormalization layer parameters should be specified as the batch_norm_data node, which is a child of the layer node.

Mathematical Formulation

BatchNormalization is the normalization of the output in each hidden layer.

Example

<layer ... type="BatchNormalization" ... >
<batch_norm_data epsilon="9.99e-06" />
<input> ... </input>
<output> ... </output>
</layer>

Normalize Layer

Name: Normalize

Short description: Normalize layer performs L-p normalization of the input blob.

Parameters: Normalize layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

\[ o_{i} = \sum_{i}^{H*W}\frac{\left ( n*C*H*W \right )* scale}{\sqrt{\sum_{i=0}^{C*H*W}\left ( n*C*H*W \right )^{2}}} \]

Example

<layer ... type="Normalize" ... >
<data across_spatial="0" channel_shared="0" eps="0.000000"/>
<input> ... </input>
<output> ... </output>
</layer>

Tile Layer

Name: Tile

Short description: Tile layer extends the input blob with copies of data along a specific axis.

Detailed description: Reference

Parameters: Tile layer parameters should be specified as the tile_data node, which is a child of the layer node.

Mathematical Formulation

Tile extends the input blob, filling the output blob by the following rule:

\[ out_i=input_i[inner\_dim*t] \]

\[ t \in \left ( 0, \quad tiles \right ) \]

Example

<layer ... type="Tile" ... >
<tile_data axis="3" tiles="88"/>
<input> ... </input>
<output> ... </output>
</layer>

Permute Layer

Name: Permute

Short description: Permute layer performs reordering of input blob dimensions.

Detailed description: Reference

Parameters: Permute layer parameters should be specified as the data node, which is a child of the layer node.

NOTE: Model Optimizer (Beta 2) does not use the data node for retrieving parameters and currently supports only the following order for permutation: 0,2,3,1.

Mathematical Formulation

Permute layer reorders the input blob dimensions. Source and destination indexes are bound by the formula:

\[ src\_ind_{offset} = n * ordered[1] * ordered[2] * ordered[3] + (h * ordered[3] + w) \]

\[ n \in ( 0, order[0] ) \]

\[ h \in ( 0, order[2] ) \]

\[ w \in ( 0, order[3] ) \]
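The supported order 0,2,3,1 corresponds to a plain transpose (hypothetical blob size):

```python
import numpy as np

x = np.zeros((2, 3, 4, 5))         # hypothetical N x C x H x W blob
y = np.transpose(x, (0, 2, 3, 1))  # order="0,2,3,1" -> N x H x W x C
print(y.shape)  # (2, 4, 5, 3)
```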

Example

<layer ... type="Permute" ... >
<data order="0,2,3,1"/>
<input> ... </input>
<output> ... </output>
</layer>

PriorBox Layer

Name: PriorBox

Short description: PriorBox layer generates prior boxes of specified sizes and aspect ratios across all dimensions.

Parameters: PriorBox layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation: PriorBox computes coordinates of prior boxes by following:

  1. First calculates center_x and center_y of prior box:

    \[ W \equiv Width \quad Of \quad Image \]

    \[ H \equiv Height \quad Of \quad Image \]

    • If step equals 0:

      \[ center_x=(w+0.5) \]

    \[ center_y=(h+0.5) \]

    • else:

      \[ center_x=(w+offset)*step \]

    \[ center_y=(h+offset)*step \]

    \[ w \subset \left( 0, W \right ) \]

    \[ h \subset \left( 0, H \right ) \]

  2. Then, for each $ s \subset \left( 0, min\_sizes \right ) $ calculates coordinates of prior boxes:

    \[ xmin = \frac{center_x - \frac{s}{2}}{W} \]

    \[ ymin = \frac{center_y - \frac{s}{2}}{H} \]

    \[ xmax = \frac{center_x + \frac{s}{2}}{W} \]

    \[ ymax = \frac{center_y + \frac{s}{2}}{H} \]

Example

<layer ... type="PriorBox" ... >
<data step="64.000000" min_size="162.000000" max_size="213.000000" offset="0.500000" flip="1" clip="0" aspect_ratio="2.000000,3.000000" variance="0.100000,0.100000,0.200000,0.200000" />
<input> ... </input>
<output> ... </output>
</layer>

SimplerNMS layer

Name: SimplerNMS

Short description: SimplerNMS layer performs filtering of bounding boxes and outputs only those with the highest confidence of prediction.

Parameters: SimplerNMS layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

SimplerNMS accepts three inputs with four dimensions. Produced blob has two dimensions, the first one equals post_nms_topn.

SimplerNMS does the following with the input blob:

  1. Generates initial anchor boxes. The left top corner of all boxes is (0, 0). Width and height of boxes are calculated from default widths and heights scaled according to the scale parameter.
  2. For each point in the first input blob:
    • pins anchor boxes to picture according to the second input blob, which contains four deltas for each box: for x and y of center, for width, and for height
    • finds out score in the first input blob
  3. Filters out boxes with size less than min_bbox_size.
  4. Sorts all proposals (box, score) by score from highest to lowest
  5. Takes top pre_nms_topn proposals
  6. Calculates intersections for boxes and filters out all with $intersection/union > iou\_threshold$
  7. Takes top post_nms_topn proposals
  8. Returns top proposals

Example

<layer ... type="SimplerNMS" ... >
<data cls_threshold="0.500000" iou_threshold="0.700000" min_bbox_size="16" feat_stride="16" pre_nms_topn="6000" post_nms_topn="150"/>
<input> ... </input>
<output> ... </output>
</layer>

DetectionOutput Layer

Name: DetectionOutput

Short description: DetectionOutput layer performs non-maximum suppression to generate the detection output using information on location and confidence predictions.

Detailed description: Reference

Parameters: DetectionOutput layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

At each feature map cell, DetectionOutput predicts the offsets relative to the default box shapes in the cell, as well as the per-class scores that indicate the presence of a class instance in each of those boxes. Specifically, for each box out of k at a given location, DetectionOutput computes class scores and the four offsets relative to the original default box shape. This results in a total of $(c + 4)k$ filters that are applied around each location in the feature map, yielding $(c + 4)kmn$ outputs for an m × n feature map.

Example

<layer ... type="DetectionOutput" ... >
<data num_classes="21" share_location="1" background_label_id="0" nms_threshold="0.450000" top_k="400" eta="1.000000" output_directory="" output_name_prefix="" output_format="" label_map_file="" name_size_file="" num_test_image="0" prob="1.000000" resize_mode="caffe.ResizeParameter.WARP" height="0" width="0" height_scale="0" width_scale="0" pad_mode="caffe.ResizeParameter.CONSTANT" pad_value="#" interp_mode="#" code_type="caffe.PriorBoxParameter.CENTER_SIZE" variance_encoded_in_target="0" keep_top_k="200" confidence_threshold="0.010000" visualize="0" visualize_threshold="0.000000" save_file=""/>
<input> ... </input>
<output> ... </output>
</layer>

Memory / Delay Object layer

Name: Memory

Short description: Memory layer represents a delay layer in terms of LSTM terminology. To read more about LSTM topologies, please refer to this link.

Detailed description: Memory layer saves state between two infer requests. In the topology, it is the single layer, however, in the Intermediate Representation, it is always represented as a pair of Memory layers. One of these layers does not have outputs and another does not have inputs (in terms of the Intermediate Representation).

Parameters: Memory layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

Memory saves data from the input blob.

Example

<layer ... type="Memory" ... >
<data id="r_27-28" index="0" size="2" />
<input> ... </input>
<output> ... </output>
</layer>

Clamp Layer

Name: Clamp

Short description: Clamp layer represents clipping activation operation.

Detailed description: Reference

Parameters: Clamp layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

Clamp generally does the following with the input blobs:

\[ out_i=\left\{\begin{array}{ll} max\_value \quad \mbox{if } \quad input_i>max\_value \\ min\_value \quad \mbox{if } \quad input_i<min\_value \\ input_i \quad \mbox{otherwise} \end{array}\right. \]
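A NumPy equivalent using the min/max attributes from the example below:

```python
import numpy as np

x = np.array([-5.0, 10.0, 30.0, 50.0, 99.0])
# Values below min are raised to min; values above max are lowered to max
# (min="10", max="50" as in the example below).
out = np.clip(x, 10, 50)
print(out)  # [10. 10. 30. 50. 50.]
```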

Example

<layer ... type="Clamp" ... >
<data min="10" max="50" />
<input> ... </input>
<output> ... </output>
</layer>

ArgMax Layer

Name: ArgMax

Short description: ArgMax layer computes the indices of the K maximum values for each datum across all dimensions CxHxW.

Detailed description: Intended for use after a classification layer to produce a prediction. If parameter out_max_val is set to "true", output is a vector of pairs *(max_ind, max_val)* for each image. The axis parameter specifies an axis along which to maximize.

Parameters: ArgMax layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

ArgMax generally does the following with the input blobs:

\[ o_{i} = \left\{ x| x \in S \wedge \forall y \in S : f(y) \leq f(x) \right\} \]
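A sketch of top-K index selection in NumPy (argmax_top_k is a hypothetical helper written for illustration, not part of the IR):

```python
import numpy as np

def argmax_top_k(x, top_k=1, axis=-1):
    # Hypothetical helper: indices of the top_k maximum values
    # along axis, largest first.
    idx = np.flip(np.argsort(x, axis=axis), axis=axis)
    return idx.take(range(top_k), axis=axis)

x = np.array([[0.1, 0.7, 0.2],
              [0.9, 0.05, 0.05]])
print(argmax_top_k(x, top_k=1).tolist())  # [[1], [0]]
```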

Example

<layer ... type="ArgMax" ... >
<data top_k="10" out_max_val="1" axis="-1"/>
<input> ... </input>
<output> ... </output>
</layer>

PSROIPooling Layer

Name: PSROIPooling

Short description: PSROIPooling layer computes position-sensitive max pooling on regions of interest specified by input. It takes as input N position-sensitive score maps and a list of R regions of interest.

Detailed description: Reference

Parameters: PSRoiPooling layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

The output value for the $(i, j)$-th bin is obtained by summation from one score map $x_{i,j}$ corresponding to that bin. In short, the difference from RoIPooling is that a general feature map $x$ is replaced by a specific position-sensitive score map $x_{i,j}$.

Example

<layer ... type="PSROIPooling" ... >
<data output_dim="10" out_max_val="1" spatial_scale="0.1"/>
<input> ... </input>
<output> ... </output>
</layer>

GRN Layer

Name: GRN

Short description: GRN is Global Response Normalization with L2 norm (across channels only).

Parameters: GRN layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

GRN computes L2 norm by channels for input blob. GRN generally does the following with the input blob:

\[ output_{i} = \frac{input_{i}}{\sqrt{\sum_{i}^{C} input_{i}}} \]
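A NumPy sketch; adding the bias attribute under the root is an assumption based on common implementations, since the formula above omits it:

```python
import numpy as np

def grn(x, bias=0.0):
    # L2-normalize across the channel axis (axis 1 of an N x C x H x W blob).
    # Adding bias under the root is an assumption based on the bias attribute.
    return x / np.sqrt(bias + (x ** 2).sum(axis=1, keepdims=True))

x = np.random.rand(1, 8, 4, 4).astype(np.float32)
y = grn(x)  # with bias = 0, each channel vector has unit L2 norm
```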

Example

<layer ... type="GRN" ... >
<data bias="1.0"/>
<input> ... </input>
<output> ... </output>
</layer>

PReLU Layer

Name: PReLU

Short description: PReLU is the Parametric Rectifier Linear Unit. The difference from ReLU is that negative slopes can vary across channels.

Parameters: PReLU layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

PReLU accepts one input with four dimensions. The produced blob has the same dimensions as input.

PReLU does the following with the input blob:

\[ o_{i} = max(0, x_{i}) + w_{i} * min(0,x_{i}) \]

where $w_{i}$ is from weights blob.
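A NumPy sketch of the formula; a single shared slope is used here for brevity, whereas the layer learns one slope per channel:

```python
import numpy as np

def prelu(x, w):
    # w is the learned negative slope; the layer stores one slope
    # per channel, a single scalar is used here for brevity.
    return np.maximum(0, x) + w * np.minimum(0, x)

x = np.array([[-2.0, -1.0],
              [1.0, 3.0]])
print(prelu(x, 0.25).tolist())  # [[-0.5, -0.25], [1.0, 3.0]]
```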

Example

<layer ... type="PReLU" ... >
<data bias="1.0"/>
<input> ... </input>
<output> ... </output>
</layer>

RegionYolo layer

Name: RegionYolo

Short description: RegionYolo computes coordinates of regions with probability for each class.

Detailed description: Reference

Parameters: RegionYolo layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical formulation

RegionYolo calculates coordinates of regions by the rule:

\[ p_{0,0}^i=b*o^i+n*w*h*(coords^i + classes + 1) + w*h+loc \]

\[ p_{1,1}^i=b*o^i+n*w*h*(coords^i + classes + 1) + coords*w*h+loc \]

where:

i is the number of regions

w and h are the dimensions of the image

$ location=w*h*i $

coords and classes are attributes of this layer

b is the batch

$ loc = \frac{location}{w*h} $

For each region, RegionYolo calculates the probability by the logistic function:

\[ p^i = \frac{1}{1+e^{-i}} \]

Example

<layer ... type="RegionYolo" ... >
<data bias="1.0"/>
<input> ... </input>
<output> ... </output>
<weights .../>
</layer>

ReorgYolo layer

Name: ReorgYolo

Short description: ReorgYolo reorganizes input blob taking into account strides.

Detailed description: Reference

Parameters: ReorgYolo layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical formulation

ReorgYolo reorganizes the blob.

The destination index of the data is calculated by the following rule:

\[ DistIndex=b * IC * IH * IW + ic * IH * IW + ih * IW + iw \]

The source index of the data is calculated by the following rule:

\[ SrcIndex=b * off_{ic} * off_{ih} * off_{iw} + C^o * off_{ih} * off_{iw} + H^o * off_{iw} + W^o; \]

where:

\[ C^o=C^i \pmod{\frac{IC}{stride^2}} \]

\[ W^o=W^i*stride + off_{ic}\pmod{stride} \]

\[ H^o=H^i*stride + \frac{C^i}{\frac{IC}{stride^2}} /{stride} \]

\[ off_{ic}=\frac{C^i}{\frac{IC}{stride^2}} \]

\[ off_{ih}=IH*stride \]

\[ off_{iw}=IW*stride \]

\[ ic \subset \left( 0, IC \right ) \]

\[ iw \subset \left( 0, IW \right ) \]

\[ ih \subset \left( 0, IH \right ) \]

Example

<layer ... type="ReorgYolo" ... >
<data stride="1"/>
<input> ... </input>
<output> ... </output>
</layer>

PriorBoxClustered Layer

Name: PriorBoxClustered

Short description: PriorBoxClustered layer generates prior boxes of specified sizes.

Parameters: PriorBoxClustered layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

PriorBoxClustered computes coordinates of prior boxes by following:

  1. Calculates the center_x and center_y of prior box:

    \[ W \equiv Width \quad Of \quad Image \]

    \[ H \equiv Height \quad Of \quad Image \]

    \[ center_x=(w+offset)*step \]

    \[ center_y=(h+offset)*step \]

    \[ w \subset \left( 0, W \right ) \]

    \[ h \subset \left( 0, H \right ) \]

  2. For each $s \subset \left( 0, W \right )$ calculates the prior boxes coordinates:

    \[ xmin = \frac{center_x - \frac{width_s}{2}}{W} \]

    \[ ymin = \frac{center_y - \frac{height_s}{2}}{H} \]

    \[ xmax = \frac{center_x + \frac{width_s}{2}}{W} \]

    \[ ymax = \frac{center_y + \frac{height_s}{2}}{H} \]

If clip is defined, the coordinates of prior boxes are recalculated with the formula: $coordinate = \min(\max(coordinate,0), 1)$

Example

<layer ... type="PriorBoxClustered">
<data clip="0" flip="0" height="44.0,10.0,30.0,19.0,94.0,32.0,61.0,53.0,17.0" offset="0.5" step="16.0" variance="0.1,0.1,0.2,0.2"
width="86.0,13.0,57.0,39.0,68.0,34.0,142.0,50.0,23.0"/>
<input>
...
</input>
<output>
...
</output>
</layer>

MVN Layer

Name: MVN

Short description: Reference

Parameters: MVN layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

MVN subtracts mean from the input blob:

\[ o_{i} = i_{i} - \frac{\sum{i_{k}}}{C * H * W} \]

If normalize_variance is set to 1, the output blob is divided by variance:

\[ o_{i}=\frac{o_{i}}{\sum \sqrt {o_{k}^2}+\epsilon} \]
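A NumPy sketch of both steps, assuming across_channels is 1 (as in the example below), so statistics are computed over C*H*W per batch element:

```python
import numpy as np

def mvn(x, normalize_variance=True, eps=1e-9):
    # Statistics are computed over C*H*W for each batch element
    # (across_channels = 1 in the example below).
    axes = (1, 2, 3)
    out = x - x.mean(axis=axes, keepdims=True)
    if normalize_variance:
        out = out / (np.sqrt((out ** 2).mean(axis=axes, keepdims=True)) + eps)
    return out

x = np.random.rand(2, 3, 4, 5)
y = mvn(x)  # approximately zero mean, unit variance per batch element
```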

Example

<layer ... type="MVN">
<data across_channels="1" eps="9.999999717180685e-10" normalize_variance="1"/>
<input>
...
</input>
<output>
...
</output>
</layer>

CTCGreedyDecoder Layer

Name: CTCGreedyDecoder

Short description: CTCGreedyDecoder performs greedy decoding on the logits given in input (best path).

Detailed description: Reference

Parameters: CTCGreedyDecoder layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

Given an input sequence $X$ of length $T$, CTCGreedyDecoder assumes the probability of a length $T$ character sequence $C$ is given by

\[ p(C|X) = \prod_{t=1}^{T} p(c_{t}|X) \]
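Greedy (best-path) decoding itself can be sketched in plain Python; the blank index and per-frame probabilities below are hypothetical:

```python
def ctc_greedy_decode(logits, blank=0):
    """Best-path decoding: take the argmax class at each time step,
    collapse consecutive repeats, then drop the blank label."""
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    decoded, prev = [], None
    for c in best_path:
        if c != prev and c != blank:
            decoded.append(c)
        prev = c
    return decoded

# Hypothetical per-frame class probabilities (3 classes, class 0 is blank).
logits = [[0.1, 0.8, 0.1],
          [0.1, 0.8, 0.1],
          [0.9, 0.05, 0.05],
          [0.1, 0.1, 0.8]]
print(ctc_greedy_decode(logits))  # [1, 2]
```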

Example

<layer ... type="CTCGreedyDecoder" ... >
<data stride="1"/>
<input> ... </input>
<output> ... </output>
</layer>

Proposal Layer

Name: Proposal

Short description: Proposal layer performs filtering of bounding boxes and outputs only those with the highest confidence of prediction.

Parameters: Proposal layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

Proposal layer accepts three inputs with four dimensions. The produced blob has two dimensions: the first one equals batch_size * post_nms_topn.

Proposal does the following with the input blob:

  1. Generates initial anchor boxes. The left top corner of all boxes is (0, 0). Width and height of boxes are calculated from base_size with the scale and ratio parameters.
  2. For each point in the first input blob:
    • pins anchor boxes to the image according to the second input blob that contains four deltas for each box: for x and y of center, for width and for height
    • finds out score in the first input blob
  3. Filters out boxes with size less than min_size
  4. Sorts all proposals (box, score) by score from highest to lowest
  5. Takes top pre_nms_topn proposals
  6. Calculates intersections for boxes and filters out all with $intersection/union > nms\_thresh$
  7. Takes top post_nms_topn proposals
  8. Returns top proposals

Example

<layer ... type="Proposal" ... >
<data base_size="16" feat_stride="16" min_size="16" nms_thresh="0.6" post_nms_topn="200" pre_nms_topn="6000"
ratio="2.67" scale="4.0,6.0,9.0,16.0,24.0,32.0"/>
<input> ... </input>
<output> ... </output>
</layer>

Resample Layer

Name: Resample

Short description: Resample layer scales the input blob by the specified parameters.

Parameters: Resample layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical formulation

Resample layer scales the input blob. Depending on the type parameter, Resample applies different blob interpolation algorithms and performs anti-aliasing if the antialias parameter is specified.

Example

<layer type="Resample">
<data antialias="0" factor="1.0" height="227" type="caffe.ResampleParameter.LINEAR" width="227"/>
<input>
...
</input>
<output>
...
</output>
</layer>

Power Layer

Name: Power

Short description: Power layer computes the output as (shift + scale * x) ^ power for each input element x.

Parameters: Power layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

\[ p = (shift + scale * x)^{power} \]
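A one-line Python equivalent, checked with the attributes from the example below:

```python
def power_layer(x, power=2.0, scale=0.1, shift=5.0):
    # (shift + scale * x) ^ power, element-wise; defaults match the example below.
    return (shift + scale * x) ** power

print(power_layer(10.0))  # (5 + 0.1 * 10)^2 = 36.0
```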

Example

<layer ... type="Power" ... >
<data power="2" scale="0.1" shift="5"/>
<input> ... </input>
<output> ... </output>
</layer>

Pad Layer

Name: Pad

Short description: Pad layer extends an input tensor on edges. New element values are generated based on the Pad layer parameters described below.

Parameters: Pad layer parameters should be specified in the data section, which is placed as a child of the layer node. The parameters specify the number of elements to add along each axis and the rule by which new element values are generated: for example, whether they are filled with a given constant or generated based on the input tensor content.

Inputs

Outputs

pad_mode Examples

The following examples illustrate how the output tensor is generated for the Pad layer from a given input tensor:

INPUT =
[[ 1 2 3 4 ]
[ 5 6 7 8 ]
[ 9 10 11 12 ]]

with the following parameters:

pads_begin = [0, 1]
pads_end = [2, 3]

depending on the pad_mode.
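The common pad_mode variants applied to the INPUT tensor above can be reproduced with np.pad (the mode names follow NumPy and may differ from the IR attribute values):

```python
import numpy as np

x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

pads = ((0, 2), (1, 3))  # pads_begin = [0, 1], pads_end = [2, 3]

const = np.pad(x, pads, mode="constant", constant_values=0)  # fill with a constant
edge = np.pad(x, pads, mode="edge")                          # replicate border values
refl = np.pad(x, pads, mode="reflect")                       # mirror the content

print(const.shape)  # (5, 8): (3 + 0 + 2, 4 + 1 + 3)
```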

Example

<layer id="1" name="MyPad" precision="FP32" type="Pad">
<data pads_begin="0,5,2,1" pads_end="1,0,3,7" pad_mode="constant" pad_value="666.0"/>
<input>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>32</dim>
<dim>40</dim>
</port>
</input>
<output>
<port id="2">
<dim>2</dim>
<dim>8</dim>
<dim>37</dim>
<dim>48</dim>
</port>
</output>
</layer>

LSTMCell Layer

Name: LSTMCell

Short description: LSTMCell layer computes the output using the formula described in original paper Long Short-Term Memory.

Parameters: None

Mathematical Formulation

inputs:
X - input data
Hi - input hidden state
Ci - input cell state
outputs:
Ho - output hidden state
Co - output cell state
Formula:
* - matrix mult
(.) - eltwise mult
[,] - concatenation
sigm - 1/(1 + e^{-x})
tanh - (e^{2x} - 1)/(e^{2x} + 1)
f = sigm(Wf*[Hi, X] + Bf)
i = sigm(Wi*[Hi, X] + Bi)
c = tanh(Wc*[Hi, X] + Bc)
o = sigm(Wo*[Hi, X] + Bo)
Co = f (.) Ci + i (.) c
Ho = o (.) tanh(Co)
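The pseudocode above can be sketched directly in NumPy (the weight shapes are hypothetical; a real LSTMCell packs all gates into single weight and bias blobs):

```python
import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(X, Hi, Ci, Wf, Wi, Wc, Wo, Bf, Bi, Bc, Bo):
    hx = np.concatenate([Hi, X])  # [Hi, X] - concatenation
    f = sigm(Wf @ hx + Bf)        # forget gate
    i = sigm(Wi @ hx + Bi)        # input gate
    c = np.tanh(Wc @ hx + Bc)     # candidate cell state
    o = sigm(Wo @ hx + Bo)        # output gate
    Co = f * Ci + i * c           # new cell state
    Ho = o * np.tanh(Co)          # new hidden state
    return Ho, Co

# Hypothetical sizes: input of 3, hidden state of 2.
rng = np.random.default_rng(0)
W = [rng.standard_normal((2, 5)) for _ in range(4)]  # 5 = hidden + input
B = [np.zeros(2) for _ in range(4)]
Ho, Co = lstm_cell(rng.standard_normal(3), np.zeros(2), np.zeros(2), *W, *B)
```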

Example

<layer ... type="LSTMCell" ... >
<input> ... </input>
<output> ... </output>
</layer>

TensorIterator Layer

Name: TensorIterator

Short description: TensorIterator (TI) layer performs recurrent subgraph execution iterating through the data.

Parameters: The port_map and back_edges sections specify the data mapping rules.

Example

<layer ... type="TensorIterator" ... >
<input> ... </input>
<output> ... </output>
<port_map>
<input external_port_id="0" internal_layer_id="0" internal_port_id="0" axis="1" start="-1" end="0" stride="-1"/>
<input external_port_id="1" internal_layer_id="1" internal_port_id="1"/>
...
<output external_port_id="3" internal_layer_id="2" internal_port_id="1" axis="1" start="-1" end="0" stride="-1"/>
...
</port_map>
<back_edges>
<edge from-layer="1" from-port="1" to-layer="1" to-port="1"/>
...
</back_edges>
<body>
<layers> ... </layers>
<edges> ... </edges>
</body>
</layer>