Intermediate Representation Notation Reference Catalog 

Convolution Layer

Name: Convolution

Short description: Reference

Detailed description: Reference

Parameters: Convolution layer parameters should be specified in the convolution_data node, which is a child of the layer node.

Weights Layout: GOIYX, which means that X changes the fastest, then Y, then Input, then Output, then Group.

Mathematical Formulation

Example

<layer ... type="Convolution" ... >
<convolution_data stride-x="4" stride-y="4" pad-x="0" pad-y="0" kernel-x="11" kernel-y="11" output="96" group="1" dilation-x="2" dilation-y="2"/>
<input> ... </input>
<output> ... </output>
<weights ... />
<biases ... />
</layer>

Gather Layer

Name: Gather

Short description: Gather layer takes slices of data in the second input blob according to the indices specified in the first input blob. The output blob shape is input2.shape[:axis] + input1.shape + input2.shape[axis + 1:].

Parameters: Gather layer parameters should be specified in the data section, which is placed as a child of the layer node.

Inputs

Mathematical Formulation

\[ output[:, ... ,:, i, ... , j,:, ... ,:] = input2[:, ... ,:, input1[i, ... ,j],:, ... ,:] \]
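The shape rule can be checked with a small NumPy sketch (the shapes are hypothetical, chosen to match the example below):

```python
import numpy as np

# Hypothetical blobs matching the example below: input1 holds indices,
# input2 holds the data to gather from, and axis = 1.
indices = np.zeros((15, 4, 20, 28), dtype=np.int64)  # input1
data = np.random.rand(6, 12, 10, 24)                 # input2
axis = 1

output = np.take(data, indices, axis=axis)
# input2.shape[:axis] + input1.shape + input2.shape[axis + 1:]
print(output.shape)  # (6, 15, 4, 20, 28, 10, 24)
```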

Example

<layer id="1" name="gather_node" precision="FP32" type="Gather">
<data axis="1"/>
<input>
<port id="0">
<dim>15</dim>
<dim>4</dim>
<dim>20</dim>
<dim>28</dim>
</port>
<port id="1">
<dim>6</dim>
<dim>12</dim>
<dim>10</dim>
<dim>24</dim>
</port>
</input>
<output>
<port id="2">
<dim>6</dim>
<dim>15</dim>
<dim>4</dim>
<dim>20</dim>
<dim>28</dim>
<dim>10</dim>
<dim>24</dim>
</port>
</output>
</layer>

Pooling Layer

Name: Pooling

Short description: Reference

Detailed description: Reference

Parameters: Specify pooling layer parameters in the pooling_data node, which is a child of the layer node.

Mathematical Formulation

Example

<layer ... type="Pooling" ... >
<pooling_data kernel-x="3" kernel-y="3" pad-x="0" pad-y="0" stride-x="2" stride-y="2" pool-method="max" exclude-pad="true" rounding_type="floor"/>
<input> ... </input>
<output> ... </output>
</layer>

ROIPooling Layer

Name: ROIPooling

Short description: It is a pooling layer with max pooling strategy (see max option in the *Pooling layer* parameters description). It is used over feature maps of non-uniform sizes and outputs another feature map of a fixed size.

Detailed description: deepsense.io reference

Parameters: Specify ROIPooling layer parameters in the data node, which is a child of the layer node.

Mathematical Formulation

\[ output_{j} = MAX\{ x_{0}, ... x_{i}\} \]

Example

<layer ... type="ROIPooling" ... >
<data pooled_h="6" pooled_w="6" spatial_scale="0.062500"/>
<input> ... </input>
<output> ... </output>
</layer>

FullyConnected Layer

Name: FullyConnected

Short description: Reference

Detailed description: Reference

Parameters: Specify FullyConnected layer parameters in the fc_data node, which is a child of the layer node.

Weights Layout: OI, which means that Input changes the fastest, then Output.

Mathematical Formulation

Example

<layer ... type="FullyConnected" ... >
<fc_data out-size="4096"/>
<input> ... </input>
<output> ... </output>
</layer>

ReLU Layer

Name: ReLU

Short description: Reference

Detailed description: Reference

Parameters: ReLU layer parameters can optionally be specified in the data node, which is a child of the layer node.

Mathematical Formulation

\[ Y_{i}^{( l )} = max(0, Y_{i}^{( l - 1 )}) \]
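As a NumPy sketch, the formula can be extended with the optional negative_slope attribute shown in the example below; the leaky extension is an assumption based on that attribute:

```python
import numpy as np

def relu(x, negative_slope=0.0):
    # negative_slope = 0 gives the standard ReLU; a non-zero slope
    # (as in the example below) gives the leaky variant.
    return np.maximum(0, x) + negative_slope * np.minimum(0, x)

y = relu(np.array([-10.0, 3.0]), negative_slope=0.1)  # close to [-1.0, 3.0]
```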

Example

<layer ... type="ReLU" ... >
<data negative_slope="0.100000"/>
<input> ... </input>
<output> ... </output>
</layer>

Activation Layer

Name: Activation

Short description: Activation layer represents an activation function of each neuron in a layer, which is used to add non-linearity to the computational flow.

Detailed description: Reference

Parameters: Activation layer parameters should be specified in the data node, which is a child of the layer node.

Mathematical Formulation

Example

<layer ... type="Activation" ... >
<data type="sigmoid" />
<input> ... </input>
<output> ... </output>
</layer>

SoftMax layer

Name: SoftMax

Short description: Reference

Detailed description: Reference

Parameters: SoftMax layer parameters can optionally be specified in the data node, which is a child of the layer node.

Mathematical Formulation

\[ y_{c} = \frac{e^{Z_{c}}}{\sum_{d=1}^{C}e^{Z_{d}}} \]

where $C$ is the number of classes
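A minimal NumPy sketch of this formula (the max-subtraction is a standard numerical-stability trick, not part of the IR definition):

```python
import numpy as np

def softmax(z, axis=1):
    # Subtracting the max does not change the result mathematically,
    # but avoids overflow in exp.
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

z = np.array([[1.0, 2.0, 3.0]])
y = softmax(z)  # each row sums to 1 and preserves the ordering of the logits
```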

Example

<layer ... type="SoftMax" ... >
<data axis="1" />
<input> ... </input>
<output> ... </output>
</layer>

Deconvolution Layer

Name: Deconvolution

Short description: Deconvolution layer is applied for upsampling the output to the higher image resolution.

Detailed description: Reference

Parameters: Deconvolution layer parameters should be specified in the deconvolution_data node, which is a child of the layer node.

Weights Layout: GOIYX, which means that X changes the fastest, then Y, then Input, then Output, then Group.

Mathematical Formulation

Deconvolution is also called transpose convolution and performs an operation reverse to convolution.

The output size for each spatial dimension is calculated as:

\[S_{o}=stride(S_{i} - 1 ) + S_{f} - 2pad \]

where $S_{o}$, $S_{i}$, and $S_{f}$ are the sizes of the output, input, and filter, respectively.

Output is calculated in the same way as for convolution layer:

\[out = \sum_{i = 0}^{n}w_{i}x_{i} + b\]
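The output-size formula can be sketched in Python and checked against the stride/kernel/pad values from the example below:

```python
def deconv_output_size(s_in, stride, kernel, pad):
    # S_o = stride * (S_i - 1) + S_f - 2 * pad
    return stride * (s_in - 1) + kernel - 2 * pad

# With stride 2, kernel 4, pad 1 (the example below), every spatial
# dimension is exactly doubled:
print(deconv_output_size(32, stride=2, kernel=4, pad=1))  # 64
```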

Example

<layer ... type="Deconvolution" ... >
<deconvolution_data stride-x="2" stride-y="2" pad-x="1" pad-y="1" kernel-x="4" kernel-y="4" output="19" group="1"/>
<input> ... </input>
<output> ... </output>
</layer>

Local Response Normalization (LRN) layer

Name: Norm

Short description: Reference

Detailed description: Reference

Parameters: Norm layer parameters should be specified in the norm_data node, which is a child of the layer node.

Mathematical Formulation

\[o_{i} = \frac{x_{i}}{\left( 1 + \left( \frac{\alpha}{n} \right)\sum_{j}x_{j}^{2} \right)^{\beta}}\]

Where $n$ is the size of each local region.

Example

<layer ... type="Norm" ... >
<norm_data alpha="9.9999997e-05" beta="0.75" local-size="5" region="across"/>
<input> ... </input>
<output> ... </output>
</layer>

Concat Layer

Name: Concat

Short description: Reference

Parameters: Concat layer parameters should be specified in the concat_data node, which is a child of the layer node.

Mathematical Formulation

The axis parameter specifies the blob dimension along which values are concatenated. For example, for two input blobs B1xC1xH1xW1 and B2xC2xH2xW2 with axis: 1, the output blob is B1x(C1+C2)xH1xW1. This is only possible if B1=B2, H1=H2, W1=W2.
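The shape arithmetic can be illustrated with NumPy (hypothetical blob sizes):

```python
import numpy as np

# Hypothetical blobs that agree on every dimension except axis 1.
b1 = np.zeros((2, 3, 5, 7))  # B x C1 x H x W
b2 = np.zeros((2, 4, 5, 7))  # B x C2 x H x W

out = np.concatenate([b1, b2], axis=1)
print(out.shape)  # (2, 7, 5, 7), i.e. B x (C1 + C2) x H x W
```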

Example

<layer ... type="Concat" ... >
<concat_data axis="1"/>
<input> ... </input>
<output> ... </output>
</layer>

Split Layer

Name: Split

Short description: Split layer splits the input into several output groups. Group sizes are denoted by the number and the size of output ports.

Detailed description: Reference

Parameters: None

Mathematical Formulation

Splits the input blob among children. For example, if the input blob is Bx(C+C)xHxW and there are two children, then each output blob is BxCxHxW.

Example

<layer ... type="Split" ... >
<input> ... </input>
<output> ... </output>
</layer>

Reshape Layer

Name: Reshape

Short description: Reshape layer changes dimensions of the input blob according to the specified order. Input blob volume is equal to output blob volume, where volume is the product of dimensions.

Detailed description: Reference

Parameters: Reshape layer parameters should be specified in the data node, which is a child of the layer node.

Mathematical Formulation

If you want to reshape input blob BxCxHxW into Bx1x(C*H)xW, the dim parameters of your layer should be:

layer {
name: "reshape"
type: "Reshape"
bottom: "input"
top: "output"
reshape_param {
shape {
dim: 0 # copy the dimension from below
dim: 1
dim: -1 # infer it from the other dimensions
dim: 0
}
}
}
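The same reshape expressed in NumPy (hypothetical dimensions; -1 plays the role of the inferred dim):

```python
import numpy as np

b, c, h, w = 2, 3, 4, 5  # hypothetical B x C x H x W
x = np.arange(b * c * h * w).reshape(b, c, h, w)

# B x C x H x W -> B x 1 x (C*H) x W; -1 infers C*H from the volume.
y = x.reshape(b, 1, -1, w)
print(y.shape)  # (2, 1, 12, 5)
assert x.size == y.size  # the volume is unchanged
```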

Example

<layer ... type="Reshape" ... >
<data axis="0" dim="1, 1001" num_axes="-1"/>
<input> ... </input>
<output> ... </output>
</layer>

Eltwise Layer

Name: Eltwise

Short description: Eltwise layer performs element-wise operation, which is specified in parameters, over given inputs.

Parameters: Eltwise layer parameters should be specified in the elementwise_data node, which is placed as a child of the layer node.

Mathematical Formulation

Eltwise accepts two inputs of any number of dimensions, from 1 to 4; however, both inputs must have exactly the same dimensions. The produced blob also has the same dimensions as each of its parents.

Eltwise does the following with the input blobs:

\[ o_{i} = f(b_{i}^{1}, b_{i}^{2}) \]

where $b_{i}^{1}$ - first blob $i$-th element, $b_{i}^{2}$ - second blob $i$-th element, $o_{i}$ - output blob $i$-th element, $f(a, b)$ - is a function that performs an operation over its two arguments $a, b$.

Example

<layer ... type="Eltwise" ... >
<elementwise_data operation="sum"/>
<input> ... </input>
<output> ... </output>
</layer>

ScaleShift Layer

Name: ScaleShift

Short description: ScaleShift layer performs linear transformation of the input blobs. Weights denote scaling parameter, biases - a shift.

Parameters: ScaleShift layer does not have additional parameters.

Mathematical Formulation

\[ o_{i} =\gamma b_{i} + \beta \]

Example

<layer ... type="ScaleShift" ... >
<input> ... </input>
<output> ... </output>
</layer>

Crop (Type 1) Layer

Name: Crop

Short description: Crop layer changes selected dimensions of the input blob according to the specified parameters.

Parameters: Crop layer parameters should be specified in the data section, which is placed as a child of the layer node. Due to the various representations of Crop attributes in existing frameworks, this layer can be described in three independent ways. Crop Type 1 takes two input blobs, and the shape of the second blob specifies the crop size. The layer has two attributes: axis and offset. The Crop layer of this type supports shape inference.

Inputs

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
<data axis="2,3" offset="0,0"/>
<input>
<port id="0">
<dim>1</dim>
<dim>21</dim>
<dim>44</dim>
<dim>44</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</input>
<output>
<port id="2">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</output>
</layer>

Crop (Type 2) Layer

Name: Crop

Short description: Crop layer changes selected dimensions of the input blob according to the specified parameters.

Parameters: Crop layer parameters should be specified in the data section, which is placed as a child of the layer node. Due to the various representations of Crop attributes in existing frameworks, this layer can be described in three independent ways. Crop Type 2 takes one input blob to crop and has three attributes: axis, offset, and dim. The Crop layer of this type supports shape inference only when shape propagation is applied to dimensions that are not specified in the axis attribute.

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
<data axis="2,3" offset="0,0" dim="34,34"/>
<input>
<port id="0">
<dim>1</dim>
<dim>21</dim>
<dim>44</dim>
<dim>44</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</output>
</layer>

Crop (Type 3) Layer

Name: Crop

Short description: Crop layer changes selected dimensions of the input blob according to the specified parameters.

Parameters: Crop layer parameters should be specified in the data section, which is placed as a child of the layer node. Due to the various representations of Crop attributes in existing frameworks, this layer can be described in three independent ways. Crop Type 3 takes one input blob to crop and has three attributes: axis, crop_begin, and crop_end. The Crop layer of this type supports shape inference.

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
<data axis="2,3" crop_begin="4,4" crop_end="6,6"/>
<input>
<port id="0">
<dim>1</dim>
<dim>21</dim>
<dim>44</dim>
<dim>44</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</output>
</layer>

Batch Normalization Layer

Name: BatchNormalization

Short description: Reference

Detailed description: Reference

Parameters: BatchNormalization layer parameters should be specified as the batch_norm_data node, which is a child of the layer node.

Mathematical Formulation

BatchNormalization is the normalization of the output in each hidden layer.

Example

<layer ... type="BatchNormalization" ... >
<batch_norm_data epsilon="9.99e-06" />
<input> ... </input>
<output> ... </output>
</layer>

Normalize Layer

Name: Normalize

Short description: Normalize layer performs L-p normalization of the input blob.

Parameters: Normalize layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

\[ o_{i} = \sum_{i}^{H*W}\frac{\left ( n*C*H*W \right )* scale}{\sqrt{\sum_{i=0}^{C*H*W}\left ( n*C*H*W \right )^{2}}} \]

Example

<layer ... type="Normalize" ... >
<data across_spatial="0" channel_shared="0" eps="0.000000"/>
<input> ... </input>
<output> ... </output>
</layer>

Tile Layer

Name: Tile

Short description: Tile layer extends the input blob with copies of data along a specific axis.

Detailed description: Reference

Parameters: Tile layer parameters should be specified as the tile_data node, which is a child of the layer node.

Mathematical Formulation

Tile extends the input blob, filling the output blob by the following rule:

\[ out_i=input_i[inner\_dim*t] \]

\[ t \in \left ( 0, \quad tiles \right ) \]

Example

<layer ... type="Tile" ... >
<tile_data axis="3" tiles="88"/>
<input> ... </input>
<output> ... </output>
</layer>

Permute Layer

Name: Permute

Short description: Permute layer performs reordering of input blob dimensions.

Detailed description: Reference

Parameters: Permute layer parameters should be specified as the data node, which is a child of the layer node.

NOTE: Model Optimizer (Beta 2) does not use the data node for retrieving parameters and currently supports only the following order for permutation: 0,2,3,1.

Mathematical Formulation

Permute layer reorders the input blob dimensions. Source and destination indexes are bound by the formula:

\[ src\_ind_{offset} = n * ordered[1] * ordered[2] * ordered[3] + (h * ordered[3] + w) \]

\[ n \in ( 0, order[0] ) \]

\[ h \in ( 0, order[2] ) \]

\[ w \in ( 0, order[3] ) \]
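The supported order 0,2,3,1 corresponds to a plain transpose (hypothetical blob size):

```python
import numpy as np

x = np.zeros((2, 3, 4, 5))         # hypothetical N x C x H x W blob
y = np.transpose(x, (0, 2, 3, 1))  # order="0,2,3,1" -> N x H x W x C
print(y.shape)  # (2, 4, 5, 3)
```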

Example

<layer ... type="Permute" ... >
<data order="0,2,3,1"/>
<input> ... </input>
<output> ... </output>
</layer>

PriorBox Layer

Name: PriorBox

Short description: PriorBox layer generates prior boxes of specified sizes and aspect ratios across all dimensions.

Parameters: PriorBox layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation: PriorBox computes coordinates of prior boxes by following:

  1. First calculates center_x and center_y of prior box:

    \[ W \equiv Width \quad Of \quad Image \]

    \[ H \equiv Height \quad Of \quad Image \]

    • If step equals 0:

      \[ center_x=(w+0.5) \]

    \[ center_y=(h+0.5) \]

    • else:

      \[ center_x=(w+offset)*step \]

    \[ center_y=(h+offset)*step \]

    \[ w \subset \left( 0, W \right ) \]

    \[ h \subset \left( 0, H \right ) \]

  2. Then, for each $ s \subset \left( 0, min\_sizes \right ) $ calculates coordinates of prior boxes:

    \[ xmin = \frac{center_x - \frac{s}{2}}{W} \]

    \[ ymin = \frac{center_y - \frac{s}{2}}{H} \]

    \[ xmax = \frac{center_x + \frac{s}{2}}{W} \]

    \[ ymax = \frac{center_y + \frac{s}{2}}{H} \]

Example

<layer ... type="PriorBox" ... >
<data step="64.000000" min_size="162.000000" max_size="213.000000" offset="0.500000" flip="1" clip="0" aspect_ratio="2.000000,3.000000" variance="0.100000,0.100000,0.200000,0.200000" />
<input> ... </input>
<output> ... </output>
</layer>

SimplerNMS layer

Name: SimplerNMS

Short description: SimplerNMS layer performs filtering of bounding boxes and outputs only those with the highest confidence of prediction.

Parameters: SimplerNMS layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

SimplerNMS accepts three inputs with four dimensions. Produced blob has two dimensions, the first one equals post_nms_topn.

SimplerNMS does the following with the input blob:

  1. Generates initial anchor boxes. The left top corner of all boxes is (0, 0). Width and height of boxes are calculated from default widths and heights scaled according to the scale parameter.
  2. For each point in the first input blob:
    • pins anchor boxes to picture according to the second input blob, which contains four deltas for each box: for x and y of center, for width, and for height
    • finds out score in the first input blob
  3. Filters out boxes with size less than min_bbox_size.
  4. Sorts all proposals (box, score) by score from highest to lowest
  5. Takes top pre_nms_topn proposals
  6. Calculates intersections for boxes and filters out all with $intersection/union > iou\_threshold$
  7. Takes top post_nms_topn proposals
  8. Returns top proposals

Example

<layer ... type="SimplerNMS" ... >
<data cls_threshold="0.500000" iou_threshold="0.700000" min_bbox_size="16" feat_stride="16" pre_nms_topn="6000" post_nms_topn="150"/>
<input> ... </input>
<output> ... </output>
</layer>

DetectionOutput Layer

Name: DetectionOutput

Short description: DetectionOutput layer performs non-maximum suppression to generate the detection output using information on location and confidence predictions.

Detailed description: Reference

Parameters: DetectionOutput layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

At each feature map cell, DetectionOutput predicts the offsets relative to the default box shapes in the cell, as well as the per-class scores that indicate the presence of a class instance in each of those boxes. Specifically, for each box out of k at a given location, DetectionOutput computes class scores and the four offsets relative to the original default box shape. This results in a total of $(c + 4)k$ filters that are applied around each location in the feature map, yielding $(c + 4)kmn$ outputs for an m × n feature map.

Example

<layer ... type="DetectionOutput" ... >
<data num_classes="21" share_location="1" background_label_id="0" nms_threshold="0.450000" top_k="400" eta="1.000000" output_directory="" output_name_prefix="" output_format="" label_map_file="" name_size_file="" num_test_image="0" prob="1.000000" resize_mode="caffe.ResizeParameter.WARP" height="0" width="0" height_scale="0" width_scale="0" pad_mode="caffe.ResizeParameter.CONSTANT" pad_value="#" interp_mode="#" code_type="caffe.PriorBoxParameter.CENTER_SIZE" variance_encoded_in_target="0" keep_top_k="200" confidence_threshold="0.010000" visualize="0" visualize_threshold="0.000000" save_file=""/>
<input> ... </input>
<output> ... </output>
</layer>

Memory / Delay Object layer

Name: Memory

Short description: Memory layer represents a delay layer in terms of LSTM terminology. To read more about LSTM topologies, please refer to this link.

Detailed description: Memory layer saves state between two infer requests. In the topology, it is the single layer, however, in the Intermediate Representation, it is always represented as a pair of Memory layers. One of these layers does not have outputs and another does not have inputs (in terms of the Intermediate Representation).

Parameters: Memory layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

Memory saves data from the input blob.

Example

<layer ... type="Memory" ... >
<data id="r_27-28" index="0" size="2" />
<input> ... </input>
<output> ... </output>
</layer>

Clamp Layer

Name: Clamp

Short description: Clamp layer represents clipping activation operation.

Detailed description: Reference

Parameters: Clamp layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

Clamp generally does the following with the input blobs:

\[ out_i=\left\{\begin{array}{ll} max\_value \quad \mbox{if } \quad input_i>max\_value \\ min\_value \quad \mbox{if } \quad input_i<min\_value \\ input_i \quad \mbox{otherwise} \end{array}\right. \]
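A NumPy equivalent using the min/max attributes from the example below:

```python
import numpy as np

x = np.array([-5.0, 10.0, 30.0, 50.0, 99.0])
# Values below min are raised to min; values above max are lowered to max
# (min="10", max="50" as in the example below).
out = np.clip(x, 10, 50)
print(out)  # [10. 10. 30. 50. 50.]
```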

Example

<layer ... type="Clamp" ... >
<data min="10" max="50" />
<input> ... </input>
<output> ... </output>
</layer>

ArgMax Layer

Name: ArgMax

Short description: ArgMax layer computes the indices of the K maximum values for each datum across all dimensions CxHxW.

Detailed description: Intended for use after a classification layer to produce a prediction. If parameter out_max_val is set to "true", output is a vector of pairs *(max_ind, max_val)* for each image. The axis parameter specifies an axis along which to maximize.

Parameters: ArgMax layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

ArgMax generally does the following with the input blobs:

\[ o_{i} = \left\{ x| x \in S \wedge \forall y \in S : f(y) \leq f(x) \right\} \]
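A sketch of top-K index selection in NumPy (argmax_top_k is a hypothetical helper written for illustration, not part of the IR):

```python
import numpy as np

def argmax_top_k(x, top_k=1, axis=-1):
    # Hypothetical helper: indices of the top_k maximum values
    # along axis, largest first.
    idx = np.flip(np.argsort(x, axis=axis), axis=axis)
    return idx.take(range(top_k), axis=axis)

x = np.array([[0.1, 0.7, 0.2],
              [0.9, 0.05, 0.05]])
print(argmax_top_k(x, top_k=1).tolist())  # [[1], [0]]
```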

Example

<layer ... type="ArgMax" ... >
<data top_k="10" out_max_val="1" axis="-1"/>
<input> ... </input>
<output> ... </output>
</layer>

PSROIPooling Layer

Name: PSROIPooling

Short description: PSROIPooling layer computes position-sensitive max pooling on regions of interest specified by input. It takes as input N position-sensitive score maps and a list of R regions of interest.

Detailed description: Reference

Parameters: PSRoiPooling layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

The output value for the $(i, j)$-th bin is obtained by summation from one score map $x_{i,j}$ corresponding to that bin. In short, the difference from RoIPooling is that a general feature map $x$ is replaced by a specific position-sensitive score map $x_{i,j}$.

Example

<layer ... type="PSROIPooling" ... >
<data output_dim="10" out_max_val="1" spatial_scale="0.1"/>
<input> ... </input>
<output> ... </output>
</layer>

GRN Layer

Name: GRN

Short description: GRN is Global Response Normalization with L2 norm (across channels only).

Parameters: GRN layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

GRN computes L2 norm by channels for input blob. GRN generally does the following with the input blob:

\[ output_{i} = \frac{input_{i}}{\sqrt{\sum_{i}^{C} input_{i}}} \]
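A NumPy sketch; adding the bias attribute under the root is an assumption based on common implementations, since the formula above omits it:

```python
import numpy as np

def grn(x, bias=0.0):
    # L2-normalize across the channel axis (axis 1 of an N x C x H x W blob).
    # Adding bias under the root is an assumption based on the bias attribute.
    return x / np.sqrt(bias + (x ** 2).sum(axis=1, keepdims=True))

x = np.random.rand(1, 8, 4, 4).astype(np.float32)
y = grn(x)  # with bias = 0, each channel vector has unit L2 norm
```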

Example

<layer ... type="GRN" ... >
<data bias="1.0"/>
<input> ... </input>
<output> ... </output>
</layer>

PReLU Layer

Name: PReLU

Short description: PReLU is the Parametric Rectifier Linear Unit. The difference from ReLU is that negative slopes can vary across channels.

Parameters: PReLU layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

PReLU accepts one input with four dimensions. The produced blob has the same dimensions as input.

PReLU does the following with the input blob:

\[ o_{i} = max(0, x_{i}) + w_{i} * min(0,x_{i}) \]

where $w_{i}$ is from weights blob.
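A NumPy sketch of the formula; a single shared slope is used here for brevity, whereas the layer learns one slope per channel:

```python
import numpy as np

def prelu(x, w):
    # w is the learned negative slope; the layer stores one slope
    # per channel, a single scalar is used here for brevity.
    return np.maximum(0, x) + w * np.minimum(0, x)

x = np.array([[-2.0, -1.0],
              [1.0, 3.0]])
print(prelu(x, 0.25).tolist())  # [[-0.5, -0.25], [1.0, 3.0]]
```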

Example

<layer ... type="PReLU" ... >
<data bias="1.0"/>
<input> ... </input>
<output> ... </output>
</layer>

RegionYolo layer

Name: RegionYolo

Short description: RegionYolo computes coordinates of regions with probability for each class.

Detailed description: Reference

Parameters: RegionYolo layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical formulation

RegionYolo calculates coordinates of regions by the rule:

\[ p_{0,0}^i=b*o^i+n*w*h*(coords^i + classes + 1) + w*h+loc \]

\[ p_{1,1}^i=b*o^i+n*w*h*(coords^i + classes + 1) + coords*w*h+loc \]

where:

i is the number of regions

w and h are the dimensions of the image

$ location=w*h*i $

coords and classes are attributes of this layer

b is the batch

$ loc = \frac{location}{w*h} $

For each region, RegionYolo calculates the probability by the logistic function:

\[ p^i = \frac{1}{1+e^{-i}} \]

Example

<layer ... type="RegionYolo" ... >
<data bias="1.0"/>
<input> ... </input>
<output> ... </output>
<weights .../>
</layer>

ReorgYolo layer

Name: ReorgYolo

Short description: ReorgYolo reorganizes input blob taking into account strides.

Detailed description: Reference

Parameters: ReorgYolo layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical formulation

ReorgYolo reorganizes the blob.

The destination index of the data is calculated by the following rule:

\[ DistIndex=b * IC * IH * IW + ic * IH * IW + ih * IW + iw \]

The source index of the data is calculated by the following rule:

\[ SrcIndex=b * off_{ic} * off_{ih} * off_{iw} + C^o * off_{ih} * off_{iw} + H^o * off_{iw} + W^o; \]

where:

\[ C^o=C^i \pmod{\frac{IC}{stride^2}} \]

\[ W^o=W^i*stride + off_{ic}\pmod{stride} \]

\[ H^o=H^i*stride + \frac{C^i}{\frac{IC}{stride^2}} /{stride} \]

\[ off_{ic}=\frac{C^i}{\frac{IC}{stride^2}} \]

\[ off_{ih}=IH*stride \]

\[ off_{iw}=IW*stride \]

\[ ic \subset \left( 0, IC \right ) \]

\[ iw \subset \left( 0, IW \right ) \]

\[ ih \subset \left( 0, IH \right ) \]

Example

<layer ... type="ReorgYolo" ... >
<data stride="1"/>
<input> ... </input>
<output> ... </output>
</layer>

PriorBoxClustered Layer

Name: PriorBoxClustered

Short description: PriorBoxClustered layer generates prior boxes of specified sizes.

Parameters: PriorBoxClustered layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

PriorBoxClustered computes coordinates of prior boxes by following:

  1. Calculates the center_x and center_y of prior box:

    \[ W \equiv Width \quad Of \quad Image \]

    \[ H \equiv Height \quad Of \quad Image \]

    \[ center_x=(w+offset)*step \]

    \[ center_y=(h+offset)*step \]

    \[ w \subset \left( 0, W \right ) \]

    \[ h \subset \left( 0, H \right ) \]

  2. For each $s \subset \left( 0, W \right )$ calculates the prior boxes coordinates:

    \[ xmin = \frac{center_x - \frac{width_s}{2}}{W} \]

    \[ ymin = \frac{center_y - \frac{height_s}{2}}{H} \]

    \[ xmax = \frac{center_x + \frac{width_s}{2}}{W} \]

    \[ ymax = \frac{center_y + \frac{height_s}{2}}{H} \]

If clip is defined, the coordinates of prior boxes are recalculated with the formula: $coordinate = \min(\max(coordinate,0), 1)$

Example

<layer ... type="PriorBoxClustered">
<data clip="0" flip="0" height="44.0,10.0,30.0,19.0,94.0,32.0,61.0,53.0,17.0" offset="0.5" step="16.0" variance="0.1,0.1,0.2,0.2"
width="86.0,13.0,57.0,39.0,68.0,34.0,142.0,50.0,23.0"/>
<input>
...
</input>
<output>
...
</output>
</layer>

MVN Layer

Name: MVN

Short description: Reference

Parameters: MVN layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

MVN subtracts mean from the input blob:

\[ o_{i} = i_{i} - \frac{\sum{i_{k}}}{C * H * W} \]

If normalize_variance is set to 1, the output blob is divided by variance:

\[ o_{i}=\frac{o_{i}}{\sum \sqrt {o_{k}^2}+\epsilon} \]
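A NumPy sketch of both steps, assuming across_channels is 1 (as in the example below), so statistics are computed over C*H*W per batch element:

```python
import numpy as np

def mvn(x, normalize_variance=True, eps=1e-9):
    # Statistics are computed over C*H*W for each batch element
    # (across_channels = 1 in the example below).
    axes = (1, 2, 3)
    out = x - x.mean(axis=axes, keepdims=True)
    if normalize_variance:
        out = out / (np.sqrt((out ** 2).mean(axis=axes, keepdims=True)) + eps)
    return out

x = np.random.rand(2, 3, 4, 5)
y = mvn(x)  # approximately zero mean, unit variance per batch element
```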

Example

<layer ... type="MVN">
<data across_channels="1" eps="9.999999717180685e-10" normalize_variance="1"/>
<input>
...
</input>
<output>
...
</output>
</layer>

CTCGreedyDecoder Layer

Name: CTCGreedyDecoder

Short description: CTCGreedyDecoder performs greedy decoding on the logits given in input (best path).

Detailed description: Reference

Parameters: CTCGreedyDecoder layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

Given an input sequence $X$ of length $T$, CTCGreedyDecoder assumes the probability of a length $T$ character sequence $C$ is given by

\[ p(C|X) = \prod_{t=1}^{T} p(c_{t}|X) \]
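Greedy (best-path) decoding itself can be sketched in plain Python; the blank index and per-frame probabilities below are hypothetical:

```python
def ctc_greedy_decode(logits, blank=0):
    """Best-path decoding: take the argmax class at each time step,
    collapse consecutive repeats, then drop the blank label."""
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    decoded, prev = [], None
    for c in best_path:
        if c != prev and c != blank:
            decoded.append(c)
        prev = c
    return decoded

# Hypothetical per-frame class probabilities (3 classes, class 0 is blank).
logits = [[0.1, 0.8, 0.1],
          [0.1, 0.8, 0.1],
          [0.9, 0.05, 0.05],
          [0.1, 0.1, 0.8]]
print(ctc_greedy_decode(logits))  # [1, 2]
```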

Example

<layer ... type="CTCGreedyDecoder" ... >
<data stride="1"/>
<input> ... </input>
<output> ... </output>
</layer>

Proposal Layer

Name: Proposal

Short description: Proposal layer performs filtering of bounding boxes and outputs only those with the highest confidence of prediction.

Parameters: Proposal layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

Proposal layer accepts three inputs with four dimensions. The produced blob has two dimensions: the first one equals batch_size * post_nms_topn.

Proposal does the following with the input blob:

  1. Generates initial anchor boxes. The left top corner of all boxes is (0, 0). Width and height of boxes are calculated from base_size with the scale and ratio parameters.
  2. For each point in the first input blob:
    • pins anchor boxes to the image according to the second input blob that contains four deltas for each box: for x and y of center, for width and for height
    • finds out score in the first input blob
  3. Filters out boxes with size less than min_size
  4. Sorts all proposals (box, score) by score from highest to lowest
  5. Takes top pre_nms_topn proposals
  6. Calculates intersections for boxes and filters out all with $intersection/union > nms\_thresh$
  7. Takes top post_nms_topn proposals
  8. Returns top proposals

Example

<layer ... type="Proposal" ... >
<data base_size="16" feat_stride="16" min_size="16" nms_thresh="0.6" post_nms_topn="200" pre_nms_topn="6000"
ratio="2.67" scale="4.0,6.0,9.0,16.0,24.0,32.0"/>
<input> ... </input>
<output> ... </output>
</layer>

Resample Layer

Name: Resample

Short description: Resample layer scales the input blob by the specified parameters.

Parameters: Resample layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical formulation

Resample layer scales the input blob. Depending on the type parameter, Resample applies different blob interpolation algorithms and performs anti-aliasing if the antialias parameter is specified.

Example

<layer type="Resample">
<data antialias="0" factor="1.0" height="227" type="caffe.ResampleParameter.LINEAR" width="227"/>
<input>
...
</input>
<output>
...
</output>
</layer>

Power Layer

Name: Power

Short description: Power layer computes the output as (shift + scale * x) ^ power for each input element x.

Parameters: Power layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

\[ p = (shift + scale * x)^{power} \]
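A one-line Python equivalent, checked with the attributes from the example below:

```python
def power_layer(x, power=2.0, scale=0.1, shift=5.0):
    # (shift + scale * x) ^ power, element-wise; defaults match the example below.
    return (shift + scale * x) ** power

print(power_layer(10.0))  # (5 + 0.1 * 10)^2 = 36.0
```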

Example

<layer ... type="Power" ... >
<data power="2" scale="0.1" shift="5"/>
<input> ... </input>
<output> ... </output>
</layer>

Pad Layer

Name: Pad

Short description: Pad layer extends an input tensor on edges. New element values are generated based on the Pad layer parameters described below.

Parameters: Pad layer parameters should be specified in the data section, which is placed as a child of the layer node. The parameters specify the number of elements to add along each axis and the rule by which new element values are generated: for example, whether they are filled with a given constant or generated based on the input tensor content.

Inputs

Outputs

pad_mode Examples

The following examples illustrate how the output tensor is generated for the Pad layer from a given input tensor:

INPUT =
[[ 1 2 3 4 ]
[ 5 6 7 8 ]
[ 9 10 11 12 ]]

with the following parameters:

pads_begin = [0, 1]
pads_end = [2, 3]

depending on the pad_mode.
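The common pad_mode variants applied to the INPUT tensor above can be reproduced with np.pad (the mode names follow NumPy and may differ from the IR attribute values):

```python
import numpy as np

x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

pads = ((0, 2), (1, 3))  # pads_begin = [0, 1], pads_end = [2, 3]

const = np.pad(x, pads, mode="constant", constant_values=0)  # fill with a constant
edge = np.pad(x, pads, mode="edge")                          # replicate border values
refl = np.pad(x, pads, mode="reflect")                       # mirror the content

print(const.shape)  # (5, 8): (3 + 0 + 2, 4 + 1 + 3)
```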

Example

<layer id="1" name="MyPad" precision="FP32" type="Pad">
<data pads_begin="0,5,2,1" pads_end="1,0,3,7" pad_mode="constant" pad_value="666.0"/>
<input>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>32</dim>
<dim>40</dim>
</port>
</input>
<output>
<port id="2">
<dim>2</dim>
<dim>8</dim>
<dim>37</dim>
<dim>48</dim>
</port>
</output>
</layer>

LSTMCell Layer

Name: LSTMCell

Short description: LSTMCell layer computes the output using the formula described in original paper Long Short-Term Memory.

Parameters: None

Mathematical Formulation

inputs:
X - input data
Hi - input hidden state
Ci - input cell state
outputs:
Ho - output hidden state
Co - output cell state
Formula:
* - matrix mult
(.) - eltwise mult
[,] - concatenation
sigm - 1/(1 + e^{-x})
tanh - (e^{2x} - 1)/(e^{2x} + 1)
f = sigm(Wf*[Hi, X] + Bf)
i = sigm(Wi*[Hi, X] + Bi)
c = tanh(Wc*[Hi, X] + Bc)
o = sigm(Wo*[Hi, X] + Bo)
Co = f (.) Ci + i (.) c
Ho = o (.) tanh(Co)
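The pseudocode above can be sketched directly in NumPy (the weight shapes are hypothetical; a real LSTMCell packs all gates into single weight and bias blobs):

```python
import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(X, Hi, Ci, Wf, Wi, Wc, Wo, Bf, Bi, Bc, Bo):
    hx = np.concatenate([Hi, X])  # [Hi, X] - concatenation
    f = sigm(Wf @ hx + Bf)        # forget gate
    i = sigm(Wi @ hx + Bi)        # input gate
    c = np.tanh(Wc @ hx + Bc)     # candidate cell state
    o = sigm(Wo @ hx + Bo)        # output gate
    Co = f * Ci + i * c           # new cell state
    Ho = o * np.tanh(Co)          # new hidden state
    return Ho, Co

# Hypothetical sizes: input of 3, hidden state of 2.
rng = np.random.default_rng(0)
W = [rng.standard_normal((2, 5)) for _ in range(4)]  # 5 = hidden + input
B = [np.zeros(2) for _ in range(4)]
Ho, Co = lstm_cell(rng.standard_normal(3), np.zeros(2), np.zeros(2), *W, *B)
```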

Example

<layer ... type="LSTMCell" ... >
<input> ... </input>
<output> ... </output>
</layer>

TensorIterator Layer

Name: TensorIterator

Short description: TensorIterator (TI) layer performs recurrent subgraph execution iterating through the data.

Parameters: The port_map and back_edges sections specify the data mapping rules.

Example

<layer ... type="TensorIterator" ... >
<input> ... </input>
<output> ... </output>
<port_map>
<input external_port_id="0" internal_layer_id="0" internal_port_id="0" axis="1" start="-1" end="0" stride="-1"/>
<input external_port_id="1" internal_layer_id="1" internal_port_id="1"/>
...
<output external_port_id="3" internal_layer_id="2" internal_port_id="1" axis="1" start="-1" end="0" stride="-1"/>
...
</port_map>
<back_edges>
<edge from-layer="1" from-port="1" to-layer="1" to-port="1"/>
...
</back_edges>
<body>
<layers> ... </layers>
<edges> ... </edges>
</body>
</layer>