RegionYolo
Versioned name: RegionYolo-1
Category: Object detection
Short description: RegionYolo computes the coordinates of regions with probability for each class.
Detailed description: This operation is directly mapped to the YOLO9000: Better, Faster, Stronger paper.
Attributes:
anchors
Description: anchors is a flattened list of [width, height] pairs that encode prior box sizes. This attribute is not used in output computation, but it is required for post-processing to restore real box coordinates.
Range of values: a list of any length of positive floating-point numbers
Type: float[]
Default value: None
Required: no
axis
Description: starting axis index in the input tensor data shape that will be flattened in the output; the end of the flattened range is defined by the end_axis attribute.
Range of values: -rank(data) .. rank(data)-1
Type: int
Required: yes
coords
Description: coords is the number of coordinates for each region.
Range of values: an integer
Type: int
Required: yes
classes
Description: classes is the number of classes for each region.
Range of values: an integer
Type: int
Required: yes
end_axis
Description: ending axis index in the input tensor data shape that will be flattened in the output; the beginning of the flattened range is defined by the axis attribute.
Range of values: -rank(data) .. rank(data)-1
Type: int
Required: yes
num
Description: num is the number of regions.
Range of values: an integer
Type: int
Required: yes
do_softmax
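Description: do_softmax is a flag that specifies the inference method and affects how the number of regions is determined. It also affects the output shape: if it is false (0), the output shape is 4D; otherwise the dimensions from axis to end_axis are flattened, which gives a 2D output for a 4D input with axis=1 and end_axis=3.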
Range of values:
false - do not perform softmax
true - perform softmax
Type: boolean
Default value: true
Required: no
mask
Description: mask specifies the number of regions. Use this attribute instead of num when do_softmax is equal to 0.
Range of values: a list of integers
Type: int[]
Default value: []
Required: no
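To make the anchors and axis/end_axis attributes above more concrete, here is a minimal Python sketch of how they might be read during post-processing. The helper names parse_anchors and normalize_axis are hypothetical and are not part of any library API. The sketch also assumes that a negative axis value counts from the last dimension, which is the usual convention but is not stated explicitly in this specification.

```python
from typing import List, Tuple


def parse_anchors(anchors: str) -> List[Tuple[float, float]]:
    """Split a flattened 'w0,h0,w1,h1,...' string into (width, height) pairs.

    Hypothetical helper for illustration only.
    """
    values = [float(v) for v in anchors.split(",") if v.strip()]
    if len(values) % 2 != 0:
        raise ValueError("anchors must contain an even number of values")
    if any(v <= 0 for v in values):
        raise ValueError("anchor sizes must be positive")
    # Even positions hold widths, odd positions hold heights.
    return list(zip(values[0::2], values[1::2]))


def normalize_axis(axis: int, rank: int) -> int:
    """Map an axis in -rank .. rank-1 to a non-negative index.

    Assumption: negative values count from the last dimension.
    """
    if not -rank <= axis <= rank - 1:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    return axis + rank if axis < 0 else axis


# Anchors taken from the YOLO V3 example in this document.
pairs = parse_anchors("10,14,23,27,37,58,81,82,135,169,344,319")
print(pairs[0])               # (10.0, 14.0)
print(normalize_axis(-1, 4))  # 3
```

Validating the value ranges up front keeps later decoding code free of range checks.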
Inputs:
1: data - 4D tensor of type T and shape [N, C, H, W]. Required.
Outputs:
1: tensor of type T and rank 4 or less that encodes detected regions. Refer to the YOLO9000: Better, Faster, Stronger paper to decode the output as boxes. anchors should be used to decode real box coordinates. If do_softmax is set to 0, then the output shape is [N, (classes + coords + 1) * len(mask), H, W]. If do_softmax is set to 1, then the output shape is partially flattened and defined in the following way:
flat_dim = data.shape[axis] * data.shape[axis+1] * ... * data.shape[end_axis]
output.shape = [data.shape[0], ..., data.shape[axis-1], flat_dim, data.shape[end_axis + 1], ...]
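The two output shape rules can be sketched in Python as follows. This is an illustration of the formulas above, not the reference implementation. The function name region_yolo_output_shape is hypothetical, and negative axis values are assumed to count from the last dimension.

```python
from math import prod
from typing import List, Sequence


def region_yolo_output_shape(
    data_shape: Sequence[int],
    classes: int,
    coords: int,
    mask: Sequence[int] = (),
    do_softmax: bool = True,
    axis: int = 1,
    end_axis: int = 3,
) -> List[int]:
    """Compute the RegionYolo output shape (hypothetical helper)."""
    shape = list(data_shape)
    if not do_softmax:
        # 4D case: [N, (classes + coords + 1) * len(mask), H, W]
        n, _, h, w = shape
        return [n, (classes + coords + 1) * len(mask), h, w]
    rank = len(shape)
    # Assumption: negative indices count from the last dimension.
    start = axis + rank if axis < 0 else axis
    end = end_axis + rank if end_axis < 0 else end_axis
    # Collapse dimensions start..end (inclusive) into a single dimension.
    flat_dim = prod(shape[start:end + 1])
    return shape[:start] + [flat_dim] + shape[end + 1:]


# YOLO V3 style: do_softmax=0, three masked regions.
v3 = region_yolo_output_shape([1, 255, 26, 26], classes=80, coords=4,
                              mask=[0, 1, 2], do_softmax=False)
# YOLO V2 style: do_softmax=1, axes 1..3 flattened.
v2 = region_yolo_output_shape([1, 125, 13, 13], classes=20, coords=4,
                              do_softmax=True, axis=1, end_axis=3)
print(v3)  # [1, 255, 26, 26]
print(v2)  # [1, 21125]
```

Both results match the port dimensions in the examples in this document: (80 + 4 + 1) * 3 = 255 and 125 * 13 * 13 = 21125.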
Types
T: any supported floating-point type.
Example
<!-- YOLO V3 example -->
<layer type="RegionYolo" ... >
    <data anchors="10,14,23,27,37,58,81,82,135,169,344,319" axis="1" classes="80" coords="4" do_softmax="0" end_axis="3" mask="0,1,2" num="6"/>
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>255</dim>
            <dim>26</dim>
            <dim>26</dim>
        </port>
    </input>
    <output>
        <port id="0">
            <dim>1</dim>
            <dim>255</dim>
            <dim>26</dim>
            <dim>26</dim>
        </port>
    </output>
</layer>
<!-- YOLO V2 example -->
<layer type="RegionYolo" ... >
    <data anchors="1.08,1.19,3.42,4.41,6.63,11.38,9.42,5.11,16.62,10.52" axis="1" classes="20" coords="4" do_softmax="1" end_axis="3" num="5"/>
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>125</dim>
            <dim>13</dim>
            <dim>13</dim>
        </port>
    </input>
    <output>
        <port id="0">
            <dim>1</dim>
            <dim>21125</dim>
        </port>
    </output>
</layer>