MulticlassNonMaxSuppression
Versioned name: MulticlassNonMaxSuppression-9
Category: Sorting and maximization
Short description: MulticlassNonMaxSuppression performs multi-class non-maximum suppression of the boxes with predicted scores.
Detailed description: MulticlassNonMaxSuppression is a multi-phase operation. It implements the non-maximum suppression algorithm described below:
1. Let B = [b_0,...,b_n] be the list of initial detection boxes and S = [s_0,...,s_n] be the list of corresponding scores.
2. Let D = [] be an initial collection of resulting boxes. Let adaptive_threshold = iou_threshold.
3. If B is empty, go to step 9.
4. Take the box with the highest score. Suppose that it is the box b with the score s.
5. Delete b from B.
6. If the score s is greater than or equal to score_threshold, add b to D; otherwise go to step 9.
7. If nms_eta < 1 and adaptive_threshold > 0.5, update adaptive_threshold *= nms_eta.
8. For each box b_i remaining in B and its corresponding score s_i, set s_i = 0 when iou(b, b_i) > adaptive_threshold, then go to step 3.
9. Return D, the collection of corresponding scores S, and the number of elements in D.
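The steps above can be sketched for a single class of a single batch element as follows. This is a minimal illustration, not the reference implementation: the function names `iou` and `nms_single_class` are hypothetical, boxes are plain Python lists, and the IoU assumes normalized coordinates. One deliberate simplification: suppressed boxes are removed from B outright instead of having their scores set to 0, which gives the same result whenever score_threshold is positive.

```python
def iou(a, b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms_single_class(boxes, scores, iou_threshold, score_threshold=0.0, nms_eta=1.0):
    """Return (indices of kept boxes, their scores), best score first."""
    remaining = list(range(len(boxes)))   # B, stored as indices into boxes
    kept = []                             # D, stored as indices into boxes
    adaptive_threshold = iou_threshold    # step 2
    while remaining:                      # step 3
        best = max(remaining, key=lambda i: scores[i])   # step 4
        remaining.remove(best)                           # step 5
        if scores[best] < score_threshold:               # step 6
            break
        kept.append(best)
        if nms_eta < 1 and adaptive_threshold > 0.5:     # step 7
            adaptive_threshold *= nms_eta
        # Step 8 (simplified): drop overlapping boxes instead of zeroing scores.
        remaining = [i for i in remaining
                     if iou(boxes[best], boxes[i]) <= adaptive_threshold]
    return kept, [scores[i] for i in kept]


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
kept, kept_scores = nms_single_class(boxes, scores, iou_threshold=0.5)
```

In this usage the first two boxes overlap with an IoU of about 0.68, so the second is suppressed at a threshold of 0.5 and the indices 0 and 2 are kept. Raising iou_threshold to 0.7 keeps all three boxes, unless nms_eta is lowered enough for the adaptive threshold to fall below the overlap.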
This algorithm is applied independently to each class of each batch element. The operation feeds at most nms_top_k of the highest-scoring candidate boxes to this algorithm.
The total number of output boxes for each batch element must not exceed keep_top_k.
Boxes of background_class are skipped and thus eliminated.
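The limits described above can be illustrated with two small helpers. This is a sketch under stated assumptions: the names `top_candidates` and `limit_per_image` are hypothetical, and the text does not say which detections survive the keep_top_k cut, so the sketch assumes the highest-scoring ones are kept.

```python
def top_candidates(scores, nms_top_k=-1):
    """Indices of one class's boxes, best score first.

    At most nms_top_k indices are returned; -1 keeps all boxes.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order if nms_top_k < 0 else order[:nms_top_k]


def limit_per_image(detections, keep_top_k=-1, background_class=-1):
    """Apply background_class and keep_top_k to one batch element.

    detections is a list of (class_id, score, box_index) tuples collected
    from the per-class suppression runs. Keeping the highest scores is an
    assumption of this sketch, not something the text specifies.
    """
    kept = [d for d in detections if d[0] != background_class]
    kept.sort(key=lambda d: d[1], reverse=True)
    return kept if keep_top_k < 0 else kept[:keep_top_k]


candidates = top_candidates([0.1, 0.9, 0.5], nms_top_k=2)
final = limit_per_image([(0, 0.9, 0), (1, 0.8, 1), (2, 0.95, 2)],
                        keep_top_k=2, background_class=0)
```

Here `candidates` holds the indices of the two best boxes of the class, and `final` drops the class 0 detection before truncating to two entries.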
Attributes:
sort_result
Description: sort_result specifies the order of output elements.
Range of values: class, score, none
class - sort selected boxes by class id (ascending).
score - sort selected boxes by score (descending).
none - do not guarantee the order.
Type: string
Default value: none
Required: no
sort_result_across_batch
Description: sort_result_across_batch is a flag that specifies whether to sort selected boxes across batches.
Range of values: true or false
true - sort selected boxes across batches.
false - do not sort selected boxes across batches (boxes are sorted per batch element).
Type: boolean
Default value: false
Required: no
output_type
Description: the tensor type of outputs selected_indices and selected_num.
Range of values: i64 or i32
Type: string
Default value: i64
Required: no
iou_threshold
Description: intersection over union threshold.
Range of values: a floating-point number
Type: float
Default value: 0
Required: no
score_threshold
Description: minimum score for a box to be considered for processing.
Range of values: a floating-point number
Type: float
Default value: 0
Required: no
nms_top_k
Description: maximum number of boxes to be selected per class.
Range of values: an integer
Type: int
Default value: -1, meaning to keep all boxes
Required: no
keep_top_k
Description: maximum number of boxes to be selected per batch element.
Range of values: an integer
Type: int
Default value: -1, meaning to keep all boxes
Required: no
background_class
Description: the background class id.
Range of values: an integer
Type: int
Default value: -1, meaning to keep all classes
Required: no
normalized
Description: normalized is a flag that indicates whether boxes are normalized or not.
Range of values: true or false
true - the box coordinates are normalized.
false - the box coordinates are not normalized.
Type: boolean
Default value: true
Required: no
nms_eta
Description: eta parameter for adaptive NMS.
Range of values: a floating-point number in the closed range [0, 1.0].
Type: float
Default value: 1.0
Required: no
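The attribute list does not spell out how normalized changes the iou(b, b_i) computation. The sketch below shows one common convention, assumed here rather than taken from this specification: unnormalized coordinates are treated as inclusive pixel indices, so 1 is added to every width and height. The function names `box_area` and `box_iou` are hypothetical.

```python
def box_area(box, normalized=True):
    """Area of an [xmin, ymin, xmax, ymax] box."""
    offset = 0.0 if normalized else 1.0   # assumed pixel-inclusive convention
    width = box[2] - box[0] + offset
    height = box[3] - box[1] + offset
    return max(width, 0.0) * max(height, 0.0)


def box_iou(a, b, normalized=True):
    """Intersection over union under the assumed normalized convention."""
    offset = 0.0 if normalized else 1.0
    inter_w = min(a[2], b[2]) - max(a[0], b[0]) + offset
    inter_h = min(a[3], b[3]) - max(a[1], b[1]) + offset
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    union = box_area(a, normalized) + box_area(b, normalized) - inter
    return inter / union


# Two 10x10 pixel boxes that share a 5x10 pixel strip.
pixel_iou = box_iou([0, 0, 9, 9], [5, 0, 14, 9], normalized=False)
# The same geometry expressed in normalized coordinates.
unit_iou = box_iou([0.0, 0.0, 0.5, 0.5], [0.25, 0.0, 0.75, 0.5], normalized=True)
```

Under this convention both calls describe the same overlap, one third of the union, which is why the offset matters when pixel coordinates are passed with normalized set to false.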
Inputs:
Two input formats are supported. The first format takes two inputs; the boxes are shared by all classes.
1: boxes - tensor of type T and shape [num_batches, num_boxes, 4] with box coordinates. The box coordinates are laid out as [xmin, ymin, xmax, ymax]. Required.
2: scores - tensor of type T and shape [num_batches, num_classes, num_boxes] with box scores. The tensor type must be the same as that of boxes. Required.
The second format takes three inputs. Each class has its own boxes, which are not shared.
1: boxes - tensor of type T and shape [num_classes, num_boxes, 4] with box coordinates. The box coordinates are laid out as [xmin, ymin, xmax, ymax]. Required.
2: scores - tensor of type T and shape [num_classes, num_boxes] with box scores. The tensor type must be the same as that of boxes. Required.
3: roisnum - tensor of type T_IND and shape [num_batches] with the number of boxes in each image. num_batches is the number of images. Each element of this tensor is the number of boxes for the corresponding image. The sum of all elements is num_boxes. Required.
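For the three-input format, roisnum tells how the num_boxes axis is divided among images. The sketch below assumes that the boxes of each image are stored contiguously and in image order, which the text implies but does not state explicitly. The helper name `image_ranges` is hypothetical.

```python
def image_ranges(roisnum, num_boxes):
    """Return one (start, end) half-open range per image along the num_boxes axis.

    Assumes boxes are stored contiguously in image order.
    """
    if sum(roisnum) != num_boxes:
        raise ValueError("the sum of roisnum must equal num_boxes")
    ranges = []
    start = 0
    for count in roisnum:
        ranges.append((start, start + count))
        start += count
    return ranges


# Three images holding 2, 0 and 3 boxes respectively (5 boxes in total).
ranges = image_ranges([2, 0, 3], num_boxes=5)
```

An image with no boxes yields an empty range, so its entry in the per-image output count would be 0.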
Outputs:
1: selected_outputs - tensor of type T, which must be the same as the type of boxes, and shape [number of selected boxes, 6], containing the selected boxes with score and class as tuples [class_id, box_score, xmin, ymin, xmax, ymax].
2: selected_indices - tensor of type T_IND and shape [number of selected boxes, 1], containing the selected indices in the flattened boxes. The indices are absolute values across batches, so the possible valid values are in the range [0, num_batches * num_boxes - 1].
3: selected_num - 1D tensor of type T_IND and shape [num_batches], representing the number of selected boxes for each batch element.
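Because selected_indices holds absolute positions in the flattened boxes, a consumer usually needs to map them back to a batch element and a box. The sketch below assumes the two-input format and row-major flattening of the [num_batches, num_boxes] axes, which matches the stated range of valid values. The function name `decode_selected_indices` is hypothetical.

```python
def decode_selected_indices(selected_indices, num_boxes):
    """Map flat indices of shape [N, 1] to (batch_index, box_index) pairs.

    Assumes row-major flattening over [num_batches, num_boxes].
    """
    pairs = []
    for (flat_index,) in selected_indices:
        pairs.append(divmod(flat_index, num_boxes))
    return pairs


# With num_boxes = 100, flat index 205 is box 5 of batch element 2.
pairs = decode_selected_indices([[205], [7], [299]], num_boxes=100)
```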
When no box is selected, selected_num is filled with 0, selected_outputs is an empty tensor of shape [0, 6], and selected_indices is an empty tensor of shape [0, 1].
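The row order of selected_outputs depends on sort_result. The sketch below applies the three documented modes to the rows of one batch element. It is only an illustration: the function name `sort_selected` is hypothetical, and the order of rows with equal keys is not specified in the text, so the sketch simply keeps their original relative order.

```python
def sort_selected(rows, sort_result="none"):
    """Order rows of [class_id, box_score, xmin, ymin, xmax, ymax]."""
    if sort_result == "score":
        # Descending score; Python's sort is stable, so ties keep input order.
        return sorted(rows, key=lambda row: row[1], reverse=True)
    if sort_result == "class":
        # Ascending class id; ties keep input order.
        return sorted(rows, key=lambda row: row[0])
    if sort_result == "none":
        return list(rows)   # no order guaranteed; left unchanged here
    raise ValueError("sort_result must be 'class', 'score' or 'none'")


rows = [
    [1, 0.60, 0, 0, 1, 1],
    [0, 0.90, 0, 0, 1, 1],
    [1, 0.95, 0, 0, 1, 1],
]
by_score = sort_selected(rows, "score")
by_class = sort_selected(rows, "class")
```

With sort_result_across_batch set to false, such an ordering would apply within each batch element; with true, it would apply to the rows of all batch elements together.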
Types
T: floating-point type.
T_IND: int64 or int32.
Example
<layer ... type="MulticlassNonMaxSuppression" ... >
    <data sort_result="score" output_type="i64" sort_result_across_batch="false" iou_threshold="0.2" score_threshold="0.5" nms_top_k="-1" keep_top_k="-1" background_class="-1" normalized="false" nms_eta="0.0"/>
    <input>
        <port id="0">
            <dim>3</dim>
            <dim>100</dim>
            <dim>4</dim>
        </port>
        <port id="1">
            <dim>3</dim>
            <dim>5</dim>
            <dim>100</dim>
        </port>
    </input>
    <output>
        <port id="5" precision="FP32">
            <dim>-1</dim> <!-- "-1" means an undefined dimension calculated during the model inference -->
            <dim>6</dim>
        </port>
        <port id="6" precision="I64">
            <dim>-1</dim>
            <dim>1</dim>
        </port>
        <port id="7" precision="I64">
            <dim>3</dim>
        </port>
    </output>
</layer>
An example with three inputs:
<layer ... type="MulticlassNonMaxSuppression" ... >
    <data sort_result="score" output_type="i64" sort_result_across_batch="false" iou_threshold="0.2" score_threshold="0.5" nms_top_k="-1" keep_top_k="-1" background_class="-1" normalized="false" nms_eta="0.0"/>
    <input>
        <port id="0">
            <dim>3</dim>
            <dim>100</dim>
            <dim>4</dim>
        </port>
        <port id="1">
            <dim>3</dim>
            <dim>100</dim>
        </port>
        <port id="2">
            <dim>10</dim>
        </port>
    </input>
    <output>
        <port id="5" precision="FP32">
            <dim>-1</dim> <!-- "-1" means an undefined dimension calculated during the model inference -->
            <dim>6</dim>
        </port>
        <port id="6" precision="I64">
            <dim>-1</dim>
            <dim>1</dim>
        </port>
        <port id="7" precision="I64">
            <dim>3</dim>
        </port>
    </output>
</layer>