MulticlassNonMaxSuppression

Versioned name: MulticlassNonMaxSuppression-9

Category: Sorting and maximization

Short description: MulticlassNonMaxSuppression performs multi-class non-maximum suppression of the boxes with predicted scores.

Detailed description: MulticlassNonMaxSuppression is a multi-phase operation. It implements non-maximum suppression algorithm as described below:

  1. Let B = [b_0,...,b_n] be the list of initial detection boxes, S = [s_0,...,s_N] be the list of corresponding scores.

  2. Let D = [] be an initial collection of resulting boxes. Let adaptive_threshold = iou_threshold.

  3. If B is empty, go to step 9.

  4. Take the box with highest score. Suppose that it is the box b with the score s.

  5. Delete b from B.

  6. If the score s is greater than or equal to score_threshold, add b to D, else go to step 9.

  7. If nms_eta < 1 and adaptive_threshold > 0.5, update adaptive_threshold *= nms_eta.

  8. For each input box b_i from B and the corresponding score s_i, set s_i = 0 when iou(b, b_i) > adaptive_threshold, and go to step 3.

  9. Return D, a collection of the corresponding scores S, and the number of elements in D.

This algorithm is applied independently to each class of each batch element. The operation feeds at most nms_top_k scoring candidate boxes to this algorithm. The total number of output boxes of each batch element must not exceed keep_top_k. Boxes of background_class are skipped and thus eliminated.

Attributes:

  • sort_result

    • Description: sort_result specifies the order of output elements.

    • Range of values: class, score, none

      • class - sort selected boxes by class id (ascending).

      • score - sort selected boxes by score (descending).

      • none - do not guarantee the order.

    • Type: string

    • Default value: none

    • Required: no

  • sort_result_across_batch

    • Description: sort_result_across_batch is a flag that specifies whenever it is necessary to sort selected boxes across batches or not.

    • Range of values: true or false

      • true - sort selected boxes across batches.

      • false - do not sort selected boxes across batches (boxes are sorted per batch element).

    • Type: boolean

    • Default value: false

    • Required: no

  • output_type

    • Description: the tensor type of outputs selected_indices and valid_outputs.

    • Range of values: i64 or i32

    • Type: string

    • Default value: i64

    • Required: no

  • iou_threshold

    • Description: intersection over union threshold.

    • Range of values: a floating-point number

    • Type: float

    • Default value: 0

    • Required: no

  • score_threshold

    • Description: minimum score to consider box for the processing.

    • Range of values: a floating-point number

    • Type: float

    • Default value: 0

    • Required: no

  • nms_top_k

    • Description: maximum number of boxes to be selected per class.

    • Range of values: an integer

    • Type: int

    • Default value: -1 meaning to keep all boxes

    • Required: no

  • keep_top_k

    • Description: maximum number of boxes to be selected per batch element.

    • Range of values: an integer

    • Type: int

    • Default value: -1 meaning to keep all boxes

    • Required: no

  • background_class

    • Description: the background class id.

    • Range of values: an integer

    • Type: int

    • Default value: -1 meaning to keep all classes.

    • Required: no

  • normalized

    • Description: normalized is a flag that indicates whether boxes are normalized or not.

    • Range of values: true or false

      • true - the box coordinates are normalized.

      • false - the box coordinates are not normalized.

    • Type: boolean

    • Default value: True

    • Required: no

  • nms_eta

    • Description: eta parameter for adaptive NMS.

    • Range of values: a floating-point number in close range [0, 1.0].

    • Type: float

    • Default value: 1.0

    • Required: no

Inputs:

There are 2 kinds of input formats. The first one is of two inputs. The boxes are shared by all classes.

  • 1: boxes - tensor of type T and shape [num_batches, num_boxes, 4] with box coordinates. The box coordinates are layout as [xmin, ymin, xmax, ymax]. Required.

  • 2: scores - tensor of type T and shape [num_batches, num_classes, num_boxes] with box scores. The tensor type should be same with boxes. Required.

The second format is of three inputs. Each class has its own boxes that are not shared. * 1: boxes - tensor of type T and shape [num_classes, num_boxes, 4] with box coordinates. The box coordinates are layout as [xmin, ymin, xmax, ymax]. Required.

  • 2: scores - tensor of type T and shape [num_classes, num_boxes] with box scores. The tensor type should be same with boxes. Required.

  • 3: roisnum - tensor of type T_IND and shape [num_batches] with box numbers in each image. num_batches is the number of images. Each element in this tensor is the number of boxes for corresponding image. The sum of all elements is num_boxes. Required.

Outputs:

  • 1: selected_outputs - tensor of type T which should be same with boxes and shape [number of selected boxes, 6] containing the selected boxes with score and class as tuples [class_id, box_score, xmin, ymin, xmax, ymax].

  • 2: selected_indices - tensor of type T_IND and shape [number of selected boxes, 1] the selected indices in the flattened boxes, which are absolute values cross batches. Therefore possible valid values are in the range [0, num_batches * num_boxes - 1].

  • 3: selected_num - 1D tensor of type T_IND and shape [num_batches] representing the number of selected boxes for each batch element.

When there is no box selected, selected_num is filled with 0. selected_outputs is an empty tensor of shape [0, 6], and selected_indices is an empty tensor of shape [0, 1].

Types

  • T: floating-point type.

  • T_IND: int64 or int32.

Example

<layer ... type="MulticlassNonMaxSuppression" ... >
    <data sort_result="score" output_type="i64" sort_result_across_batch="false" iou_threshold="0.2" score_threshold="0.5" nms_top_k="-1" keep_top_k="-1" background_class="-1"    normalized="false" nms_eta="0.0"/>
    <input>
        <port id="0">
            <dim>3</dim>
            <dim>100</dim>
            <dim>4</dim>
        </port>
        <port id="1">
            <dim>3</dim>
            <dim>5</dim>
            <dim>100</dim>
        </port>
    </input>
    <output>
        <port id="5" precision="FP32">
            <dim>-1</dim> <!-- "-1" means a undefined dimension calculated during the model inference -->
            <dim>6</dim>
        </port>
        <port id="6" precision="I64">
            <dim>-1</dim>
            <dim>1</dim>
        </port>
        <port id="7" precision="I64">
            <dim>3</dim>
        </port>
    </output>
</layer>

Another possible example with 3 inputs could be like:

<layer ... type="MulticlassNonMaxSuppression" ... >
    <data sort_result="score" output_type="i64" sort_result_across_batch="false" iou_threshold="0.2" score_threshold="0.5" nms_top_k="-1" keep_top_k="-1" background_class="-1"    normalized="false" nms_eta="0.0"/>
    <input>
        <port id="0">
            <dim>3</dim>
            <dim>100</dim>
            <dim>4</dim>
        </port>
        <port id="1">
            <dim>3</dim>
            <dim>100</dim>
        </port>
        <port id="2">
            <dim>10</dim>
        </port>
    </input>
    <output>
        <port id="5" precision="FP32">
            <dim>-1</dim> <!-- "-1" means a undefined dimension calculated during the model inference -->
            <dim>6</dim>
        </port>
        <port id="6" precision="I64">
            <dim>-1</dim>
            <dim>1</dim>
        </port>
        <port id="7" precision="I64">
            <dim>3</dim>
        </port>
    </output>
</layer>