(Deprecated) Case Study: Converting SSD Models Created with TensorFlow* Object Detection API

This is a deprecated page. Consider reading this page instead, which describes the new approach to converting Object Detection API models that produces inference results closer to TensorFlow.

Converting Models Created with TensorFlow Object Detection API Versions Prior to 1.6.0

As explained in the Sub-graph Replacement in Model Optimizer section, there are multiple ways to set up the sub-graph matching. In this example, we focus on defining the sub-graph via a set of "start" and "end" nodes. The result of matching is two buckets of nodes: the nodes that belong to the matched sub-graph, and the remaining nodes of the graph.
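The idea behind "points" matching can be illustrated with a toy graph: the first bucket contains the nodes that are both reachable from a start point and able to reach an end point, the second bucket contains everything else. The helper below is a hypothetical sketch over a plain adjacency dict, not the actual Model Optimizer implementation:

```python
from collections import deque

def _reachable(adj, seeds):
    """BFS over an adjacency dict {node: [successors]}."""
    seen, queue = set(seeds), deque(seeds)
    while queue:
        for succ in adj.get(queue.popleft(), []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen

def match_between_points(adj, start_points, end_points):
    """Bucket 1: nodes between the start and end points.
    Bucket 2: all remaining nodes of the graph."""
    reverse = {}
    for node, succs in adj.items():
        for succ in succs:
            reverse.setdefault(succ, []).append(node)
    inside = _reachable(adj, start_points) & _reachable(reverse, end_points)
    all_nodes = set(adj) | {s for succs in adj.values() for s in succs}
    return inside, all_nodes - inside

# toy graph: image -> Preprocessor -> Conv -> Postprocessor/Shape -> detection_boxes
adj = {'image': ['Preprocessor'], 'Preprocessor': ['Conv'],
       'Conv': ['Postprocessor/Shape'], 'Postprocessor/Shape': ['detection_boxes']}
inside, outside = match_between_points(adj, ['Postprocessor/Shape'], ['detection_boxes'])
```

Here `inside` contains the two nodes between the chosen points, and `outside` contains the rest of the graph.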

Let's take a closer look at the SSD models from the TensorFlow* detection model zoo: SSD MobileNet and SSD InceptionV2.

A distinct layer of any SSD topology is the DetectionOutput layer. This layer is implemented with dozens of primitive operations in TensorFlow, while in the Inference Engine it is a single layer. Thus, to convert an SSD model from TensorFlow, the Model Optimizer should replace the entire sub-graph of operations that implements the DetectionOutput layer with a single well-known DetectionOutput node.

The Inference Engine DetectionOutput layer consumes three tensors in the following order:

  1. Tensor with locations of bounding boxes
  2. Tensor with confidences for each bounding box
  3. Tensor with prior boxes (anchors in TensorFlow terminology)

The DetectionOutput layer produces one tensor with seven numbers for each actual detection. There are more output tensors in the TensorFlow Object Detection API, but their values are consistent with the Inference Engine ones.
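For reference, the seven numbers per detection follow the standard Inference Engine DetectionOutput layout: `[image_id, label, confidence, x_min, y_min, x_max, y_max]`, with coordinates normalized to [0, 1] and `image_id = -1` marking the end of valid detections. A minimal post-processing sketch (the helper name and the threshold value are illustrative):

```python
def parse_detections(raw, conf_threshold=0.5):
    """Split the flat DetectionOutput tensor into per-detection records.
    Assumed per-detection layout:
    [image_id, label, confidence, x_min, y_min, x_max, y_max]."""
    detections = []
    for i in range(0, len(raw), 7):
        image_id, label, conf, x_min, y_min, x_max, y_max = raw[i:i + 7]
        if image_id < 0:  # -1 marks the end of valid detections
            break
        if conf >= conf_threshold:
            detections.append({'image_id': int(image_id), 'label': int(label),
                               'confidence': conf,
                               'box': (x_min, y_min, x_max, y_max)})
    return detections

sample = [0, 1, 0.9, 0.1, 0.1, 0.4, 0.5,   # one confident detection
          0, 2, 0.2, 0.0, 0.0, 0.1, 0.1,   # filtered out by the threshold
          -1, 0, 0, 0, 0, 0, 0]            # end-of-detections marker
```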

The difference from other examples is that here the DetectionOutput sub-graph is replaced with a new sub-graph, not a single layer.

Look at the sub-graph replacement configuration file <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/legacy_ssd_support.json that is used to enable the two models listed above:

[
    {
        "custom_attributes": {
            "code_type": "caffe.PriorBoxParameter.CENTER_SIZE",
            "confidence_threshold": 0.01,
            "keep_top_k": 200,
            "nms_threshold": 0.45,
            "pad_mode": "caffe.ResizeParameter.CONSTANT",
            "resize_mode": "caffe.ResizeParameter.WARP"
        },
        "id": "TFObjectDetectionAPIDetectionOutput",
        "include_inputs_to_sub_graph": true,
        "include_outputs_to_sub_graph": true,
        "instances": {
            "end_points": [
                "detection_boxes",
                "detection_scores",
                "num_detections"
            ],
            "start_points": [
                "Postprocessor/Shape",
                "Postprocessor/Slice",
                "Postprocessor/ExpandDims",
                "Postprocessor/Reshape_1"
            ]
        },
        "match_kind": "points"
    },
    {
        "custom_attributes": {},
        "id": "PreprocessorReplacement",
        "inputs": [
            [
                {
                    "node": "map/Shape$",
                    "port": 0
                },
                {
                    "node": "map/TensorArrayUnstack/Shape$",
                    "port": 0
                },
                {
                    "node": "map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3$",
                    "port": 2
                }
            ]
        ],
        "instances": [
            ".*Preprocessor/"
        ],
        "match_kind": "scope",
        "outputs": [
            {
                "node": "sub$",
                "port": 0
            },
            {
                "node": "map/TensorArrayStack_1/TensorArrayGatherV3$",
                "port": 0
            }
        ]
    }
]


The second sub-graph replacer with identifier PreprocessorReplacement is used to remove the Preprocessing block from the graph. The replacer removes all nodes from this scope except nodes performing mean value subtraction and scaling (if applicable). Implementation of the replacer is in the <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/Preprocessor.py file.
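For reference, the preserved mean value subtraction and scaling correspond, for the SSD MobileNet feature extractor, to normalizing pixel values to the [-1, 1] range. The exact formula below is an assumption about that particular feature extractor's preprocessing:

```python
def preprocess(pixel_value):
    # map an 8-bit pixel value from [0, 255] to [-1.0, 1.0]:
    # scale by 2/255, then subtract 1.0 (the "mean value" step)
    return pixel_value * (2.0 / 255.0) - 1.0
```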

Now let's analyze the structure of the topologies generated with the Object Detection API. There are several blocks in the graph, each performing a particular task. For example, the Object Detection API Postprocessor block generates the output nodes: detection_boxes, detection_scores, num_detections, detection_classes.

Now consider the implementation of the sub-graph replacer, available in the <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/SSDs.py file. The file is rather big, so only some code snippets are shown:

class PostprocessorReplacement(FrontReplacementFromConfigFileSubGraph):
    replacement_id = 'TFObjectDetectionAPIDetectionOutput'

These lines define the new PostprocessorReplacement class inherited from FrontReplacementFromConfigFileSubGraph, which is designed to replace a sub-graph of operations described in a configuration file. Several methods must be overridden to implement the custom replacement logic we need.
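A schematic skeleton of such a replacer might look as follows. The base class here is a trivial stand-in for the real FrontReplacementFromConfigFileSubGraph, included only to keep the sketch self-contained:

```python
class FrontReplacementFromConfigFileSubGraph:
    """Stand-in for the real Model Optimizer base class."""

class PostprocessorReplacement(FrontReplacementFromConfigFileSubGraph):
    replacement_id = 'TFObjectDetectionAPIDetectionOutput'

    def generate_sub_graph(self, graph, match):
        # create the new nodes and internal edges; return a dictionary
        # of alias name -> created node for the *_edges_match methods
        return {}

    def input_edges_match(self, graph, match, new_sub_graph):
        # map input nodes of the matched sub-graph to nodes of the new one
        return {}

    def output_edges_match(self, graph, match, new_sub_graph):
        # map output nodes of the matched sub-graph to nodes of the new one
        return {}
```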

Now review the replacer code, considering the details of the DetectionOutput layer implementation in the Inference Engine. There are several constraints on the input tensors of the DetectionOutput layer.

To enable these models, add Reshape operations for the locations and confidences tensors and update the values of the prior boxes to include the variance constants (they are not present in the TensorFlow Object Detection API).
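The effect of these changes can be sketched with NumPy: flatten the locations tensor the way the added Reshape with dim [0, -1] does, and extend the prior boxes with a second row of variance constants in the layout the Inference Engine expects. The [0.1, 0.1, 0.2, 0.2] values here are an assumption matching typical CENTER_SIZE box coder defaults:

```python
import numpy as np

batch, num_priors = 1, 3
locations = np.zeros((batch, num_priors, 4))        # per-box coordinates from TF
tf_priors = np.linspace(0.0, 1.0, num_priors * 4)   # TF anchors carry no variances

# flatten locations the way the inserted Reshape with dim [0, -1] does:
# keep the batch dimension, collapse the rest
flat_locations = locations.reshape(batch, -1)

# build the prior tensor in the layout the IE DetectionOutput consumes:
# shape (1, 2, num_priors * 4), priors in row 0 and variances in row 1
variances = np.tile([0.1, 0.1, 0.2, 0.2], num_priors)
ie_priors = np.stack([tf_priors, variances])[np.newaxis, :, :]
```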

Look at the generate_sub_graph method:

def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
    log.debug('PostprocessorReplacement.generate_sub_graph')
    log.debug('matched_nodes = {}'.format(match.matched_nodes_names()))
    # softmax to be applied to the confidences
    softmax_conf_op = Softmax(graph, {'axis': 2, 'nchw_layout': True})
    softmax_conf_node = softmax_conf_op.add_node(dict(name='DetectionOutput_SoftMax_conf_'))
    # Inference Engine DetectionOutput layer consumes flattened tensors:
    # reshape operation to flatten the locations tensor
    reshape_loc_op = Reshape(graph, {'dim': np.array([0, -1])})
    reshape_loc_node = reshape_loc_op.add_node(dict(name='DetectionOutput_Reshape_loc_'))
    # reshape operation to flatten the confidences tensor
    reshape_conf_op = Reshape(graph, {'dim': np.array([0, -1])})
    reshape_conf_node = reshape_conf_op.add_node(dict(name='DetectionOutput_Reshape_conf_'))
    # create Node object from Op class
    detection_output_op = DetectionOutput(graph, match.custom_replacement_desc.custom_attributes)
    detection_output_op.attrs['old_infer'] = detection_output_op.attrs['infer']
    detection_output_op.attrs['infer'] = __class__.do_infer
    detection_output_node = detection_output_op.add_node(dict(name=detection_output_op.attrs['type'] + '_'))
    # create internal edges of the sub-graph: connect input ports 0 and 1 of the
    # DetectionOutput node with the outputs of the locations and confidences reshapes
    create_edge(softmax_conf_node, reshape_conf_node, 0, 0)
    create_edge(reshape_loc_node, detection_output_node, 0, 0)
    create_edge(reshape_conf_node, detection_output_node, 0, 1)
    # note: the 'reshape_conf_node' alias intentionally points to the Softmax node,
    # so the sub-graph input for confidences is attached before the Softmax
    return {'detection_output_node': detection_output_node, 'reshape_conf_node': softmax_conf_node,
            'reshape_loc_node': reshape_loc_node}

The method has two inputs: the graph to operate on and an instance of the SubgraphMatch object, which describes the matched sub-graph. The latter class has several useful methods to get a particular input/output node of the sub-graph by input/output index or by node name pattern. Examples of using these methods are given below.


The input_edges_match method is the following:

def input_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
    locs_consumer_node, locs_consumer_node_port = match.input_nodes(0)[0]
    conf_consumer_node, conf_consumer_node_port = match.input_nodes(1)[0]
    priors_consumer_node, priors_consumer_node_port = match.input_nodes(2)[0]
    # create matching nodes for the locations and confidences tensors using the simple scheme
    # "old_node_name: new_node_name", which in fact means "(old_node_name, 0): (new_node_name, 0)",
    # where the first '0' is the old port and the second one is the new port
    return {locs_consumer_node.id: new_sub_graph['reshape_loc_node'].id,
            conf_consumer_node.id: new_sub_graph['reshape_conf_node'].id,
            priors_consumer_node.id: (new_sub_graph['detection_output_node'].id, 2),
            }

The method has three parameters: the input graph, the match object describing the matched sub-graph, and the new_sub_graph dictionary with alias names returned from the generate_sub_graph method.


The output_edges_match method is the following:

def output_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
    # the DetectionOutput in IE produces a single tensor, but in TF it produces two tensors,
    # so we need to create only one output edge match
    return {match.output_node(0)[0].id: new_sub_graph['detection_output_node'].id}

The method has the same three parameters as the input_edges_match method. The returned dictionary contains a mapping for just one tensor, initially produced by the first output node of the sub-graph (which is detection_boxes according to the configuration file), to the single output tensor of the created DetectionOutput node. In fact, any output node of the initial sub-graph can be used in the mapping, because the sub-graph output nodes are the output nodes of the whole graph (their output is not consumed by any other nodes).

Now the Model Optimizer knows how to replace the sub-graph. The last step to enable the model is to cut off some parts of the graph that are not needed during inference.

It is necessary to remove the Preprocessor block where the image is resized. The Inference Engine does not support dynamic input shapes, so the Model Optimizer must freeze the input image size; thus, resizing the image is not necessary. This is achieved by the replacer <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/Preprocessor.py, which is executed automatically.

There are several Switch operations in the Postprocessor block without output edges. For example:

Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond/cond/switch_t
Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond/cond/switch_f
Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond_1/cond/switch_t
Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond_1/cond/switch_f

The Model Optimizer marks these nodes as output nodes of the topology. Because of that, some parts of the Postprocessor block are not removed during sub-graph replacement. To fix this issue, specify the output nodes of the graph manually using the --output command line parameter.
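Finding such dangling nodes is simple: they are the nodes without outgoing edges. A hypothetical check over a plain edge list (the node names below are shortened illustrations, not the full TensorFlow names):

```python
def find_dangling_nodes(all_nodes, edges):
    """Return nodes with no outgoing edges; the Model Optimizer treats such
    nodes as topology outputs unless --output overrides them."""
    has_output_edge = {src for src, _dst in edges}
    return sorted(node for node in all_nodes if node not in has_output_edge)

nodes = ['PadOrClipBoxList/cond/switch_t', 'detection_boxes', 'Conv']
edges = [('Conv', 'detection_boxes')]
```

Running the check on this toy graph reports both the legitimate output `detection_boxes` and the dangling Switch branch, which is why the real outputs must be named explicitly.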

Example Model Optimizer Command-Line for TensorFlow* SSD

The final command line to convert SSDs from the TensorFlow Object Detection API Zoo is:

./mo_tf.py --input_model=<path_to_frozen.pb> --tensorflow_use_custom_operations_config extensions/front/tf/legacy_ssd_support.json --output="detection_boxes,detection_scores,num_detections"

Converting MobileNet V2 model created with TensorFlow Object Detection API

The MobileNet V2 model differs from the previous version, so converting the model requires a new sub-graph replacement configuration file and new command line parameters that reflect several major differences in the topology.

The updated sub-graph replacement configuration file extensions/front/tf/ssd_v2_support.json reflecting these changes is the following:

[
    {
        "custom_attributes": {
            "code_type": "caffe.PriorBoxParameter.CENTER_SIZE",
            "confidence_threshold": 0.01,
            "keep_top_k": 200,
            "nms_threshold": 0.6,
            "pad_mode": "caffe.ResizeParameter.CONSTANT",
            "resize_mode": "caffe.ResizeParameter.WARP"
        },
        "id": "TFObjectDetectionAPIDetectionOutput",
        "include_inputs_to_sub_graph": true,
        "include_outputs_to_sub_graph": true,
        "instances": {
            "end_points": [
                "detection_boxes",
                "detection_scores",
                "num_detections"
            ],
            "start_points": [
                "Postprocessor/Shape",
                "Postprocessor/scale_logits",
                "Postprocessor/ExpandDims",
                "Postprocessor/Reshape_1",
                "Postprocessor/ToFloat"
            ]
        },
        "match_kind": "points"
    },
    {
        "custom_attributes": {},
        "id": "PreprocessorReplacement",
        "inputs": [
            [
                {
                    "node": "map/Shape$",
                    "port": 0
                },
                {
                    "node": "map/TensorArrayUnstack/Shape$",
                    "port": 0
                },
                {
                    "node": "map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3$",
                    "port": 2
                }
            ]
        ],
        "instances": [
            ".*Preprocessor/"
        ],
        "match_kind": "scope",
        "outputs": [
            {
                "node": "sub$",
                "port": 0
            },
            {
                "node": "map/TensorArrayStack_1/TensorArrayGatherV3$",
                "port": 0
            }
        ]
    }
]

Example of Model Optimizer Command-Line for TensorFlow SSD MobileNet V2

The final command line to convert MobileNet SSD V2 from the TensorFlow Object Detection Zoo is the following:

./mo_tf.py --input_model=<path_to_frozen.pb> --tensorflow_use_custom_operations_config extensions/front/tf/ssd_v2_support.json --output="detection_boxes,detection_scores,num_detections"