Sub-Graph Replacement in the Model Optimizer

Several reasons exist for why the Model Optimizer could not generate an Intermediate Representation for a model. However, in some cases, the Intermediate Representation could be generated after providing certain hints to the tool. The examples of hints below are mostly related to TensorFlow*, but potentially could be actual for models created in any framework:

• Topology contains an operation (or a sub-graph of operations) not known for Model Optimizer, but this operation (sub-graph) could be expressed as a combination of known operations. A hint would be a description of this combination to the tool).
• Sub-graph of operations in the topology expresses a single layer known to Inference Engine.
• TensorFlow and Inference Engine use different layouts of tensors, NHWC and NCHW respectively. If some tensor in NHWC layout is flattened (for example, all the dimensions are squashed into single dim), it is not possible to convert it to NCHW layout required for Inference Engine, so Model Optimizer cannot produce correct Intermediate Representation.

The detailed solutions for the examples above are given later, the next subsection shows what is common in all three examples.

## Sub-graph Replacement

In these cases, the sub-graph (or a single node) of initial graph is replaced with a new sub-graph (single node). The sub-graph replacement consists of the following steps:

1. Identify an existing sub-graph for replacement
2. Generate a new sub-graph
3. Connect a new sub-graph to the graph (create input/output edges to the new sub-graph)
4. Create output edges out of a new sub-graph to the graph
5. Do something with the original sub-graph (for example, remove it)

Model Optimizer provides several ways to perform most of the sub-graph replacement steps. The next subsections describe these methods.

## Replace a Single Operation with a Sub-graph of Operations

For example, there is an operation SquaredDifference in TensorFlow which calculates $$(a - b)^2$$, where $$a$$ and $$b$$ are input tensors. Inference Engine does not support such operation. However, SquaredDifference could be expressed using two Power operations and one Eltwise Add. The Power operation calculates $$scale * (a ^ {power}) + shift$$, where $$a$$ is a tensor and $$scale$$, $$power$$ and $$shift$$ are float values. The first Power operation negates the value of tensor $$b$$. The second one is used to square the result of $$a + (- b)$$ which is calculated using the Eltwise Add operation applied to tensor $$a$$ and tensor $$-b$$.

Given that, we can replace all SquaredDifference operations in the initial model with two Power and one Eltwise operations. The replacer is implemented in the following file <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/SquaredDifference.py.

import networkx as nx
from mo.front.common.replacement import FrontReplacementOp
from mo.graph.graph import Node
from mo.ops.eltwise import Eltwise
from mo.ops.power import Power
class SquaredDifference(FrontReplacementOp):
"""
Example class illustrating how to implement replacement of a single op in the front-end of the MO pipeline.
This class replaces a single op SquaredDifference by a sub-graph consisting of 3 lower-level ops.
"""
op = "SquaredDifference"
enabled = True
def replace_op(self, graph: nx.MultiDiGraph, node: Node):
negate = Power(graph, dict(scale=-1, name=node.name + '/negate_'))
squared = Power(graph, dict(power=2, name=node.name + '/squared_'))
# Replace edge from out port 0 of the matched node with a edge from node out_node.id with port 0.
# The "explicit" version of the return value is: [(out_node.id, 0)])
return [out_node.id]

Model Optimizer internal representation of the graph uses the networkx module.

Key lines:

• Line 1: Imports this module.
• Line 3: Imports class FrontReplacementOp that is used to replace operation of particular type with a new sub-graph. This class performs the first step of the sub-graph replacement (identifies an existing sub-graph for replacement). It is important to mention that the replacement happens before shape inference and creation of data nodes representing tensors with values. At this stage of model conversion pipeline, all nodes in the graph are operation nodes or nodes of type Const that produce tensor with fixed value embedded into the node.
• Line 4: Imports class Node representing a single node in the computation graph.
• Lines 5 - 6: Import classes representing operations Power and Eltwise. These classes are inherited from base class mo.ops.Op that represents operation and stores its attributes.
• Line 9: Defines class SquaredDifference inherited from FrontReplacementOp. This is a replacer class that is automatically registered and executed by Model Optimizer. Since the class is located in the common (not framework) specific directory <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front, it is used for replacement for all supported frameworks.
• Line 15: Defines the class variable op that stores the name of the operation to be replaced. In this case, it is SquaredDifference.
• Line 16: Defines class variable enabled that controls whether the replacer is enabled or not. The only function that should be implemented in the class is replace_op. It gets graph to operate on and an instance of node of desired operation (SquaredDifference in this case). This function performs step two and three of the sub-graph replacement (generates a new sub-graph to replace with and connects a new sub-graph to the graph).
• Lines 19 - 21: Create instances of operations classes with required attributes.
• Line 23: Creates a sub-graph from the operations defined above. The create_node method of the Op class generates Node from the Op and uses single mandatory argument - the list of input nodes (represented as instances of Node class) to create input edges to the node being generated. Inputs of the SquaredDifference node are retrieved using node.in_node(0) and node.in_node(1) method calls. The Eltwise Add node gets first input as initial first input of SquaredDifference node, the second input of add is the result of negation of the second input of SquaredDifference node: [add.create_node([node.in_node(0), negate.create_node([node.in_node(1)])])]. Then the result of Add node is squared. out_node node performs this calculation.

The replace_op function returns a list of node names used to create output edges of the sub-graph to connect it with the rest of the graph. Each element of the list describes mapping between old output edge of the matched node and new sub-graph node and output edge index. The i-th element of the list corresponds to the i-th output tensor of the matched node. In this case, SquaredDifference produces single tensor through output port 0, so the returned list contains single element. In general, each element is a tuple, where the first element is the name of a new node producing required tensor and the second is the output port for that tensor. If the output port is 0, it is possible to use shortcut - just the name of the node instead of a tuple. Line 26 uses this shortcut. The returned value is used to create the new sub-graph output edges (step 4 of the sub-graph replacement).

Default implementation of the FrontReplacementOp class removes matched node and all its input/output edges (step 5 of the sub-graph replacement).

Another example of such kind of replacement is in the <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/Sub.py class where all instances of Sub operations are replaced with two operations: Power to negate the second argument and the Eltwise to perform elementwise add.

## Replace Sub-graph of Operations with a New Sub-graph of Operations

The previous example considered situation when one single node of a specific type is replaced. When it is necessary to replace a sub-graph of operations it is necessary to tell Model Optimizer how to identify this sub-graph. There are three ways to achieve that:

• Use graph isomorphism pattern of the networkx module
• Use nodes name pattern to identify scope (according to TensorFlow terminology) to be replaced
• Use sets of start and end node names to match all nodes "between" them

The next sections explain each option using real examples.

### Replace Sub-graph of Operations Using Graph Isomorphism Pattern

networkx Python* module provides methods to find graph isomorphic to the given one using nodes and edges match: for example, networkx.algorithms.isomorphism.categorical_node_match, networkx.algorithms.isomorphism.categorical_multiedge_match. Model Optimizer uses these methods and provides simple API to use that feature.

For example, the Caffe* has layer called Mean-Variance Normalization (MVN), which is also supported by the Inference Engine. This layer is implemented with low-level operations in TensorFlow: Mean, StopGradient, SquaredDifference, Squeeze and FusedBatchNorm. Model Optimizer should replace sub-graph with these operations with a single Inference Engine layer of type MVN.

The file <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/mvn.py performs such a replacement. The first part of the file is:

class MVN(FrontReplacementSubgraph):
enabled = True
def pattern(self):
log.debug('Enabled MVN replacement')
return dict(
nodes=[
('mean', dict(op='Mean')),
('sqdiff', dict(op='SquaredDifference')),
('variance', dict(op='Mean')),
('squeeze_mean', dict(op='Squeeze')),
('squeeze_variance', dict(op='Squeeze')),
('fbn', dict(op='FusedBatchNorm')),
],
edges=[
('sqdiff', 'variance', {'in': 0}),
('mean', 'squeeze_mean', {'in': 0}),
('variance', 'squeeze_variance', {'in': 0}),
('squeeze_mean', 'fbn', {'in': 3}),
('squeeze_variance', 'fbn', {'in': 4}),
],
node_attrs=['op'],
edge_attrs=['in'])

Key lines:

• Line 1: Defines class MVN inherited from class FrontReplacementSubgraph that performs sub-graph replacement using sub-graph isomorphism pattern.
• Line 3: Sets class variable enabled to value True meaning that this replacer is enabled.
• The function pattern defines the sub-graph constraints to be matched. It returns a dictionary with four keys:
• the nodes defines a list of nodes to be matched. Each element in the list is a tuple. The first element is the alias name assigned for the matched node, the second element is a dictionary with desired attributes of the node.
• the edges defines a list of edges to be matched. Each element in the list is a tuple. The first and the second elements are the start and end edge nodes alias names respectively. The third element is a dictionary with desired edge attributes.
• the node_attrs contains the names of nodes attributes to use during sub-graph isomorphism search.
• the edge_attrs contains the names of edges attributes to use during sub-graph isomorphism search.

The sub-graph is matched if all provided constraints are satisfied. If at least one node with desired attributes is missing or at least one defined edge is absent, the sub-graph is not matched.

• Line 9: Adds constraint that sub-graph should contain node with attribute op with value Mean. The matched node gets an alias name mean. The same way the line 10 add constrain for node StopGradient, the matched node gets an alias name stop_grad.
• Line 18: Defines edge from node with alias name mean to node with alias name stop_grad having attribute in equal to 0. This means that the output of node mean is connected to the node stop_grad as a first input (Model Optimizer uses zero-based indexing that is why in is 0). Another example of defining the edges constraints is in line 25 where the edge from squeeze_mean is connected to the fbn node as fourth input.
• Lines 26 - 27: Specify a list of attributes to be checked. In fact, these lists are just list of all keys in the dictionaries for node and edge attributes.

Now when the Model Optimizer knows how to find sub-graph (step 1 of the sub-graph replacement), it is necessary to implement function that will perform actual sub-graph replacement (step 2 and 3). The code for this function is:

def replace_sub_graph(self, graph: nx.MultiDiGraph, match: dict):
fbn = match['fbn']
input = fbn.in_node(0)
log.debug('Found potential MVN pattern after {} with name {}'.format(input.op, input.name))
if input.id != match['mean'].in_node(0).id or input.id != match['sqdiff'].in_node(0).id:
return
log.debug('Confirmed MVN pattern after {} with name {}'.format(input.op, input.name))
MVN = Op.get_op_class_by_name('MVN')
mvn = MVN(graph, dict(
name=fbn.name + '/MVN_',
eps=fbn.eps,
required_reduction_indices=[1,2] if fbn.data_format == b'NHWC' else [2,3]
))
mvn.attrs['old_infer'] = mvn.attrs['infer']
mvn.attrs['infer'] = __class__.infer
mul = Eltwise(graph, dict(operation='mul', name=fbn.name + '/Mul_'))
input_gamma = fbn.in_node(1)
input_beta = fbn.in_node(2)
mean_reduction = match['mean'].in_node(1)
variance_reduction = match['mean'].in_node(1)
mul.create_node([
mvn.create_node([input, mean_reduction, variance_reduction]),
input_gamma
]),
input_beta
])
replace_node(fbn, new_subgraph)

The function accepts two arguments - the graph and the dictionary match. The keys in the dictionary are the alias names of matched nodes (defined in the nodes list in the function pattern) and the values are the matched node of the graph (the instance of Node object).

The function generates new sub-graph with node of type MVN and two nodes of the type Eltwise calculating sum and product. There is nothing interesting in how the graph is generated and mathematics behind that, so attention will be put to two aspects of this function.

The first one is the call to function replace_node in line 36. FusedBatchNorm node is replaced with the output node of the generated sub-graph: all input edges of the FusedBatchNorm node are re-connected to the new_subgraph node, all consumers of the FusedBatchNorm node are updated to get inputs from the new_subgraph node. This action connects newly generated sub-graph with an existing graph (step 4 of the sub-graph replacement).

The second one is that the default implementation of the inference function for MVN operation is overwritten. In line 16, the default implementation of the inference function for MVN is saved to attribute old_infer. In line 17, the new inference function is saved to the instance of the MVN operation class. The new inference function code looks the following way:

@staticmethod
def infer(node: Node):
if not(node.in_node(1).has_valid('value') and node.in_node(2).has_valid('value')):
log.warning('Reduction indices for mean and variance for MVN node {} are not constants'.format(node.name))
return
if not(all(node.in_node(1).value == node.required_reduction_indices) and
all(node.in_node(2).value == node.required_reduction_indices)):
log.warning('Reduction indices for mean {} and variance {} do not match required ones {}'.format(
node.in_node(1).value,
node.in_node(2).value,
node.required_reduction_indices
))
return
node.graph.remove_edge(node.in_node(1).id, node.id)
node.graph.remove_edge(node.in_node(2).id, node.id)
node.old_infer(node)

The infer function is needed to infer value of the node (if it is possible) and to infer shapes of the output tensors of the node (mandatory). The custom infer function performs additional checks that describe limitations of the MVN layer implementation in the Inference Engine. For example, reduction indices for mean and variance must be constants (line 10), while in TensorFlow they could be computed during model inference. In addition, the function removes two edges from the graph (lines 17 and 18) because all required information is already stored in the MVN node attributes. This is due to different MVN layer implementation in Inference Engine and TensorFlow*: mean and variance are attributes of the node in Inference Engine while in TensorFlow they are input tensors. Edges are not removed in the replace_sub_graph function, because these edges are used in the infer function (lines 7-12).

The last action in the infer method (line 19) is to call default infer function for the MVN, which is saved in the attribute old_infer of the node to infer output tensors shapes.

On the step 5 of the sub-graph replacement, six matching nodes are automatically removed during the dead code elimination pass that is performed after applying of custom sub-graph replacements defined. Six matching nodes are no more connected to the inputs of the network after replacing node fbn with a newly created sub-graph node. Since they are not marked as output nodes (using --output command line parameter), they could be removed.

The replacement works for all sub-graph isomorphism instances found in the network.

### Replace Sub-graph of Operations Using Nodes Name Pattern

TensorFlow uses a mechanism of scope to group related operation nodes. It is a good practice to put nodes performing particular task into the scope. This approach divides a graph into logical blocks that are easier to review in TensorBoard*. The scope, in fact, just defines a common prefix for the node names in the scope.

For example, Inception topologies contain several types of so-called "Inception blocks". Some of them are exactly equal to each other, but located in different places of the network. For example, Inception V4 from tensorflow.contrib.slim module has inception blocks Mixed_5b, Mixed_5c and Mixed_5d with exactly the same nodes with the same attributes.

Now consider situation when someone implemented these Inception blocks extremely efficiently using single Inference Engine custom layer called InceptionBlock and would like to replace these blocks with instances of the layer to decrease inference time. Model Optimizer provides mechanism to replace sub-graph of operations defined by the regular expressions for the node names prefixes (scope). In this particular case, some of the patterns are: .*InceptionV4/Mixed_5b, .*InceptionV4/Mixed_5c and .*InceptionV4/Mixed_5d. Each pattern starts with .*, because a prefix InceptionV4 is added to all nodes names during a model freeze.

The sub-graph replacement using nodes name pattern is a bit trickier than replacements of single operation and networkx isomorphism pattern described above. You should do the following additional steps in comparison with previously described replacements:

1. Prepare configuration file template defining node names patterns and information about custom layer attributes.
2. Run Model Optimizer with command line parameter to add information about input and output nodes of the specified sub-graphs.

Consider the following possible configuration file for the Inception Block replacer:

[
{
"custom_attributes": {
"attr1_key": "attr1_value",
"attr2_key": 123456
},
"id": "InceptionBlockReplacer",
"op": "InceptionBlock",
"instances": [
".*InceptionV4/Mixed_5b",
".*InceptionV4/Mixed_5c",
".*InceptionV4/Mixed_5d"
],
"match_kind": "scope"
}
]

The .json file contains list of dictionaries. Each dictionary defines one replacement. Each replacement is defined with several keys:

• id (mandatory) is a unique identifier of the replacer. It is used in the Python* code that implements sub-graph replacement to link the class and the replacement description from the configuration file.
• match_kind (mandatory) is a string that specifies what matching algorithm is used. Currently supported scope and points. In this example, the first one is considered. The points match kind is described below.
• instances (mandatory) specifies instances of the sub-graph to be matched. It contains a list of node names prefixes patterns for the match kind scope.
• custom_attributes (optional) is a dictionary with static attributes of the layer to be dumped to Inference Engine Intermediate Representation .xml file.
• op (optional) is used only if the sub-graph replacement Python code is not needed, because the sub-graph should be replaced with a single node of type op. If this attribute is not set, it is necessary to implement Python code with sub-graph generation code. Both options are considered in this example.

When the configuration file is ready, run the Model Optimizer with regular command line parameters pointing to the file with model and input shapes (if necessary) and additional parameter --tensorflow_custom_operations_config_update pointing to the generated configuration file. If the file is correct, Model Optimizer adds two keys to the InceptionBlockReplacer dictionary: inputs and outputs with the following content:

[
{
"id": "InceptionBlockReplacer",
...
"inputs": [
[
{
"node": "Branch_2/Conv2d_0a_1x1/Conv2D$", "port": 0 }, { "node": "Branch_3/AvgPool_0a_3x3/AvgPool$",
"port": 0
},
{
"node": "Branch_1/Conv2d_0a_1x1/Conv2D$", "port": 0 }, { "node": "Branch_0/Conv2d_0a_1x1/Conv2D$",
"port": 0
}
]
],
"outputs": [
{
"node": "concat\$",
"port": 0
}
]
}
]

The value for key inputs is a list of lists describing input tensors of the sub-graph. Each element of the top-level list corresponds to one unique input tensor of the sub-graph. Each internal list describes a list of nodes consuming this tensor and port numbers where the tensor is consumed. Model Optimizer generates regular expressions for the input nodes names to uniquely identify them in each instance of the sub-graph defined by the instances. Denote these nodes as input nodes of the sub-graph.

In the InceptionV4 topology, the InceptionV4/Mixed_5b block has four input tensors from outside of the sub-graph, but all of them are produced by the node InceptionV4/Mixed_5a/concat. Therefore, the top-level list of the inputs contains one list corresponding to this tensor. Four input nodes of the sub-graph consume the tensor produced by InceptionV4/Mixed_5a/concat node. In this case, all four input nodes consume input tensor into port 0.

The order of items in the internal list describing nodes does not matter, but the order of elements in the top-level list is important. This order defines the order in which the Model Optimizer attaches input tensors to a new generated node if the sub-graph is replaced with a single node. The i-th input node of the sub-graph is obtained using call match.single_input_node(i) in the sub-graph replacer code. More information about API is given below. If you need to change the order of input tensors, you can edit the configuration file in the text-editor.

The value for the key outputs is a list describing nodes of the sub-graph producing tensor that goes outside of the sub-graph or does not have child nodes. Denote these nodes as output nodes of the sub-graph. The order of elements in the list is important. The i-th element of the list describes the i-th output tensor of the sub-graph, which could be obtained using call match.output_node(i). The order of elements can be manually changed in the configuration file. Model Optimizer uses this order to connect output edges if the sub-graph is replaced with a single node.

Now, when meaning of inputs and outputs attributes is clean, return back to the replacer implementation. The replacer InceptionBlockReplacer contains attribute op with the value InceptionBlock, which means that the identified sub-graph should be replaced with a single layer of type InceptionBlock. This layer is not known for the Model Optimizer, so it is necessary to define it. See Extending the Model Optimizer with New Primitives. You must create file extension/ops/InceptionBlock.py with the following content:

import numpy as np
from mo.graph.graph import Node
from mo.ops.op import Op
class InceptionBlock(Op):
op = "InceptionBlock"
enabled = True
def __init__(self, graph, attrs):
super().__init__(graph, attrs, {
'type': __class__.op,
'op': __class__.op,
})

The shape inference function is not defined. In this case, Model Optimizer uses TensorFlow fallback to calculate shapes of the sub-graph output tensors.

Run the Model Optimizer with the regular command line parameters, path to the model file and input shape (if necessary), and the parameter --tensorflow_use_custom_operations_config and point to the created configuration file. Model Optimizer generates Intermediate Representation .xml file with three sequential layers of type InceptionBlock like in the following example:

<layer id="1658" name="InceptionBlock1877" precision="FP32" type="InceptionBlock">
<input>
<port id="0">
<dim>1</dim>
<dim>384</dim>
<dim>35</dim>
<dim>35</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>384</dim>
<dim>35</dim>
<dim>35</dim>
</port>
</output>
</layer>

The implementation of the sub-graph replacement by scope with a single layer is complete. The next subsection explains how Model Optimizer replaces sub-graph identified by start/end nodes (points) with another sub-graph.

### Replace Sub-graph of Operations Using Points

In this scenario, for the matching algorithm user defines the sub-graph via a set of "start" and "end" nodes. Given the set, the Model Optimizer performs the following steps:

1. Starts graph traversal from every start nodes following the direction of the graph edges. The search stops in end nodes or in case of nodes without further children. All visited nodes are added to the matched sub-graph.
2. Starts another graph traversal from each non-start node of the sub-graph, i.e. every node except nodes from "start" set. In this step the edges are traversed in the opposite edge direction. All newly visited nodes are added to the matched sub-graph. This step is needed to add nodes required for calculation values of internal nodes of the matched sub-graph.
3. Checks that all "end" nodes were reached from "input" nodes. If no then exit with error.
4. Check that there are no "Placeholder" operations among added nodes. If it is not true then some side branch of the sub-graph (added in step 2) depends on inputs of the network. Such configuration is not correct so exit with error.

This algorithm finds all nodes "between" start and end nodes. Also nodes needed for calculation of non-input nodes of the matched sub-graph produce constant values because they do not depend on input of the network. This sub-graph match has a limitation that each start node must have only one input. Therefore, it is not possible to specify, for example, convolution node as input because it has two inputs: data tensor and tensor with weights.

For example of replacement with points, please refer to the case-study of the conversion for the SSD models, created with TensorFlow Object Detection API.