AUGRUSequence

Versioned name: AUGRUSequence

Category: Sequence processing

Short description: AUGRUSequence operation represents a series of AUGRU cells (GRU with attentional update gate).

Detailed description: The main difference between AUGRUSequence and GRUSequence is the additional attention score input `A`, which is a multiplier for the update gate. The AUGRU formula is based on the paper arXiv:1809.03672.

```
AUGRU formula:
*  - matrix multiplication
(.) - Hadamard product (element-wise)

f, g - activation functions
z - update gate, r - reset gate, h - hidden gate
a - attention score

rt = f(Xt*(Wr^T) + Ht-1*(Rr^T) + Wbr + Rbr)
zt = f(Xt*(Wz^T) + Ht-1*(Rz^T) + Wbz + Rbz)
ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*(Rh^T) + Wbh + Rbh)  # 'linear_before_reset' is False

zt' = (1 - at) (.) zt  # multiplication by attention score

Ht = (1 - zt') (.) ht + zt' (.) Ht-1
```

Activation functions for gates: sigmoid for f, tanh for g. Only `forward` direction is supported, so `num_directions` is always equal to `1`.
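The effect of the attention score on the update gate can be illustrated with a minimal NumPy sketch. The function name `attentional_update` and the sample values are hypothetical and chosen only for illustration. The code follows the last two lines of the formula above and assumes that `z_t` and the hidden gate output `h_cand` have already been computed.

```python
import numpy as np

def attentional_update(z_t, a_t, h_cand, h_prev):
    """Apply the attention score to the update gate and blend the states.

    z_t, h_cand, h_prev: [batch_size, hidden_size]; a_t: [batch_size, 1].
    """
    z_att = (1.0 - a_t) * z_t                      # zt' = (1 - at) (.) zt
    return (1.0 - z_att) * h_cand + z_att * h_prev  # Ht

z_t = np.array([[0.5, 0.5]])
h_cand = np.array([[1.0, -1.0]])
h_prev = np.array([[0.0, 0.0]])

# a = 1 zeroes the update gate, so the new state equals the hidden gate output.
full = attentional_update(z_t, np.array([[1.0]]), h_cand, h_prev)
# a = 0 leaves the update gate unchanged, which is the plain GRU update.
plain = attentional_update(z_t, np.array([[0.0]]), h_cand, h_prev)
print(full, plain)
```

With an attention score of `1` the previous state is fully replaced by the hidden gate output, while a score of `0` reduces the step to the standard GRU blend.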

Attributes

• hidden_size

• Description: hidden_size specifies hidden state size.

• Range of values: a positive integer

• Type: `int`

• Required: yes

• activations

• Description: activation functions for gates

• Range of values: sigmoid, tanh

• Type: a list of strings

• Default value: sigmoid for f, tanh for g

• Required: no

• activations_alpha, activations_beta

• Description: the activations_alpha and activations_beta attributes of the activation functions; their applicability and meaning depend on the chosen activation functions

• Range of values: []

• Type: `float[]`

• Default value: []

• Required: no

• clip

• Description: clip specifies bound values [-C, C] for tensor clipping. Clipping is performed before activations.

• Range of values: `0.`

• Type: `float`

• Default value: `0.`, which means that clipping is not applied

• Required: no

• direction

• Description: Specifies whether the RNN is forward, reverse, or bidirectional. For forward or reverse, `num_directions = 1`; for bidirectional, `num_directions = 2`. The `num_directions` value determines the input/output shape requirements. AUGRUSequence supports only forward.

• Range of values: forward

• Type: `string`

• Default value: forward

• Required: no

• linear_before_reset

• Description: the linear_before_reset flag denotes whether the output of the hidden gate is multiplied by the reset gate before or after the linear transformation.

• Range of values: False

• Type: `boolean`

• Default value: False

• Required: no
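The value ranges listed above can be summarized as a small check. This is a sketch only: `validate_augru_attributes` is a hypothetical helper, not part of any library, and it interprets the listed ranges literally (for example, it assumes `activations` must be exactly sigmoid followed by tanh).

```python
def validate_augru_attributes(hidden_size, activations=("sigmoid", "tanh"),
                              clip=0.0, direction="forward",
                              linear_before_reset=False):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(hidden_size, bool) or not isinstance(hidden_size, int) or hidden_size <= 0:
        problems.append("hidden_size must be a positive integer")
    if list(activations) != ["sigmoid", "tanh"]:
        problems.append("activations must be sigmoid (f) and tanh (g)")
    if clip != 0.0:
        problems.append("clip must be 0. (clipping is not applied)")
    if direction != "forward":
        problems.append("direction must be 'forward'")
    if linear_before_reset:
        problems.append("linear_before_reset must be False")
    return problems

print(validate_augru_attributes(hidden_size=128))                      # []
print(validate_augru_attributes(hidden_size=0, direction="reverse"))   # two problems
```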

Inputs

• 1: `X` - 3D tensor of type T1 `[batch_size, seq_length, input_size]`, input data. Required.

• 2: `H_t` - 3D tensor of type T1 and shape `[batch_size, num_directions, hidden_size]`. Input with initial hidden state data. Required.

• 3: `sequence_lengths` - 1D tensor of type T2 and shape `[batch_size]`. Specifies real sequence lengths for each batch element. Required.

• 4: `W` - 3D tensor of type T1 and shape `[num_directions, 3 * hidden_size, input_size]`. The weights for matrix multiplication, gate order: zrh. Required.

• 5: `R` - 3D tensor of type T1 and shape `[num_directions, 3 * hidden_size, hidden_size]`. The recurrence weights for matrix multiplication, gate order: zrh. Required.

• 6: `B` - 2D tensor of type T1. The biases. If linear_before_reset is set to `False`, the shape is `[num_directions, 3 * hidden_size]`, gate order: zrh. Otherwise, the shape is `[num_directions, 4 * hidden_size]`: the biases for the z and r gates are summed (weights and recurrence weights), while the biases for the h gate are placed separately. Required.

• 7: `A` - 3D tensor of type T1 `[batch_size, seq_length, 1]`, the attention score. Required.
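As a quick reference, the input shapes listed above can be derived from four sizes. The helper name `augru_input_shapes` is hypothetical; the sketch assumes `num_directions = 1` and `linear_before_reset = False`, which are the only values this operation lists as supported.

```python
def augru_input_shapes(batch_size, seq_length, input_size, hidden_size):
    """Return the expected shape of each AUGRUSequence input as a tuple."""
    num_directions = 1  # only the forward direction is supported
    return {
        "X": (batch_size, seq_length, input_size),
        "H_t": (batch_size, num_directions, hidden_size),
        "sequence_lengths": (batch_size,),
        "W": (num_directions, 3 * hidden_size, input_size),
        "R": (num_directions, 3 * hidden_size, hidden_size),
        "B": (num_directions, 3 * hidden_size),  # linear_before_reset is False
        "A": (batch_size, seq_length, 1),
    }

shapes = augru_input_shapes(batch_size=1, seq_length=4, input_size=16, hidden_size=128)
print(shapes["W"])  # (1, 384, 16)
```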

Outputs

• 1: `Y` - 4D tensor of type T1 `[batch_size, num_directions, seq_length, hidden_size]`, concatenation of all the intermediate output values of the hidden state.

• 2: `Ho` - 3D tensor of type T1 `[batch_size, num_directions, hidden_size]`, the last output value of hidden state.
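How the inputs map to `Y` and `Ho` can be sketched as a NumPy reference loop. This is an illustration under stated assumptions, not the actual implementation: the name `augru_sequence` is hypothetical, every batch element is assumed to use the full `seq_length` (so `sequence_lengths` is ignored), and `B` is assumed to already hold the summed biases `Wb + Rb`, as its `3 * hidden_size` shape suggests.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def augru_sequence(X, H0, W, R, B, A):
    """Forward-only AUGRU over a sequence (linear_before_reset is False).

    X: [batch, seq, input], H0: [batch, 1, hidden], W: [1, 3*hidden, input],
    R: [1, 3*hidden, hidden], B: [1, 3*hidden], A: [batch, seq, 1].
    """
    # Gate order is z, r, h along the 3 * hidden_size axis.
    Wz, Wr, Wh = np.split(W[0], 3, axis=0)
    Rz, Rr, Rh = np.split(R[0], 3, axis=0)
    Bz, Br, Bh = np.split(B[0], 3)
    h = H0[:, 0, :]
    steps = []
    for t in range(X.shape[1]):
        x_t, a_t = X[:, t, :], A[:, t, :]
        z = sigmoid(x_t @ Wz.T + h @ Rz.T + Bz)
        r = sigmoid(x_t @ Wr.T + h @ Rr.T + Br)
        h_cand = np.tanh(x_t @ Wh.T + (r * h) @ Rh.T + Bh)
        z_att = (1.0 - a_t) * z                  # attention-scaled update gate
        h = (1.0 - z_att) * h_cand + z_att * h
        steps.append(h)
    Y = np.stack(steps, axis=1)[:, np.newaxis, :, :]  # [batch, 1, seq, hidden]
    Ho = h[:, np.newaxis, :]                          # [batch, 1, hidden]
    return Y, Ho

rng = np.random.default_rng(0)
X = rng.standard_normal((1, 4, 16))
H0 = np.zeros((1, 1, 128))
W = rng.standard_normal((1, 384, 16))
R = rng.standard_normal((1, 384, 128))
B = np.zeros((1, 384))
A = rng.uniform(size=(1, 4, 1))
Y, Ho = augru_sequence(X, H0, W, R, B, A)
print(Y.shape, Ho.shape)  # (1, 1, 4, 128) (1, 1, 128)
```

In this sketch `Ho` equals the last time step of `Y`, which holds only because all sequences are assumed to have full length.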

Types

• T1: any supported floating-point type.

• T2: any supported integer type.

Example

```
<layer ... type="AUGRUSequence" ...>
<data hidden_size="128"/>
<input>
<port id="0"> <!-- `X` input data -->
<dim>1</dim>
<dim>4</dim>
<dim>16</dim>
</port>
<port id="1"> <!-- `H_t` input -->
<dim>1</dim>
<dim>1</dim>
<dim>128</dim>
</port>
<port id="2"> <!-- `sequence_lengths` input -->
<dim>1</dim>
</port>
<port id="3"> <!-- `W` weights input -->
<dim>1</dim>
<dim>384</dim>
<dim>16</dim>
</port>
<port id="4"> <!-- `R` recurrence weights input -->
<dim>1</dim>
<dim>384</dim>
<dim>128</dim>
</port>
<port id="5"> <!-- `B` bias input -->
<dim>1</dim>
<dim>384</dim>
</port>
<port id="6"> <!-- `A` attention score input -->
<dim>1</dim>
<dim>4</dim>
<dim>1</dim>
</port>
</input>
<output>
<port id="7"> <!-- `Y` output -->
<dim>1</dim>
<dim>1</dim>
<dim>4</dim>
<dim>128</dim>
</port>
<port id="8"> <!-- `Ho` output -->
<dim>1</dim>
<dim>1</dim>
<dim>128</dim>
</port>
</output>
</layer>
```