LSTMCell

Versioned name: LSTMCell-1

Category: Sequence processing

Short description: LSTMCell operation represents a single LSTM cell. It computes the output using the formula described in the original paper Long Short-Term Memory.

Detailed description

Formula:
* - matrix mult
(.) - eltwise mult
[,] - concatenation
sigm - 1/(1 + e^{-x})
tanh - (e^{2x} - 1)/(e^{2x} + 1)
f = sigm(Wf*[Hi, X] + Bf)
i = sigm(Wi*[Hi, X] + Bi)
c = tanh(Wc*[Hi, X] + Bc)
o = sigm(Wo*[Hi, X] + Bo)
Co = f (.) Ci + i (.) c
Ho = o (.) tanh(Co)

Attributes

  • hidden_size
    • Description: hidden_size specifies hidden state size.
    • Range of values: a positive integer
    • Type: int
    • Default value: None
    • Required: yes
  • activations
    • Description: activations specifies activation functions for gates, there are three gates, so three activation functions should be specified as a value for this attributes
    • Range of values: any combination of relu, sigmoid, tanh
    • Type: a list of strings
    • Default value: sigmoid,tanh,tanh
    • Required: no
  • activations_alpha, activations_beta
    • Description: activations_alpha, activations_beta attributes of functions; applicability and meaning of these attributes depends on chosen activation functions
    • Range of values: a list of floating-point numbers
    • Type: float[]
    • Default value: None
    • Required: no
  • clip
    • Description: clip specifies bound values [-C, C] for tensor clipping. Clipping is performed before activations.
    • Range of values: a positive floating-point number
    • Type: float
    • Default value: infinity that means that the clipping is not applied
    • Required: no

Inputs

  • 1: X - 2D tensor of type T [batch_size, input_size], input data. Required.
  • 2: initial_hidden_state - 2D tensor of type T [batch_size, hidden_size]. Required.
  • 3: initial_cell_state - 2D tensor of type T [batch_size, hidden_size]. Required.
  • 4: W - 2D tensor of type T [4 * hidden_size, input_size], the weights for matrix multiplication, gate order: fico. Required.
  • 5: R - 2D tensor of type T [4 * hidden_size, hidden_size], the recurrence weights for matrix multiplication, gate order: fico. Required.
  • 6: B 1D tensor of type T [4 * hidden_size], the sum of biases (weights and recurrence weights). Required.

Outputs

  • 1: Ho - 2D tensor of type T [batch_size, hidden_size], the last output value of hidden state.
  • 2: Co - 2D tensor of type T [batch_size, hidden_size], the last output value of cell state.

Types

  • T: any supported floating point type.

Example

<layer ... type="LSTMCell" ...>
<data hidden_size="128"/>
<input>
<port id="0">
<dim>1</dim>
<dim>16</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>128</dim>
</port>
<port id="2">
<dim>1</dim>
<dim>128</dim>
</port>
<port id="3">
<dim>512</dim>
<dim>16</dim>
</port>
<port id="4">
<dim>512</dim>
<dim>128</dim>
</port>
<port id="5">
<dim>512</dim>
</port>
</input>
<output>
<port id="6">
<dim>1</dim>
<dim>128</dim>
</port>
<port id="7">
<dim>1</dim>
<dim>128</dim>
</port>
</output>
</layer>