GELU - Gaussian Error Linear Unit

Versioned name: Gelu-2

Category: Activation

Short description: Gaussian error linear unit element-wise activation function.

Detailed description: For details, see the paper Gaussian Error Linear Units (GELUs): https://arxiv.org/abs/1606.08415

Attributes: The Gelu operation has no attributes.

Mathematical Formulation: Gelu(x) = x * Φ(x), where Φ(x) is the cumulative distribution function of the standard normal (Gaussian) distribution. The following equivalent combination is recognized and fused into a single Gelu operation:

\[ Gelu(x) = 0.5 \cdot x \cdot \left(1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right) \]
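
For illustration, a minimal scalar sketch of this exact formulation, written in plain Python with the standard math module (not part of the specification; the helper name gelu_exact is hypothetical):

import math

def gelu_exact(x: float) -> float:
    # Gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))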

Similarly, the following Gelu approximation (typical for TensorFlow*) is recognized and fused into a single Gelu operation:

\[ Gelu(x) \approx 0.5 \cdot x \cdot \left(1 + \tanh\left(\sqrt{2/\pi} \cdot \left(x + 0.044715 \cdot x^{3}\right)\right)\right) \]
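
A corresponding sketch of the tanh-based approximation, again in plain Python for illustration only (the helper name gelu_tanh_approx is hypothetical):

import math

def gelu_tanh_approx(x: float) -> float:
    # Gelu(x) ~ 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))

The two formulations agree closely; for example, at x = 1.0 the exact form gives approximately 0.8413 while the approximation gives approximately 0.8412.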

Inputs:

1: Multidimensional input tensor. Required.

Example:

<layer ... type="Gelu">
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>128</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>128</dim>
        </port>
    </output>
</layer>