BatchNormInference

Versioned name: BatchNormInference-5

Category: Normalization

Short description: BatchNormInference performs the Batch Normalization operation described in the article "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift".

Detailed Description

BatchNormInference performs the following operations on a batch of data supplied in the input tensor data:

  • Normalizes each activation \(x^{(k)}\) by the mean and variance.

    \[\hat{x}^{(k)} = \frac{x^{(k)} - E[x^{(k)}]}{\sqrt{Var(x^{(k)}) + \epsilon}}\]

    where \(E[x^{(k)}]\) and \(Var(x^{(k)})\) are the mean and variance, calculated per channel axis of data input, and correspond to mean and variance inputs, respectively. Additionally, \(\epsilon\) is a value added to the variance for numerical stability and corresponds to epsilon attribute.

  • Performs a linear transformation of each normalized activation based on the gamma and beta inputs, which represent the scaling factor and shift, respectively.

    \[\hat{y}^{(k)} = \gamma^{(k)} \hat{x}^{(k)} + \beta^{(k)}\]

    where \(\gamma^{(k)}\) and \(\beta^{(k)}\) are learnable parameters, calculated per channel axis, and correspond to gamma and beta inputs.
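
The two steps above can be sketched in plain Python. This is a minimal illustration, not the reference implementation: the function name batch_norm_inference, the flat row-major list layout, and the explicit shape argument are assumptions made for this example only.

```python
import math


def batch_norm_inference(data, shape, gamma, beta, mean, variance, epsilon):
    """Apply per-channel batch normalization to a flat, row-major tensor.

    `data` is a flat list whose logical shape is `shape` (an assumption of
    this sketch); axis 1 is treated as the channel axis.
    """
    channels = shape[1]
    # Number of elements in one channel slice: product of dims after axis 1.
    inner = math.prod(shape[2:])
    out = []
    for i, x in enumerate(data):
        c = (i // inner) % channels  # channel index of element i
        # Step 1: normalize by the per-channel mean and variance.
        x_hat = (x - mean[c]) / math.sqrt(variance[c] + epsilon)
        # Step 2: scale by gamma and shift by beta.
        out.append(gamma[c] * x_hat + beta[c])
    return out


# Shape [2, 2]: two batch items, two channels.
result = batch_norm_inference(
    data=[1.0, 10.0, 3.0, 30.0],
    shape=[2, 2],
    gamma=[1.0, 2.0],
    beta=[0.0, 1.0],
    mean=[2.0, 20.0],
    variance=[1.0, 100.0],
    epsilon=0.0,
)
print(result)  # [-1.0, -1.0, 1.0, 3.0]
```

Each element picks up the gamma, beta, mean, and variance entry of its own channel, so the four parameter lists hold one value per channel.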

Mathematical Formulation

Let \(x\) be a \(d\)-dimensional input, \(x = (x_{1} \dotsc x_{d})\). Since normalization is applied to each activation \(x^{(k)}\) independently, you can focus on a particular activation and omit \(k\).

For a particular activation, consider a mini-batch \(\mathcal{B}\) of \(m\) values. BatchNormInference performs the Batch Normalization algorithm as follows:

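  • Input: Values of \(x\) over a mini-batch:

    \[\mathcal{B} = \{ x_{1 \dots m} \}\]

  • Parameters to learn: \(\gamma, \beta\)

  • Output:

    \[\{ o_{i} = BN_{\gamma, \beta}(b_{i}) \}\]

  • Mini-batch mean:

    \[\mu_{\mathcal{B}} \leftarrow \frac{1}{m} \sum_{i=1}^{m} b_{i}\]

  • Mini-batch variance:

    \[\sigma_{\mathcal{B}}^{2} \leftarrow \frac{1}{m} \sum_{i=1}^{m} (b_{i} - \mu_{\mathcal{B}})^{2}\]

  • Normalize:

    \[\hat{b_{i}} \leftarrow \frac{b_{i} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}}\]

  • Scale and shift:

    \[o_{i} \leftarrow \gamma \hat{b_{i}} + \beta = BN_{\gamma, \beta}(b_{i})\]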
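
The steps above can be traced for a single activation with a short Python sketch. The function name batch_norm_single_activation and the sample values are hypothetical. The sketch computes the mini-batch statistics itself to mirror the algorithm; in BatchNormInference the mean and variance are supplied through the mean and variance inputs instead.

```python
import math


def batch_norm_single_activation(batch, gamma, beta, epsilon):
    """Run the listed steps for one activation over a mini-batch of m values."""
    m = len(batch)
    # Mini-batch mean.
    mu = sum(batch) / m
    # Mini-batch variance (divides by m, as in the formula above).
    var = sum((b - mu) ** 2 for b in batch) / m
    # Normalize.
    normalized = [(b - mu) / math.sqrt(var + epsilon) for b in batch]
    # Scale and shift.
    outputs = [gamma * b_hat + beta for b_hat in normalized]
    return outputs, mu, var


outputs, mu, var = batch_norm_single_activation(
    [1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=0.5, epsilon=0.0
)
print(mu, var)  # 2.5 1.25
print(outputs)
```

Because the normalized values have zero mean, the outputs average to beta, which gives a quick sanity check on any implementation of these steps.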

Attributes

  • epsilon

    • Description: epsilon is a constant added to the variance for numerical stability.

    • Range of values: a floating-point number greater than or equal to zero

    • Type: float

    • Required: yes
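
The role of epsilon can be seen with a channel whose variance is exactly zero. The sketch below is only an illustration: the helper name normalize and the sample values are hypothetical. Plain Python raises an exception on the division by zero, whereas IEEE 754 tensor arithmetic would yield infinity or NaN instead.

```python
import math


def normalize(x, mean, variance, epsilon):
    """Normalize one value; epsilon keeps the denominator away from zero."""
    return (x - mean) / math.sqrt(variance + epsilon)


# With epsilon = 0 and zero variance, the denominator is zero.
try:
    normalize(1.5, mean=1.0, variance=0.0, epsilon=0.0)
    zero_eps_failed = False
except ZeroDivisionError:
    zero_eps_failed = True

# A small positive epsilon keeps the result finite.
stable = normalize(1.5, mean=1.0, variance=0.0, epsilon=1e-5)

print(zero_eps_failed)         # True
print(math.isfinite(stable))   # True
```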

Inputs

  • 1: data - A tensor of type T and rank of at least 2. The second dimension represents the channel axis and must have a size of at least 1. Required.

  • 2: gamma - Scaling factor for the normalized value. A 1D tensor of type T whose length equals the size of the data channel axis. Required.

  • 3: beta - Bias added to the scaled normalized value. A 1D tensor of type T whose length equals the size of the data channel axis. Required.

  • 4: mean - Value for mean normalization. A 1D tensor of type T whose length equals the size of the data channel axis. Required.

  • 5: variance - Value for variance normalization. A 1D tensor of type T whose length equals the size of the data channel axis. Required.
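
The shape constraints on the five inputs can be expressed as a small check. This is a hedged sketch: the function name validate_batch_norm_shapes and its error messages are hypothetical and not part of any library, and the check of element type T is left out.

```python
def validate_batch_norm_shapes(data_shape, gamma_shape, beta_shape,
                               mean_shape, variance_shape):
    """Check the input shape rules and return the extent of the channel axis."""
    if len(data_shape) < 2:
        raise ValueError("data must have rank of at least 2")
    channels = data_shape[1]  # the second dimension is the channel axis
    if channels < 1:
        raise ValueError("channel axis must have at least 1 element")
    per_channel_inputs = (
        ("gamma", gamma_shape),
        ("beta", beta_shape),
        ("mean", mean_shape),
        ("variance", variance_shape),
    )
    for name, shape in per_channel_inputs:
        # Each per-channel input must be 1D and match the channel axis.
        if list(shape) != [channels]:
            raise ValueError(f"{name} must have shape [{channels}], got {list(shape)}")
    return channels


channels = validate_batch_norm_shapes([1, 3, 224, 224], [3], [3], [3], [3])
print(channels)  # 3
```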

Outputs

  • 1: The result of the element-wise Batch Normalization operation applied to the input tensor data. A tensor of type T with the same shape as the data input tensor.

Types

  • T: any supported floating-point type.

Examples

Example: 2D input tensor data

<layer ... type="BatchNormInference" ...>
    <data epsilon="9.99e-06" />
    <input>
        <port id="0">  <!-- input -->
            <dim>10</dim>
            <dim>128</dim>
        </port>
        <port id="1">  <!-- gamma -->
            <dim>128</dim>
        </port>
        <port id="2">  <!-- beta -->
            <dim>128</dim>
        </port>
        <port id="3">  <!-- mean -->
            <dim>128</dim>
        </port>
        <port id="4">  <!-- variance -->
            <dim>128</dim>
        </port>
    </input>
    <output>
        <port id="5">
            <dim>10</dim>
            <dim>128</dim>
        </port>
    </output>
</layer>

Example: 4D input tensor data

<layer ... type="BatchNormInference" ...>
    <data epsilon="9.99e-06" />
    <input>
        <port id="0">  <!-- input -->
            <dim>1</dim>
            <dim>3</dim>
            <dim>224</dim>
            <dim>224</dim>
        </port>
        <port id="1">  <!-- gamma -->
            <dim>3</dim>
        </port>
        <port id="2">  <!-- beta -->
            <dim>3</dim>
        </port>
        <port id="3">  <!-- mean -->
            <dim>3</dim>
        </port>
        <port id="4">  <!-- variance -->
            <dim>3</dim>
        </port>
    </input>
    <output>
        <port id="5">
            <dim>1</dim>
            <dim>3</dim>
            <dim>224</dim>
            <dim>224</dim>
        </port>
    </output>
</layer>
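
To relate the two example shapes to the per-channel inputs, the following sketch counts how many data elements share one entry of gamma, beta, mean, and variance. The helper names elements_per_channel and channel_of are hypothetical, and the row-major flat indexing is an assumption of this illustration; the specification does not prescribe a memory layout.

```python
import math


def elements_per_channel(shape):
    """Return how many data elements share one per-channel parameter entry."""
    return math.prod(shape) // shape[1]


def channel_of(flat_index, shape):
    """Return the channel of a row-major flat index; axis 1 is the channel axis."""
    inner = math.prod(shape[2:])  # elements in one channel slice
    return (flat_index // inner) % shape[1]


print(elements_per_channel([10, 128]))         # 10
print(elements_per_channel([1, 3, 224, 224]))  # 50176
print(channel_of(50176, [1, 3, 224, 224]))     # 1
```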