Arm® CPU Device

Introducing the Arm® CPU Plugin

The Arm® CPU plugin is developed in order to enable deep neural networks inference on Arm® CPU, using Compute Library as a backend.

Note

This is a community-level add-on to OpenVINO™. Intel® welcomes community participation in the OpenVINO™ ecosystem, technical questions and code contributions on community forums. However, this component has not undergone full release validation or qualification from Intel®, hence no official support is offered.

The Arm® CPU plugin is not a part of the Intel® Distribution of OpenVINO™ toolkit and is not distributed in the pre-built form. The plugin should be built from the source code for use. Plugin build procedure is described in How to build Arm® CPU plugin guide.

The set of supported layers is defined on the Op-set specification page.

Supported Inference Data Types

The Arm® CPU plugin supports the following data types as inference precision of internal primitives:

  • Floating-point data types:

    • f32

    • f16

  • Quantized data types:

    • i8 (support is experimental)

Hello Query Device C++ Sample can be used to print out supported data types for all detected devices.

Supported Features

Preprocessing Acceleration The Arm® CPU plugin supports the following accelerated preprocessing operations:

  • Precision conversion:

    • u8 -> u16, s16, s32

    • u16 -> u8, u32

    • s16 -> u8, s32

    • f16 -> f32

  • Transposition of tensors with dims < 5

  • Interpolation of 4D tensors with no padding (pads_begin and pads_end equal 0).

The Arm® CPU plugin supports the following preprocessing operations, however they are not accelerated:

  • Precision conversion that is not mentioned above

  • Color conversion:

    • NV12 to RGB

    • NV12 to BGR

    • i420 to RGB

    • i420 to BGR

For more details, see the preprocessing API guide.

Supported Properties

The plugin supports the properties listed below.

Read-write Properties In order to take effect, all parameters must be set before calling ov::Core::compile_model() or passed as additional argument to ov::Core::compile_model()

Read-only Properties

Known Layers Limitation

  • AvgPool layer is supported via arm_compute library for 4D input tensor and via reference implementation for other cases.

  • BatchToSpace layer is supported for 4D tensors only and constant nodes: block_shape with N = 1 and C = 1, crops_begin with zero values and crops_end with zero values.

  • ConvertLike layer is supported for configuration like Convert.

  • DepthToSpace layer is supported for 4D tensors only and for BLOCKS_FIRST of mode attribute.

  • Equal does not support broadcast for inputs.

  • Gather layer is supported for constant scalar or 1D indices axes only. Layer is supported via arm_compute library for non negative indices and via reference implementation otherwise.

  • Less does not support broadcast for inputs.

  • LessEqual does not support broadcast for inputs.

  • LRN layer is supported for axes = {1} or axes = {2, 3} only.

  • MaxPool-1 layer is supported via arm_compute library for 4D input tensor and via reference implementation for other cases.

  • Mod layer is supported for f32 only.

  • MVN layer is supported via arm_compute library for 2D inputs and false value of normalize_variance and false value of across_channels, for other cases layer is implemented via runtime reference.

  • Normalize layer is supported via arm_compute library with MAX value of eps_mode and axes = {2 | 3}, and for ADD value of eps_mode layer uses DecomposeNormalizeL2Add. For other cases layer is implemented via runtime reference.

  • NotEqual does not support broadcast for inputs.

  • Pad layer works with pad_mode = {REFLECT | CONSTANT | SYMMETRIC} parameters only.

  • Round layer is supported via arm_compute library with RoundMode::HALF_AWAY_FROM_ZERO value of mode, for other cases layer is implemented via runtime reference.

  • SpaceToBatch layer is supported for 4D tensors only and constant nodes: shapes, pads_begin or pads_end with zero paddings for batch or channels and one values shapes for batch and channels.

  • SpaceToDepth layer is supported for 4D tensors only and for BLOCKS_FIRST of mode attribute.

  • StridedSlice layer is supported via arm_compute library for tensors with dims < 5 and zero values of ellipsis_mask or zero values of new_axis_mask and shrink_axis_mask. For other cases, layer is implemented via runtime reference.

  • FakeQuantize layer is supported via arm_compute library, in Low Precision evaluation mode for suitable models, and via runtime reference otherwise.