Arm® CPU Device¶
Introducing the Arm® CPU Plugin¶
The Arm® CPU plugin enables inference of deep neural networks on Arm® CPUs, using the Arm® Compute Library as a backend.
Note
This is a community-level add-on to OpenVINO™. Intel® welcomes community participation in the OpenVINO™ ecosystem, including technical questions and code contributions on community forums. However, this component has not undergone full release validation or qualification from Intel®, so no official support is offered.
The Arm® CPU plugin is not a part of the Intel® Distribution of OpenVINO™ toolkit and is not distributed in pre-built form. To use the plugin, build it from source, as described in the How to build Arm® CPU plugin guide.
The set of supported layers is defined on the Op-set specification page.
Supported Inference Data Types¶
The Arm® CPU plugin supports the following data types as inference precision of internal primitives:
Floating-point data types:
f32
f16
Quantized data types:
i8 (support is experimental)
Hello Query Device C++ Sample can be used to print out supported data types for all detected devices.
Supported Features¶
Preprocessing Acceleration
The Arm® CPU plugin supports the following accelerated preprocessing operations:
Precision conversion:
u8 -> u16, s16, s32
u16 -> u8, u32
s16 -> u8, s32
f16 -> f32
Transposition of tensors with dims < 5
Interpolation of 4D tensors with no padding (pads_begin and pads_end equal to 0)
The Arm® CPU plugin also supports the following preprocessing operations; however, they are not accelerated:
Precision conversion that is not mentioned above
Color conversion:
NV12 to RGB
NV12 to BGR
I420 to RGB
I420 to BGR
For more details, see the preprocessing API guide.
Supported Properties¶
The plugin supports the properties listed below.
Read-write Properties
To take effect, all parameters must be set before calling ov::Core::compile_model(), or passed as an additional argument to ov::Core::compile_model().
Read-only Properties
Known Layer Limitations¶
AvgPool layer is supported via the arm_compute library for 4D input tensors, and via the reference implementation for other cases.
BatchToSpace layer is supported for 4D tensors only, and for constant nodes: block_shape with N = 1 and C = 1, crops_begin with zero values, and crops_end with zero values.
ConvertLike layer is supported for the same configurations as Convert.
DepthToSpace layer is supported for 4D tensors only, and for the BLOCKS_FIRST value of the mode attribute.
Equal does not support broadcast for inputs.
Gather layer is supported for constant scalar or 1D indices axes only. The layer is supported via the arm_compute library for non-negative indices, and via the reference implementation otherwise.
Less does not support broadcast for inputs.
LessEqual does not support broadcast for inputs.
LRN layer is supported for axes = {1} or axes = {2, 3} only.
MaxPool-1 layer is supported via the arm_compute library for 4D input tensors, and via the reference implementation for other cases.
Mod layer is supported for f32 only.
MVN layer is supported via the arm_compute library for 2D inputs with normalize_variance set to false and across_channels set to false; for other cases, the layer is implemented via the runtime reference.
NormalizeL2 layer is supported via the arm_compute library with the MAX value of eps_mode and axes = {2 | 3}; for the ADD value of eps_mode, the layer uses DecomposeNormalizeL2Add. For other cases, the layer is implemented via the runtime reference.
NotEqual does not support broadcast for inputs.
Pad layer works with pad_mode = {REFLECT | CONSTANT | SYMMETRIC} parameters only.
Round layer is supported via the arm_compute library with the RoundMode::HALF_AWAY_FROM_ZERO value of mode; for other cases, the layer is implemented via the runtime reference.
SpaceToBatch layer is supported for 4D tensors only, and for constant nodes: shapes, pads_begin, and pads_end, where pads_begin and pads_end have zero paddings for batch and channels, and shapes has values of 1 for batch and channels.
SpaceToDepth layer is supported for 4D tensors only, and for the BLOCKS_FIRST value of the mode attribute.
StridedSlice layer is supported via the arm_compute library for tensors with fewer than 5 dimensions and zero values of ellipsis_mask, or zero values of new_axis_mask and shrink_axis_mask. For other cases, the layer is implemented via the runtime reference.
FakeQuantize layer is supported via the arm_compute library in Low Precision evaluation mode for suitable models, and via the runtime reference otherwise.