GNA Plugin

Introducing the GNA Plugin

The GNA plugin was developed for low power scoring of neural networks on the Intel® Speech Enabling Developer Kit, the Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver processor J5005, Intel® Celeron® processor J4005, Intel® Core™ i3-8121U processor, and others.

Supported Frameworks

The following frameworks have been tested in this release:

Refer to Supported Framework Layers for the list of supported standard layers.

BIOS, Library, and Drivers

This release was tested on Intel® NUC7CJYH with BIOS Update [JYGLKCPX.86A] Version: 0037, GNA library 01.00.00.1317 and driver 01.00.00.1310 (for Windows* and Linux*).

Supported Configuration Parameters

The plugin supports the configuration parameters listed below. The parameters are passed as a std::map<std::string, std::string> to InferenceEngine::Core::LoadNetwork.

| Parameter Name | Parameter Values | Default | Description |
|---|---|---|---|
| GNA_COMPACT_MODE | YES/NO | YES | Reuse I/O buffers to save space (makes debugging harder) |
| GNA_SCALE_FACTOR | FP32 number | 1.0 | Scale factor to use for input quantization |
| KEY_GNA_DEVICE_MODE | GNA_AUTO/GNA_HW/GNA_SW/GNA_SW_EXACT/GNA_SW_FP32 | GNA_AUTO | Execution mode (GNA and emulation modes) |
| KEY_GNA_FIRMWARE_MODEL_IMAGE | string | "" | Name for embedded model binary dump file |
| KEY_GNA_PRECISION | I16/I8 | I16 | Hint to GNA plugin: preferred integer weight resolution for quantization |
| KEY_PERF_COUNT | YES/NO | NO | Turn on performance counters reporting |
| KEY_GNA_LIB_N_THREADS | 1-127 integer number | 1 | Sets the number of GNA accelerator library worker threads used for inference computation in software modes |

How to Interpret Performance Counters

Performance counters collected via InferenceEngine::IInferencePlugin::GetPerformanceCounts provide performance data about execution on GNA. The returned map uses the counter description as the key, and the counter value is stored in the realTime_uSec field of the InferenceEngineProfileInfo structure. The current GNA implementation calculates counters for whole-utterance scoring and does not provide per-layer information. The API reports counter units in cycles, which can be converted to seconds as follows:

seconds = cycles/GNA frequency

The Intel Core i3-8121U processor includes a GNA with a frequency of 400 MHz; the Intel Pentium Silver J5005 and Intel Celeron J4005 processors include a GNA with a frequency of 200 MHz.

The following performance counters are currently provided:

Multithreading Support in GNA Plugin

The GNA plugin supports the following configuration parameters for multithreading management:

NOTE: Multithreading mode does not guarantee that computations complete in the same order as requests are issued. Additionally, software modes do not implement any serialization in this case.

Network Batch Size

The GNA plugin supports processing context-windowed speech frames in batches of 1-8 frames in one input blob using InferenceEngine::ICNNNetwork::setBatchSize. A larger batch size increases the speed of utterance processing. It is strongly recommended to use this network option if possible.

NOTE: For networks with convolutional and RNN/LSTM layers, the only supported batch size is 1.

Compatibility with Heterogeneous Plugin

The Heterogeneous plugin was tested with GNA as the primary device and CPU as the secondary one. To run inference of networks with layers unsupported by the GNA plugin (for example, Softmax), use the Heterogeneous plugin with the configuration HETERO:GNA,CPU. For the list of supported networks, see Supported Frameworks.

NOTE: Due to limitations of the GNA backend library, heterogeneous support is limited to cases where only one subgraph in the resulting sliced graph is scheduled to run on the GNA_HW or GNA_SW devices.

See Also