The GNA plugin was developed for low-power scoring of neural networks on the Intel® Speech Enabling Developer Kit, the Amazon Alexa* Premium Far-Field Developer Kit, the Intel® Pentium® Silver processor J5005, the Intel® Celeron® processor J4005, the Intel® Core™ i3-8121U processor, and others.
The following frameworks have been tested in this release:
Refer to Supported Framework Layers for the list of supported standard layers.
This release was tested on Intel® NUC7CJYH with BIOS Update [JYGLKCPX.86A] Version: 0037, GNA library 01.00.00.1317 and driver 01.00.00.1310 (for Windows* and Linux*).
The plugin supports the configuration parameters listed below. The parameters are passed as std::map<std::string, std::string> on InferenceEngine::Core::LoadNetwork.
Parameter Name | Parameter Values | Default | Description |
---|---|---|---|
GNA_COMPACT_MODE | YES / NO | YES | Reuse I/O buffers to save space (makes debugging harder) |
GNA_SCALE_FACTOR | FP32 number | 1.0 | Scale factor to use for input quantization |
KEY_GNA_DEVICE_MODE | GNA_AUTO / GNA_HW / GNA_SW / GNA_SW_EXACT / GNA_SW_FP32 | GNA_AUTO | Execution mode (GNA hardware and emulation modes) |
KEY_GNA_FIRMWARE_MODEL_IMAGE | string | "" | Name of the embedded model binary dump file |
KEY_GNA_PRECISION | I16 / I8 | I16 | Hint to the GNA plugin: preferred integer weight resolution for quantization |
KEY_PERF_COUNT | YES / NO | NO | Turn on performance counter reporting |
KEY_GNA_LIB_N_THREADS | Integer in the range 1-127 | 1 | Number of GNA accelerator library worker threads used for inference computation in software modes |
As a result of collecting performance counters using InferenceEngine::IInferencePlugin::GetPerformanceCounts, you can find various performance data about execution on GNA. The returned map stores a counter description as a key; the counter value is stored in the realTime_uSec field of the InferenceEngineProfileInfo structure. The current GNA implementation calculates counters for whole-utterance scoring and does not provide per-layer information. The API allows you to retrieve counter units in cycles, which can be converted to seconds as follows:
The Intel Core i3-8121U processor includes GNA running at 400 MHz; the Intel Pentium Silver J5005 and Intel Celeron J4005 processors include GNA running at 200 MHz.
The following performance counters are currently provided:
The GNA plugin supports the following configuration parameters for multithreading management:
KEY_GNA_LIB_N_THREADS
By default, the GNA plugin uses one worker thread for inference computations. This parameter allows you to create up to 127 threads for software modes.
NOTE: Multithreading mode does not guarantee the same computation order as the order of issuing. Additionally, software modes do not implement any serialization in this case.
The GNA plugin supports processing context-windowed speech frames in batches of 1-8 frames in one input blob using InferenceEngine::ICNNNetwork::setBatchSize. A larger batch size increases the speed of utterance processing, so it is strongly recommended to use this network option when possible.
NOTE: For networks with convolutional and RNN/LSTM layers, the supported batch size is 1.
The Heterogeneous plugin was tested with GNA as the primary device and CPU as a secondary device. To run inference of networks with layers unsupported by the GNA plugin (for example, SoftMax), you can use the Heterogeneous plugin with the configuration HETERO:GNA,CPU. For the list of supported networks, see Supported Frameworks.
NOTE: Due to a limitation of the GNA backend library, heterogeneous support is limited to cases where, in the resulting sliced graph, only one subgraph is scheduled to run on the GNA_HW or GNA_SW devices.