Converting a PyTorch QuartzNet Model
The NeMo project provides the QuartzNet model.
Downloading the Pre-trained QuartzNet Model
To download the pre-trained model, refer to the NeMo Speech Models Catalog. The steps below describe how to obtain QuartzNet in ONNX format.
Install the NeMo toolkit by following its installation instructions.
Run the following code:
import nemo
import nemo.collections.asr as nemo_asr

quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")

# Export QuartzNet model to ONNX format
quartznet.decoder.export('decoder_qn.onnx')
quartznet.encoder.export('encoder_qn.onnx')
quartznet.export('qn.onnx')
This code produces 3 ONNX model files: encoder_qn.onnx, decoder_qn.onnx, and qn.onnx. They are the encoder, the decoder, and a combined decoder(encoder(x)) model, respectively.
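Before moving on to conversion, it can help to confirm that the export step actually wrote all three files. The following is a minimal sketch using only the Python standard library. The helper name `missing_exports` and the role labels are illustrative and not part of NeMo; only the file names come from the export calls above.

```python
from pathlib import Path

# File names match the export calls above; the role labels are descriptive only.
EXPECTED_EXPORTS = {
    "encoder_qn.onnx": "encoder",
    "decoder_qn.onnx": "decoder",
    "qn.onnx": "combined decoder(encoder(x))",
}


def missing_exports(model_dir):
    """Return the expected ONNX file names that are absent or empty in model_dir."""
    model_dir = Path(model_dir)
    missing = []
    for name in EXPECTED_EXPORTS:
        path = model_dir / name
        # Treat a zero-byte file as missing: it usually means the export was interrupted.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_exports(".")
    if absent:
        print("missing exports:", ", ".join(absent))
    else:
        print("all exports present")
```

This only checks that the files exist and are non-empty. It does not validate the ONNX graphs themselves.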
Converting an ONNX QuartzNet Model to IR
If using a combined model:
mo --input_model <MODEL_DIR>/qn.onnx --input_shape [B,64,X]
If using separate models:
mo --input_model <MODEL_DIR>/encoder_qn.onnx --input_shape [B,64,X]
mo --input_model <MODEL_DIR>/decoder_qn.onnx --input_shape [B,1024,Y]
Where the shape is determined by the audio file Mel-Spectrogram length:
B - batch dimension,
X - dimension based on the input length,
Y - determined by encoder output, usually X / 2.
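To make the shape placeholders concrete, here is a rough sketch that derives X and Y from an audio length and fills in the `--input_shape` arguments. It rests on several assumptions that this page does not state: 16 kHz audio, a 10 ms spectrogram hop, a frame count of `num_samples // hop + 1` (as for a center-padded STFT), and Y rounded up from X / 2. The helper names are hypothetical. Check the preprocessing configuration of your own model before relying on these numbers.

```python
import math


def mel_frames(num_samples, sample_rate=16000, window_stride_s=0.01):
    """Estimate the Mel-spectrogram length X (assumed hop and center padding)."""
    hop = round(sample_rate * window_stride_s)  # samples between frames
    return num_samples // hop + 1


def quartznet_shapes(batch, num_samples):
    """Return (encoder_input_shape, decoder_input_shape) as lists of ints."""
    x = mel_frames(num_samples)
    y = math.ceil(x / 2)  # assumption: "usually X / 2", rounded up
    return [batch, 64, x], [batch, 1024, y]


def shape_arg(shape):
    """Format a shape the way the mo commands above expect, e.g. [1,64,201]."""
    return "[" + ",".join(str(dim) for dim in shape) + "]"


if __name__ == "__main__":
    # Example: one two-second clip sampled at the assumed 16 kHz.
    encoder_shape, decoder_shape = quartznet_shapes(batch=1, num_samples=2 * 16000)
    print(f"mo --input_model <MODEL_DIR>/encoder_qn.onnx --input_shape {shape_arg(encoder_shape)}")
    print(f"mo --input_model <MODEL_DIR>/decoder_qn.onnx --input_shape {shape_arg(decoder_shape)}")
```

For the combined model, only the encoder-style shape `[B,64,X]` is needed, since Y is an internal dimension in that case.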