You can download a pre-trained model for the ASpIRE Chain Time Delay Neural Network (TDNN) from the Kaldi* project official website.
To generate the Intermediate Representation (IR) of the model, run the Model Optimizer with the following parameters:
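The exact parameters were not preserved in this copy. A minimal sketch, assuming the `mo` entry point of the Model Optimizer is on your `PATH`, the downloaded archive is extracted so the model file sits at `exp/chain/tdnn_7b/final.mdl`, and the network output node is named `output`:

```sh
mo --input_model exp/chain/tdnn_7b/final.mdl --output output
```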
The IR will have two inputs: `input` for data and `ivector` for ivectors.
These instructions show how to run the converted model with the Speech Recognition sample. In this example, the input data contains one utterance from one speaker.
To follow the steps described below, you must first do the following:
1. Download and build the Kaldi framework by following the instructions in the `README.md` file in the repository.
2. Extract the downloaded model archive to the `egs/aspire/s5` folder of the Kaldi repository.

To run the ASpIRE Chain TDNN Model with the Speech Recognition sample:
1. Prepare the model for decoding. Refer to the `README.txt` file from the downloaded model archive for instructions.
2. Convert the input data to the `.ark` format. Refer to the corresponding sections below for instructions.

If you have a `.wav` data file, you can convert it to the `.ark` format using the following command:
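One possible invocation, assuming a `wav.scp` file in the current directory that lists your `.wav` data and the `mfcc_hires.conf` configuration shipped with the ASpIRE recipe:

```sh
<path to Kaldi repo>/src/featbin/compute-mfcc-feats \
    --config=<path to Kaldi repo>/egs/aspire/s5/conf/mfcc_hires.conf \
    scp:./wav.scp ark,scp:feats.ark,feats.scp
```

Use an absolute path for `feats.ark` in `feats.scp`, so that later commands run from other directories can still find it.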
To prepare ivectors for the Speech Recognition sample, do the following:
1. Copy the `feats.scp` file to the `egs/aspire/s5/` directory of the built Kaldi repository and navigate there:
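For example, with `<path to Kaldi repo>` pointing at your Kaldi checkout:

```sh
cp feats.scp <path to Kaldi repo>/egs/aspire/s5/
cd <path to Kaldi repo>/egs/aspire/s5/
```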
2. Extract ivectors from the data, as sketched below. To get only one ivector per utterance, pass an `--ivector_period` value at least as large as the utterance frame count. As a result, in `<ivector folder>`, you will find the `ivector_online.1.ark` file.
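A sketch of the extraction command, assuming the model archive was unpacked into the recipe folder so that the extractor is available at `exp/tdnn_7b_chain_online/ivector_extractor`; `<data folder>`, `<max frame count>`, and `<ivector folder>` are placeholders:

```sh
./steps/online/nnet2/extract_ivectors_online.sh --nj 1 --ivector_period <max frame count> \
    <data folder> exp/tdnn_7b_chain_online/ivector_extractor <ivector folder>
```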
3. Go to `<ivector folder>`:
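```sh
cd <ivector folder>
```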
4. Convert the `ivector_online.1.ark` file to text format using the `copy-feats` tool. Run the following command:
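A sketch, with `<path to Kaldi repo>` pointing at your Kaldi checkout:

```sh
<path to Kaldi repo>/src/featbin/copy-feats --binary=False \
    ark:ivector_online.1.ark ark,t:ivector_online.1.ark.txt
```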
5. For the Speech Recognition sample, the `.ark` file must contain an ivector for each frame. You must copy the ivector `frame_count` times. To do this, you can run the following script in the Python* command prompt:
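The script below is a sketch rather than a fixed recipe. It assumes you first dumped per-utterance frame counts with Kaldi's `feat-to-len` tool (`<path to Kaldi repo>/src/featbin/feat-to-len scp:feats.scp ark,t:feats_length.txt`), that each utterance in the text `.ark` file has exactly one ivector row, and that the output file name `ivector_online_ie.ark.txt` carries into the next step:

```python
# Repeat each utterance's single ivector row frame_count times, so the
# text-format .ark contains one ivector per feature frame.
# File names (ivector_online.1.ark.txt, feats_length.txt,
# ivector_online_ie.ark.txt) follow the steps above.

# Read "utterance-id frame-count" pairs produced by feat-to-len.
frame_counts = {}
with open("feats_length.txt") as lengths:
    for line in lengths:
        utt, count = line.split()
        frame_counts[utt] = int(count)

with open("ivector_online.1.ark.txt") as src, \
     open("ivector_online_ie.ark.txt", "w") as dst:
    frame_count = 0
    for line in src:
        if "[" in line:
            # Header line "utt-id  [" opens a new utterance matrix.
            frame_count = frame_counts[line.split()[0]]
            dst.write(line)
        else:
            # Data row: drop the closing bracket, repeat the row, and
            # restore the bracket on the last copy to close the matrix.
            row = line.replace("]", "").rstrip()
            for i in range(frame_count):
                dst.write(row + (" ]\n" if i == frame_count - 1 else "\n"))
```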
6. Create an `.ark` file from the `.txt` file:
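A sketch, mirroring the text conversion above:

```sh
<path to Kaldi repo>/src/featbin/copy-feats --binary=True \
    ark,t:ivector_online_ie.ark.txt ark:ivector_online_ie.ark
```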
Run the Speech Recognition sample with the created ivector `.ark` file as follows:
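A sketch of the invocation, where `speech_sample` is the Speech Recognition sample binary, `final.xml` is the IR generated earlier, and the input list pairs the feature and ivector `.ark` files created above; the context-window values passed to `-cw_l` and `-cw_r` are assumptions that must match the model's left and right context:

```sh
speech_sample -i feats.ark,ivector_online_ie.ark -m final.xml -d CPU \
    -o prediction.ark -cw_l 17 -cw_r 12
```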
Results can be decoded as described in the "Use of Sample in Kaldi* Speech Recognition Pipeline" chapter of the Speech Recognition Sample description.