You can download a pre-trained model for the ASpIRE Chain Time Delay Neural Network (TDNN) from the Kaldi* project official website.
To generate the Intermediate Representation (IR) of the model, run the Model Optimizer with the following parameters:
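The original command is not preserved here. A sketch of a typical Model Optimizer invocation for this model (the model path inside the archive and the output node name are assumptions):

```shell
# Sketch: convert the Kaldi .mdl file to IR, cutting the model at the
# "output" node. Paths are placeholders from the downloaded archive layout.
mo --input_model exp/chain/tdnn_7b/final.mdl --output output
```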
The IR will have two inputs:
* `input` for data
* `ivector` for ivectors
These instructions show how to run the converted model with the Speech Recognition sample. In this example, the input data contains one utterance from one speaker.
To follow the steps described below, you must first do the following:
1. Build the Kaldi framework, following the instructions in `README.md` in the repository.
2. Run the ASpIRE example from the `egs/aspire/s5` folder of the Kaldi repository.
To run the ASpIRE Chain TDNN model with the Speech Recognition sample:
1. Prepare the model: refer to the `README.txt` file from the downloaded model archive for instructions.
2. Convert the input data to `.ark` format. Refer to the corresponding sections below for instructions.
If you have a `.wav` data file, you can convert it to `.ark` format using the following command:
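The conversion command itself was lost from this page. A sketch, assuming a `wav.scp` file that lists your `.wav` data and the ASpIRE high-resolution MFCC config (all paths are placeholders):

```shell
# Sketch: extract MFCC features from the wavs listed in wav.scp and write
# them to feats.ark, with an index in feats.scp. Paths are assumptions.
<path_to_kaldi_repo>/src/featbin/compute-mfcc-feats \
    --config=<path_to_kaldi_repo>/egs/aspire/s5/conf/mfcc_hires.conf \
    scp:./wav.scp ark,scp:feats.ark,feats.scp
```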
Add the `feats.ark` absolute path to `feats.scp` to avoid errors in later commands.
To prepare ivectors for the Speech Recognition sample, do the following:
Copy the `feats.scp` file to the `egs/aspire/s5/` directory of the built Kaldi repository and navigate there:
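For example (the Kaldi repository path is a placeholder):

```shell
# Copy the feature index into the ASpIRE recipe folder and change into it.
cp feats.scp <path_to_kaldi_repo>/egs/aspire/s5/
cd <path_to_kaldi_repo>/egs/aspire/s5/
```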
To simplify the preparation of ivectors for the Speech Recognition sample, specify the maximum number of frames in utterances as a parameter for `--ivector_period` to get only one ivector per utterance.
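The extraction command is not reproduced here. A sketch using Kaldi's online ivector extraction script and the ASpIRE online extractor directory (the script path, extractor path, and folder placeholders are assumptions):

```shell
# Sketch: extract one online ivector per utterance by setting
# --ivector_period to the maximum frame count in the utterances.
./steps/online/nnet2/extract_ivectors_online.sh --nj 1 \
    --ivector_period <max_frame_count_in_utterances> \
    <data folder> exp/tdnn_7b_chain_online/ivector_extractor <ivector folder>
```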
To get the maximum number of frames in utterances, you can use the following command line:
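One way to do this is with Kaldi's `feat-to-len` tool (the exact pipeline below is an assumption):

```shell
# Print "utterance frame_count" pairs in text form, then keep the counts.
./src/featbin/feat-to-len scp:feats.scp ark,t:- | cut -d' ' -f 2
```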
As a result, you will find the `ivector_online.1.ark` file in `<ivector folder>`.

Convert the `ivector_online.1.ark` file to text format using the `copy-feats` tool. Run the following command:
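The command was lost from this page; a sketch (the repository path and output file name are assumptions):

```shell
# Sketch: write the binary ivector ark in human-readable text form.
<path_to_kaldi_repo>/src/featbin/copy-feats --binary=false \
    ark:<ivector folder>/ivector_online.1.ark ark,t:ivector_online.1.ark.txt
```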
The `.ark` file must contain an ivector for each frame. You must copy the ivector `frame_count` times. To do this, you can run the following script in the Python* command prompt:
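The script itself is not included on this page. A minimal sketch of the duplication logic, assuming the text-format ark holds exactly one ivector row per utterance (as produced with `--ivector_period` set to the utterance length); the function name and the `frame_counts` mapping are illustrative, not part of the original instructions:

```python
# Sketch: expand a text-format ivector .ark so each utterance's single
# ivector row is repeated once per frame. All names are illustrative.

def repeat_ivectors(ark_lines, frame_counts):
    """ark_lines: lines of the text-format ark; frame_counts: utterance -> frames."""
    out = []
    frame_count = 0
    for line in ark_lines:
        if "[" in line:
            # Header line, e.g. "utt1  [": look up this utterance's frame count.
            frame_count = frame_counts[line.split()[0]]
            out.append(line)
        else:
            # Single data row: drop the closing bracket, repeat the row, re-close.
            row = line.replace("]", "").rstrip()
            out.extend([row] * frame_count)
            out.append("]")
    return out

# Example: one utterance of 3 frames with a 3-dimensional ivector.
lines = repeat_ivectors(["utt1  [", "  0.1 0.2 0.3 ]"], {"utt1": 3})
print("\n".join(lines))
```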
Run the Speech Recognition sample with the created ivector `.ark` file as follows:
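The exact command is not preserved here. A sketch of a typical invocation (the model file name, device, context-window values, and the ivector ark placeholder are assumptions):

```shell
# Sketch: run the sample on CPU with the data ark and the prepared ivector ark.
./speech_sample -d CPU -m final.xml -i feats.ark,<ivector ark file> \
    -o prediction.ark -cw_l 17 -cw_r 12
```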
Results can be decoded as described in the "Use of Sample in Kaldi* Speech Recognition Pipeline" chapter of the Speech Recognition sample description.