Kaldi* Statistical Language Model Conversion Tool
The Kaldi* Statistical Language Model (SLM) Conversion Tool is a command-line tool that converts Kaldi language model resources to the format supported by the OpenVINO Speech Recognition Demos.
Command Line
kaldi_slm_convertion_tool HCLG.const.fst transitions.txt words.txt slm.fst labels.bin
Input Parameters
HCLG.const.fst
The HCLG.const.fst parameter is the input weighted finite-state transducer (WFST) file in the OpenFST const format.
Most Kaldi example scripts already create the language model file in that format. If your WFST is in a different OpenFST format, convert it with the following command:
$KALDI_ROOT/tools/openfst/bin/fstconvert --fst_type=const HCLG.fst HCLG.const.fst
The source Kaldi language model file HCLG.fst can be found in directories like exp/tri2b/graph_xyz, where tri2b is the name of the model used for speech recognition tests.
transitions.txt
The WFST transitions file describes the relations between WFST transitions and neural acoustic model outputs. This file is usually not generated by Kaldi example scripts, so you have to create it with the following command:
$KALDI_ROOT/src/bin/show-transitions phones.txt final.mdl > transitions.txt
For this call, the phones.txt file is the phoneme description file, which can often be found in data/lang/phones.txt. The final.mdl file is the neural acoustic model that is used for speech recognition.
words.txt
The words.txt file maps the word IDs used internally to their text representation. For many Kaldi example scripts, the file can be found in the same directory as HCLG.fst.
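To make the ID-to-text mapping concrete, the sketch below looks up the text for one word ID. It assumes the usual Kaldi symbol-table layout of one whitespace-separated `<word> <id>` pair per line. The `lookup_word` helper and the three sample entries are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Look up the text for a word ID in a Kaldi-style symbol table.
# Assumed layout: one "<word> <id>" pair per line, separated by whitespace.
lookup_word() {
    # $1 = numeric word ID, $2 = path to a words.txt-style file
    awk -v id="$1" '$2 == id { print $1; exit }' "$2"
}

# Tiny hypothetical sample table, written to a temporary file for the demo.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' '<eps> 0' 'hello 1' 'world 2' > "$sample"

# Print the word stored under ID 2 in the sample table.
lookup_word 2 "$sample"
```

Against a real graph directory, the same helper could be pointed at the words.txt file that sits next to HCLG.fst.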
Output Parameters
slm.fst
The output file slm.fst is generated by the SLM Conversion Tool. It contains the information that the OpenVINO speech recognition demos need for decoding.
labels.bin
The labels.bin file defines mappings from word IDs to word strings, like the words.txt file, but in a binary format. The OpenVINO speech recognition example needs the labels.bin file to convert recognized words into a human-readable format.
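The preparation steps and the final conversion can be collected into one small script. The following is a minimal sketch, not an official workflow: the variable names and the `slm_commands` helper are hypothetical, the location of final.mdl is an assumption, and the tool name is copied as written in the Command Line section. The script only prints the commands so that they can be reviewed before anything is executed.

```shell
#!/bin/sh
# Sketch of the full preparation flow. Every path below is an assumption;
# adjust it to your own Kaldi installation and experiment layout.
KALDI_ROOT=${KALDI_ROOT:-/path/to/kaldi}
GRAPH_DIR=${GRAPH_DIR:-exp/tri2b/graph_xyz}   # holds HCLG.fst and words.txt
LANG_DIR=${LANG_DIR:-data/lang}               # holds phones.txt
MODEL_DIR=${MODEL_DIR:-exp/tri2b}             # assumed location of final.mdl
SLM_TOOL=${SLM_TOOL:-kaldi_slm_convertion_tool}

# Print the three commands in the order they need to run.
# Paths containing spaces are not handled in this sketch.
slm_commands() {
    cat <<EOF
$KALDI_ROOT/tools/openfst/bin/fstconvert --fst_type=const $GRAPH_DIR/HCLG.fst HCLG.const.fst
$KALDI_ROOT/src/bin/show-transitions $LANG_DIR/phones.txt $MODEL_DIR/final.mdl > transitions.txt
$SLM_TOOL HCLG.const.fst transitions.txt $GRAPH_DIR/words.txt slm.fst labels.bin
EOF
}

slm_commands            # review the generated commands
# slm_commands | sh -e  # uncomment to execute them, stopping on the first error
```

Printing first and executing second keeps the sketch safe to run on a machine where Kaldi is not installed, and makes it easy to spot a wrong directory before any files are written.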