Decoding
All other applications in the toolkit exist for one reason: to make decoding possible! That's why the decoder, the heart of the toolkit, is just called... Shout!
Run ./shout with the output meta-data file of shout_vtln (or of shout_cluster if no VTLN is needed). The most important parameters are described briefly below. Please run ./shout -h for more help.
Model settings
The decoder needs a language model file (lm), acoustic model file (amp) and a lexical tree file (dct). All files should be binary files created by the shout toolkit.
Search settings
The search space of the decoder is restricted using five parameters. If these parameters are not assigned a value, the default values (shown when shout is started with -cc) will be used.
The five search restriction parameters:
- BEAM (floating point number)
- STATE_BEAM (floating point number)
- END_STATE_BEAM (floating point number)
- HISTOGRAM_STATE_PRUNING (positive number)
- HISTOGRAM_PRUNING (positive number)
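To give a feel for what beam and histogram limits do in general, here is a minimal Python sketch of the two textbook pruning techniques. It is not Shout's implementation: the function name `prune`, the hypothesis tuples, and the way the two limits are combined are assumptions made for illustration, and the mapping onto the five parameters above is only approximate.

```python
def prune(hypotheses, beam, histogram_limit):
    """Generic pruning sketch (hypothetical helper, not part of Shout).

    hypotheses: list of (label, log_score) pairs, higher score is better.
    beam: maximum allowed distance below the best log score.
    histogram_limit: maximum number of hypotheses kept after beam pruning.
    """
    if not hypotheses:
        return []
    best = max(score for _, score in hypotheses)
    # Beam pruning: drop hypotheses scoring more than `beam` below the best.
    survivors = [(label, score) for label, score in hypotheses
                 if score >= best - beam]
    # Histogram pruning: keep only the `histogram_limit` best survivors.
    survivors.sort(key=lambda h: h[1], reverse=True)
    return survivors[:histogram_limit]


hyps = [("a", -10.0), ("b", -12.5), ("c", -30.0), ("d", -11.0)]
kept = prune(hyps, beam=5.0, histogram_limit=2)
print(kept)
```

In this generic picture, a narrower beam or a smaller histogram limit speeds up the search at the risk of discarding the path that would have turned out best.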
AM and LM scaling settings
The most likely paths in the jungle of feature vectors are calculated using a language model and acoustic models. The scaling between the two types of models influences the outcome of the trip through this jungle. This scaling is set using three parameters in the formula:
Score(LM_SCALE,TRANS_PENALTY,SIL_PENALTY) = ln(AMSCORE) + LM_SCALE*ln(LMSCORE) + TRANS_PENALTY*NR_WORDS + SIL_PENALTY*NR_SIL
Shout implements an efficient method for incorporating the LM score into the search. This method, Language Model Look-Ahead (LMLA), is switched on by default and can be toggled on or off in the configuration file.
- LM_SCALE (floating point number)
- TRANS_PENALTY (floating point number)
- SIL_PENALTY (floating point number)
- LMLA (1=on, 0=off)
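The scoring formula can be worked through with a small Python sketch. It assumes that both model scores enter as natural logarithms of probabilities. The function name `combined_score` and the example values are hypothetical and are not Shout defaults.

```python
import math


def combined_score(am_score, lm_score, nr_words, nr_sil,
                   lm_scale, trans_penalty, sil_penalty):
    """Combine AM and LM scores as in the formula above (illustrative only).

    am_score, lm_score: probabilities (> 0); their natural logs are used.
    nr_words, nr_sil: number of words and of silences on the path.
    """
    return (math.log(am_score)
            + lm_scale * math.log(lm_score)
            + trans_penalty * nr_words
            + sil_penalty * nr_sil)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
score = combined_score(am_score=1.0, lm_score=1.0, nr_words=3, nr_sil=2,
                       lm_scale=30.0, trans_penalty=-1.0, sil_penalty=-0.5)
print(score)
```

With both probabilities set to 1.0 the logarithms vanish, so the result consists only of the penalty terms: 3 * -1.0 + 2 * -0.5 = -4.0. Raising LM_SCALE makes the language model weigh more heavily against the acoustic evidence.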
Alignment
You can specify a special background dictionary if you want to perform alignment with OOV marking. To perform alignment instead of ASR, simply set the forced-alignment parameter (see ./shout -h). Make sure to add the utterance to be aligned to the meta-data file, starting with <s> <s>. See the training use-case for more information.
Type of output (text or XML)
- XML (output will be in XML format)