// doc/dependencies.dox // Copyright 2012 Johns Hopkins University (author: Daniel Povey) // See ../../COPYING for clarification regarding multiple authors // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // http://www.apache.org/licenses/LICENSE-2.0 // THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY // KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED // WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE, // MERCHANTABLITY OR NON-INFRINGEMENT. // See the Apache 2 License for the specific language governing permissions and // limitations under the License. /** \page progress Recent progress and current activity Recently added features (and work in progress) - Added a "stable" branch; you can check it out with the command \verbatim svn co https://kaldi.svn.sourceforge.net/svnroot/kaldi/stable/ kaldi-stable \endverbatim This branch will be up-to-date with respect to bug fixes but will only contain code that is both finished and useful. So it is mostly a subset of what is in "trunk", with additional quality control. - Neural network progress: there is currently continuous progress on neural network based acoustic models. We will announce when this stabilizes to a particular recipe; hopefully this will happen within the next few months. - Online decoder (e.g. this will enable you to create a demo which takes audio from the sound card of your computer -- thanks to Matthias Paulik (Cisco) and Vassil Panayotov. See egs/voxforge/online_demo - Multilingual recipe (GlobalPhone). Currently work is progressing on recipes for the multilingual GlobalPhone database; see egs/gp/s5/. - Example setup using VoxForge data, which is available for free -- thanks to Vassil Panayotov. See egs/voxforge/s5/, and the blog post here . - Code for basis fMLLR adaptation-- this allows adaptation on shorter utterances than would normally be required (thanks to Yajie Miao). Currently, examples only in older scripts, e.g. egs/wsj/s1/run.sh, search for "basis". - Script to produce ctm-format output with confidence scores. See egs/swbd/s5/local/score_sclite_conf.sh. This script does Minimum Bayes Risk decoding by default (a bit like Consensus). - MPE training scripts (thanks to Chao Weng). Search for "mpe" in, for example, egs/rm/s5/run.sh to find the example. - The "s5" example scripts. These will be the "new style" scripts; they are cleaner and more general. See egs/{rm,wsj,swbd,tidigits}/s5/ - The "SGMM2" setup. This is a new version of the SGMM code that includes a couple of new features: - The "state-clustered tied mixture" idea (but for SGMMs not GMMs), whereby the SGMM sub-state vectors are tied among clusters of states with only the mixture weights being state-specific. This gives a consistent gain. - The "symmetric SGMM" idea previously published, which means speaker-dependent mixture weights. The gain from this is not consistent across all setups, but it possibly works reasonably consistently on larger setups. - A very minor script change: we don't update the projection matrices M for the first 4 iterations. The SGMM2 code is now finished and calls to the the scripts are present in egs/{rm,wsj,swbd}/s5/run.sh, but we don't yet have numbers in the RESULTS files. - MMI-based discriminative training with SGMMs. This works for both the SGMM and SGMM2 setup. It gives better results than the GMM+MMI+fMMI path. - Subspace fMLLR. Yajie Miao has implemented this. This is a way of estimating fMLLR robustly using less data, by making the fMLLR matrix a sum over basis matrices that capture the most important directions of variation. We don't yet have scripts for this in the current (s5/) setups. - Recurrent neural network language models (RNNLMs). This is not that recent any more, but anyway: in the egs/wsj/s5/run.sh example scripts, we include examples for building and using Tomas Mikolov's RNNLMs with Kaldi. - Example scripts: There is now a TIDIGITS example script (egs/tidigits) and an example script using a very small amount of data you can download from the web ("yesno"); and a Resource Management example setup (egs/rm/s4) that downloads features from the Web so you don't need to buy data. Current progress/activity: - Deep neural nets and other neural-net approaches. Karel Vesely is the main one working on this. Petr Motlicek has obtained a good-performing TANDEM setup with Kaldi, but it still depends on external tools (STK and TNet) to get the features. Ricky Chan has recently obtianed good performance using GPU based Kaldi neural-net training setup with CUDA 5.0. - Multilingual SGMMs. Arnab Ghoshal has been doing some work on this, and will be helped by Petr Motlicek and Milos Janda. */