// doc/dependencies.dox

// Copyright 2012  Johns Hopkins University (author: Daniel Povey)

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at

//  http://www.apache.org/licenses/LICENSE-2.0

// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

/**

  \page progress Recent progress and current activity

   Recently added features (and work in progress)
    - Added a "stable" branch; you can check it out with the command
  \verbatim
     svn co https://kaldi.svn.sourceforge.net/svnroot/kaldi/stable/ kaldi-stable
  \endverbatim
      This branch will be up-to-date with respect to bug fixes but will only
      contain code that is both finished and useful.  So it is mostly a subset
      of what is in "trunk", with additional quality control. 
    - Neural network progress: there is currently continuous progress on neural network 
      based acoustic models.  We will announce when this stabilizes to a particular recipe;
      hopefully this will happen within the next few months.
    - Online decoder (e.g. this will enable you to create a demo which takes audio
      from the sound card of your computer -- thanks to Matthias Paulik (Cisco) and
      Vassil Panayotov.  See egs/voxforge/online_demo
    - Multilingual recipe (GlobalPhone).  Currently work is progressing on recipes for
      the multilingual GlobalPhone database; see egs/gp/s5/.
    - Example setup using VoxForge data, which is available for free -- thanks to
      Vassil Panayotov.  See egs/voxforge/s5/, and the blog post
       <a href=http://vpanayotov.blogspot.com/2012/07/voxforge-scripts-for-kaldi.html> here </a>.
    - Code for basis fMLLR adaptation-- this allows adaptation on shorter utterances
      than would normally be required (thanks to Yajie Miao).  Currently, examples only
      in older scripts, e.g. egs/wsj/s1/run.sh, search for "basis".
    - Script to produce ctm-format output with confidence scores.  See egs/swbd/s5/local/score_sclite_conf.sh.
      This script does Minimum Bayes Risk decoding by default (a bit like Consensus).  
    - MPE training scripts (thanks to Chao Weng).  Search for "mpe" in, for example,
      egs/rm/s5/run.sh to find the example.
    - The "s5" example scripts.  These will be the "new style" scripts; they are cleaner and more
      general.  See egs/{rm,wsj,swbd,tidigits}/s5/
    - The "SGMM2" setup.  This is a new version of the SGMM code that includes a couple of new features:
       - The "state-clustered tied mixture" idea (but for SGMMs not GMMs), whereby the SGMM sub-state vectors
          are tied among clusters of states with only the mixture weights being state-specific.  This gives
          a consistent gain.
       - The "symmetric SGMM" idea previously published, which means speaker-dependent mixture weights.  The
         gain from this is not consistent across all setups, but it possibly works reasonably consistently on larger
         setups.
       - A very minor script change: we don't update the projection matrices M for the first 4 iterations.
      The SGMM2 code is now finished and calls to the the scripts are present in egs/{rm,wsj,swbd}/s5/run.sh, 
      but we don't yet have numbers in the RESULTS files.
    - MMI-based discriminative training with SGMMs.  This works for both the SGMM and SGMM2 setup.  It gives
      better results than the GMM+MMI+fMMI path.
    - Subspace fMLLR.  Yajie Miao has implemented this.  This is a way of estimating fMLLR robustly using less
      data, by making the fMLLR matrix a sum over basis matrices that capture the most important directions
      of variation.  We don't yet have scripts for this in the current (s5/) setups.
    - Recurrent neural network language models (RNNLMs).   This is not that recent any more, but anyway: in the
      egs/wsj/s5/run.sh example scripts, we include examples for building and using Tomas Mikolov's RNNLMs with
      Kaldi.
    - Example scripts:  There is now a TIDIGITS example script (egs/tidigits) and an example script
      using a very small amount of data you can download from the web ("yesno"); and a Resource Management
      example setup (egs/rm/s4) that downloads features from the Web so you don't need to buy data.

   Current progress/activity:
    - Deep neural nets and other neural-net approaches.  Karel Vesely is the main one working on this.
      Petr Motlicek has obtained a good-performing TANDEM setup with Kaldi, but it still depends on
      external tools (STK and TNet) to get the features.  Ricky Chan has recently obtianed good performance 
      using GPU based Kaldi neural-net training setup with CUDA 5.0.
    - Multilingual SGMMs.  Arnab Ghoshal has been doing some work on this, and will be helped by Petr Motlicek
      and Milos Janda.
      
    
*/