// doc/cudamatrix.dox

// Copyright 2012  Karel Vesely

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//  http://www.apache.org/licenses/LICENSE-2.0
//
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

namespace kaldi {

/**
 \page cudamatrix The CUDA Matrix library

 The CUDA matrix library is a seamless wrapper of CUDA computation.
 Its purpose is to separate the low-level CUDA-dependent routines from the
 high-level C++ code.

 The library can be compiled either with or without the CUDA libraries,
 depending on the HAVE_CUDA==1 macro. Without CUDA, the library backs off to
 computation on the host processor. The host processor is also used when the
 toolkit is compiled with CUDA but no suitable GPU is detected. This is
 particularly useful in heterogeneous ``grid-like'' environments.

 Computationally, the library is based on CUBLAS linear algebra operations
 and on manually implemented grid-like kernels for the non-linear operations.
 These kernels conform to the ``Map'' pattern, while most of the ``Reduce''
 kernels use a tree-like computational pattern in conjunction with extensive
 use of shared memory.

 \section cudamatrix_classes Important classes

 The most important classes are:
 \ref CuDevice, \ref CuMatrix, \ref CuVector and \ref CuStlVector.

 \ref CuDevice : an abstraction of the GPU board. It is a singleton object
 which initializes the CUBLAS library on application startup and releases it
 at the end. It is also used to collect profiling statistics.

 \ref CuMatrix : the GPU counterpart of the Matrix class. It holds a buffer
 in the GPU global memory, as well as a backup CPU buffer. It implements a
 subset of the Matrix interface. The host-GPU transfers are done by the
 \ref CopyFromMat and \ref CopyToMat methods, which may internally reallocate
 the buffers.

 \ref CuVector : the GPU counterpart of the Vector class. It holds a buffer
 in the GPU global memory, as well as a backup CPU buffer. It implements a
 subset of the Vector interface. The host-GPU transfers are done by the
 \ref CopyFromVec and \ref CopyToVec methods, which may internally reallocate
 the buffers.

 \ref CuStlVector : particularly useful for creating vectors of indices
 (int32).

 \section cudamatrix_math Standalone mathematical operations

 The header cu-math.h contains mathematical functions which cannot be
 associated solely with a vector or a matrix. They are gathered in the
 namespace cu::, in order to separate them from the global namespace.

 \section cudamatrix_kernels The kernels

 The CUDA kernels are concentrated in the file cu-kernels.cu. Since the CUDA
 code is compiled by NVCC, while the rest of the code is compiled by a
 different compiler, the two parts interact through an ANSI C interface,
 \ref cu-kernels.h, which represents the low-level interface to CUDA. The
 high-level interface is via CuMatrix, CuVector and the functions in the
 cu:: namespace.
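 For illustration, below is a minimal sketch of a host-GPU round trip through
 this high-level interface. It assumes the CopyFromMat / CopyToMat signatures
 implied by the description above (source passed by const reference,
 destination by pointer); check cu-matrix.h for the exact signatures.

 \code
 #include "matrix/kaldi-matrix.h"
 #include "cudamatrix/cu-matrix.h"

 using namespace kaldi;

 void CudaMatrixRoundTrip() {
   // Host-side matrix filled with random data.
   Matrix<BaseFloat> host_mat(128, 256);
   host_mat.SetRandn();

   // Upload to the GPU; when compiled without CUDA, or when no suitable GPU
   // was detected, the data stays in the backup CPU buffer instead.
   CuMatrix<BaseFloat> gpu_mat;
   gpu_mat.CopyFromMat(host_mat);  // may internally reallocate the GPU buffer

   // ... CUBLAS-backed or kernel-backed operations on gpu_mat ...

   // Download the result back to the host.
   Matrix<BaseFloat> result;
   gpu_mat.CopyToMat(&result);     // may internally reallocate the host buffer
 }
 \endcode

 The same code runs unchanged on a machine with or without a GPU, which is
 what makes the wrapper useful in heterogeneous environments.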
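 At the low level, each kernel in cu-kernels.cu is exposed to the C++ side
 through a wrapper with C linkage declared in cu-kernels.h, so that the host
 compiler never has to parse CUDA-specific syntax. The following is a rough
 sketch of that pattern; the names cuda_apply_relu and _apply_relu are
 hypothetical placeholders, not the actual symbols in cu-kernels.h.

 \code
 // Declaration visible to the host C++ compiler (the role of cu-kernels.h):
 extern "C" void cuda_apply_relu(float *data, int n);

 // Definitions compiled by NVCC (the role of cu-kernels.cu):
 __global__ static void _apply_relu(float *data, int n) {
   int i = blockIdx.x * blockDim.x + threadIdx.x;
   if (i < n && data[i] < 0.0f) data[i] = 0.0f;  // element-wise "Map" kernel
 }

 extern "C" void cuda_apply_relu(float *data, int n) {
   dim3 block(256);
   dim3 grid((n + block.x - 1) / block.x);
   _apply_relu<<<grid, block>>>(data, n);
 }
 \endcode

 Because the wrapper has C linkage and a plain C signature, its declaration
 can be compiled by the host compiler, while the kernel launch itself stays
 inside the NVCC-compiled translation unit.

*/


}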