

Resource Building

  • The current text resources have been rearranged and listed.

Leftover questions

  • Asymmetric window: large improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Overfitting?
  • Multi-GPU training: error encountered
  • Multilanguage training
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
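
For the asymmetric-window question above, the training-set change from 34% to 24% WER is about a 29% relative reduction. The sketch below shows one common way to compute WER (word-level edit distance divided by reference length) and the relative reduction. The function names are illustrative only and are not taken from the group's scoring tools.

```python
from typing import List


def word_errors(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of substitutions, deletions and insertions (Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # i deletions map ref[:i] onto an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate of a whitespace-tokenised hypothesis against a reference."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_errors(ref_words, hyp.split()) / len(ref_words)


def relative_reduction(before: float, after: float) -> float:
    """Relative error reduction, e.g. 0.34 -> 0.24 on the training set."""
    return (before - after) / before


if __name__ == "__main__":
    print(wer("a b c d", "a x c"))                    # one substitution, one deletion
    print(round(relative_reduction(0.34, 0.24), 3))   # training-set figures from the notes
```

Running the same comparison on the training and test sets side by side is a simple way to quantify the suspected overfitting.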

AM development

Sparse DNN

  • GA-based block sparsity
  • Found a paper from 2000 with similar ideas.
  • Try to get a student working on high performance computing to do the optimization

Noise training

  • More experiments with no-noise
  • More experiments with additional noise types

AMR compression re-training

  • 1700h MPE adaptation done
  • 1700h stream mode adaptation runs into MPE1


  • Significant improvement found with GFBank
  • Significant improvement found with FBank + GFBank

Denoising & Farfield ASR

  • Recording done
  • Prepare to construct the baseline


  • Code is ready; still need to work out the speech/non-speech smoothing

Farfield recognition


  • g-score based on MLP is done
  • t-score based on linear regression improves the performance

Word to Vector

  • LDA baseline (sogou 1700*9 training set) done
  • Word-vector classification is much better than the LDA system
Word vector training:
    general: dict = 15w (150k) words; train_data = ren_ming_ri_bao (People's Daily, 5 GB); window = 5
    1. size = 50: time = 30 min, 12 threads
    2. size = 10: time = 10 min, 12 threads

Data: class_num = 9; document_num = 9*2000
    train_num = 9*1600
    test_num  = 9*200
    dev_num   = 9*200

system                C000008     C000010     C000013     C000014     C000016     C000020     C000022     C000023     C000024     total
                      Finance     IT          Health      Sports      Travel      Education   Recruitment Culture     Military
lda_inf               0.845       0.2756      0.698       0.9502      0.63499     0.32        0.8080      0.3505      0.864       0.6385
lda_inf_10            0.8149      0.0887      0.628       0.9641      0.5739      0.105       0.707363    0.2334      0.8628      0.553167
w2v_filter_filter     0.7463      0.713       0.657       0.9106      0.68659     0.54        0.74638     0.692       0.84518     0.72638
w2v_filter_filter_10  0.7608      0.4323      0.57394     0.865       0.549       0.335       0.577       0.6129      0.78099     0.609769

system                C000008     C000010     C000013     C000014     C000016     C000020     C000022     C000023     C000024     total
                      Finance     IT          Health      Sports      Travel      Education   Recruitment Culture     Military
w2v_filter_filter     0.6865      0.7263      0.6716      0.84577     0.7462      0.46268     0.6567      0.7114      0.8905      0.71088
w2v_filter_filter_10  0.791       0.4079      0.56218     0.74129     0.62189     0.22885     0.562       0.6766      0.84079     0.603648
lda_inf               0.8706      0.26368     0.6965      0.8009      0.582       0.2537      0.72139     0.3184      0.82587     0.59259
lda_inf_10            0.776       0.1044      0.6467      0.9054      0.62189     0.1144      0.56218     0.24378     0.796       0.530127

Note: w2v_filter: stop words are removed when training the word vectors.
Note: w2v_filter_filter: stop words are removed when training the word vectors, and stop words are also removed from the documents.
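
The "total" column appears to be the unweighted mean of the nine per-class scores. A quick check, with the numbers copied from the second table above (the helper names are illustrative only):

```python
from statistics import mean
from typing import Dict, List, Tuple

# system -> (nine per-class scores, reported total), copied from the second table.
ROWS: Dict[str, Tuple[List[float], float]] = {
    "w2v_filter_filter": (
        [0.6865, 0.7263, 0.6716, 0.84577, 0.7462, 0.46268, 0.6567, 0.7114, 0.8905],
        0.71088,
    ),
    "w2v_filter_filter_10": (
        [0.791, 0.4079, 0.56218, 0.74129, 0.62189, 0.22885, 0.562, 0.6766, 0.84079],
        0.603648,
    ),
    "lda_inf": (
        [0.8706, 0.26368, 0.6965, 0.8009, 0.582, 0.2537, 0.72139, 0.3184, 0.82587],
        0.59259,
    ),
    "lda_inf_10": (
        [0.776, 0.1044, 0.6467, 0.9054, 0.62189, 0.1144, 0.56218, 0.24378, 0.796],
        0.530127,
    ),
}


def total_gaps(rows: Dict[str, Tuple[List[float], float]]) -> Dict[str, float]:
    """Absolute gap between the mean of the class scores and the reported total."""
    return {name: abs(mean(scores) - total) for name, (scores, total) in rows.items()}


def best_system(rows: Dict[str, Tuple[List[float], float]]) -> str:
    """System with the highest reported total."""
    return max(rows, key=lambda name: rows[name][1])


if __name__ == "__main__":
    for name, gap in total_gaps(ROWS).items():
        print(f"{name:22s} gap to reported total = {gap:.6f}")
    print("best:", best_system(ROWS))
```

The gaps stay well below 0.001, which is consistent with rounding in the per-class columns.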

LM development


  • Character-based NNLM (6700 chars, 7gram), 500M data training done.
  • Non-boundary char LM is better than boundary char LM
  • Investigate MS RNN LM training


FST-based matching

  • Word-based FST: 1-2 seconds with 1600 patterns. Huilan's implementation takes <1 second (cause of the gap unknown).
  • Char-FST implementation is done.
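
To illustrate what character-level pattern matching with an acceptor can look like, here is a minimal sketch. It is not the implementation reported above: it assumes patterns are plain strings without wildcards or weights, and builds a trie, which is a deterministic acyclic acceptor. The class name, method names and example patterns are all hypothetical.

```python
from typing import Dict, List, Tuple


class CharAcceptor:
    """Character-level trie used as a deterministic acceptor (illustrative only)."""

    def __init__(self) -> None:
        # transitions[state][char] -> next state; state 0 is the start state.
        self.transitions: List[Dict[str, int]] = [{}]
        # Final states, mapped to the pattern they accept.
        self.final: Dict[int, str] = {}

    def add_pattern(self, pattern: str) -> None:
        state = 0
        for ch in pattern:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.final[state] = pattern

    def find_all(self, text: str) -> List[Tuple[int, str]]:
        """Return (start offset, pattern) for every pattern occurrence in text."""
        matches: List[Tuple[int, str]] = []
        for start in range(len(text)):
            state = 0
            for ch in text[start:]:
                state = self.transitions[state].get(ch, -1)
                if state < 0:
                    break  # no outgoing arc for this character
                if state in self.final:
                    matches.append((start, self.final[state]))
        return matches


if __name__ == "__main__":
    acceptor = CharAcceptor()
    for p in ["weather", "weather report", "beijing"]:  # hypothetical patterns
        acceptor.add_pattern(p)
    print(acceptor.find_all("beijing weather report"))
```

Because each character follows at most one arc, the cost per start position depends on the length of the longest pattern and not on the number of patterns.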

Speech QA

  • Investigate determinization of G embedding