2014-06-13

Resource Building

  • Release management has been started

Leftover questions

  • Asymmetric window: great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set; see the splicing sketch after this list.
  • Multi-GPU training: error encountered.
  • Multilingual training.
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training.
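
The asymmetric-window item refers to frame splicing with unequal left and right context. A minimal sketch of such splicing (pure numpy; the 10/5 window sizes are illustrative assumptions, not the ones used in the experiment):

<pre>
import numpy as np

def splice(feats, left=10, right=5):
    """Splice each frame with `left` past and `right` future frames.

    feats: (num_frames, feat_dim) array of acoustic features.
    Returns (num_frames, (left + 1 + right) * feat_dim).
    An asymmetric window (left > right) keeps decoding latency low
    while still using a long acoustic history.
    """
    num_frames, dim = feats.shape
    # Pad by repeating the edge frames so every frame has full context.
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    out = np.empty((num_frames, (left + 1 + right) * dim))
    for t in range(num_frames):
        out[t] = padded[t : t + left + 1 + right].ravel()
    return out
</pre>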

AM development

Sparse DNN

  • GA-based block sparsity (++++++); see the mask sketch after this list.
  • Paper revision is ongoing.
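
Block sparsity zeroes whole tiles of a weight matrix instead of individual entries, which keeps the pruned layers friendly to dense matrix kernels; the GA then searches over the binary tile mask. A minimal sketch of applying such a mask (block size and layer shapes are assumptions):

<pre>
import numpy as np

def apply_block_mask(weights, mask, block=32):
    """Zero out whole (block x block) tiles of `weights`.

    weights: (rows, cols) matrix, dimensions divisible by `block`.
    mask:    (rows // block, cols // block) binary matrix; a GA would
             evolve this mask to trade accuracy against sparsity.
    """
    expanded = np.kron(mask, np.ones((block, block)))
    return weights * expanded

# Example: a 128x256 layer pruned at roughly 50% block sparsity.
rng = np.random.default_rng(0)
w = rng.standard_normal((128, 256))
m = (rng.random((128 // 32, 256 // 32)) < 0.5).astype(float)
w_sparse = apply_block_mask(w, m)
</pre>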

Noise training

  • Paper writing will start this week.


GFbank

  • Running the Sinovoice 8k 1400 + 100 mixture training.
  • GFbank 14 xEnt iterations completed; WER comparison (a filterbank sketch follows the table):

                                        Huawei disanpi   BJ mobile   8k English data
    FBank non-stream (17 iterations)        22.01%         26.63%        33.83%
    FBank non-stream (MPE1)                 21.07%         22.91%        24.34%
    GFbank stream (18 iterations)           22.26%         27.79%        35.10%
    GFbank non-stream (16 iterations)       22.45%         27.25%        34.64%
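
GFbank refers to gammatone-filterbank features, an auditory-model alternative to mel FBank. A sketch of ERB-spaced 4th-order gammatone filters and per-frame log energies; the filter count, frame sizes, and the 1.019 bandwidth constant are standard textbook values, assumed rather than taken from the setup above:

<pre>
import numpy as np

def gfbank(signal, sr=8000, num_filters=24, frame_len=200, frame_shift=80):
    """Log gammatone-filterbank energies (a GFbank sketch).

    4th-order gammatone impulse responses on an ERB-spaced grid,
    applied by time-domain convolution; energies summed per frame.
    """
    # ERB-rate spaced center frequencies between 100 Hz and near Nyquist.
    def hz_to_erb(f): return 21.4 * np.log10(1 + 0.00437 * f)
    def erb_to_hz(e): return (10 ** (e / 21.4) - 1) / 0.00437
    cfs = erb_to_hz(np.linspace(hz_to_erb(100), hz_to_erb(sr / 2 - 100),
                                num_filters))
    t = np.arange(int(0.05 * sr)) / sr            # 50 ms impulse responses
    outputs = []
    for fc in cfs:
        b = 1.019 * 24.7 * (4.37 * fc / 1000 + 1)  # ERB bandwidth
        g = t ** 3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
        g /= np.sqrt(np.sum(g ** 2))               # unit-energy filter
        outputs.append(np.convolve(signal, g, mode="same"))
    outputs = np.stack(outputs)                    # (filters, samples)
    num_frames = 1 + (outputs.shape[1] - frame_len) // frame_shift
    feats = np.empty((num_frames, num_filters))
    for i in range(num_frames):
        seg = outputs[:, i * frame_shift : i * frame_shift + frame_len]
        feats[i] = np.log(np.sum(seg ** 2, axis=1) + 1e-10)
    return feats
</pre>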

Multilingual ASR

  • TAG-based modeling is OK with a smaller step-in factor.
  • Non-tag tests should be conducted on both Baidu and microblog data.


Denoising & Farfield ASR

  • With artificial reverberation, 2 x 1200 seems a more appropriate configuration; however, large run-to-run randomness was observed.
  • With utterance-level CMN, performance seems better than with no CMN; strangely, global CMN reduced performance. See the CMN sketch after this list.
  • Should experiment with a single-layer network with more hidden units.
  • Record another set of far-field data; pre-processing is done.
  • Need to obtain a baseline result with the new middle-far mic.
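
On the CMN point: utterance-level CMN estimates the mean per utterance, while global CMN estimates one corpus-wide mean, so the latter cannot remove per-utterance channel offsets. A minimal sketch of both (function names are illustrative):

<pre>
import numpy as np

def utterance_cmn(feats):
    """Subtract each utterance's own mean: (frames, dim) -> same shape."""
    return feats - feats.mean(axis=0, keepdims=True)

def global_cmn(utterances):
    """Subtract one corpus-wide mean from every utterance.

    This removes only a fixed offset, so per-utterance channel
    differences survive -- one plausible reason it can underperform
    utterance-level CMN on mismatched far-field data.
    """
    mean = np.mean(np.concatenate(utterances, axis=0), axis=0)
    return [u - mean for u in utterances]
</pre>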


VAD

  • DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74).
  • Need to test a small-scale network (+)
  • 600-800 network test (+)
  • 100 X 4 + 2 network training (+); see the VAD sketch after this list.
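
The 100 X 4 + 2 item reads as four 100-unit hidden layers plus a 2-class speech/non-speech output. A forward-pass sketch of such frame-level DNN VAD with simple median smoothing (weights are random placeholders; the 40-dim input is an assumption):

<pre>
import numpy as np

rng = np.random.default_rng(0)
DIM = 40                                # assumed feature dimension
sizes = [DIM, 100, 100, 100, 100, 2]    # "100 X 4 + 2" topology
layers = [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
          for i, o in zip(sizes[:-1], sizes[1:])]

def vad_posteriors(frames):
    """Per-frame P(speech): sigmoid hidden layers, softmax output."""
    h = frames
    for w, b in layers[:-1]:
        h = 1.0 / (1.0 + np.exp(-(h @ w + b)))   # sigmoid
    w, b = layers[-1]
    logits = h @ w + b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return (e / e.sum(axis=1, keepdims=True))[:, 1]

def smooth_decisions(post, threshold=0.5, win=11):
    """Median-filter the frame decisions to remove spurious flips."""
    hard = (post > threshold).astype(int)
    pad = win // 2
    padded = np.pad(hard, pad, mode="edge")
    return np.array([np.median(padded[i:i + win])
                     for i in range(len(hard))]).astype(int)
</pre>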

Scoring

  • Collect more data with human scoring to train discriminative models


Embedded decoder

600 X 4 + 800 AM, beam 9:

          150k     20k     10k     5k
WER      15.96      -       -      -
RT         X       0.94     -      -
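
RT is the real-time factor: decoding time divided by audio duration, so RT < 1 means the decoder runs faster than real time. For reference:

<pre>
def real_time_factor(decode_seconds, audio_seconds):
    """RT factor: < 1.0 means faster than real time."""
    return decode_seconds / audio_seconds

# e.g. decoding 100 s of audio in 94 s gives an RT of 0.94.
print(real_time_factor(94.0, 100.0))  # 0.94
</pre>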

LM development

Domain specific LM

  • Cross-entropy filtering is not better than key-based filtering; see the data-selection sketch after this list.
  • It seems possible to reduce PPL with the extra retrieved data source.
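
Cross-entropy filtering usually means Moore-Lewis data selection: rank sentences by in-domain minus general-domain cross-entropy and keep the lowest-scoring part. Whether this exact variant was used is not stated; in the sketch below, in_domain_lm and general_lm are hypothetical objects exposing an average per-word log-probability:

<pre>
def moore_lewis_score(sentence, in_domain_lm, general_lm):
    """Cross-entropy difference: lower = more in-domain.

    Both LM objects are assumed to expose logprob_per_word(sentence),
    the average log-probability per token (hypothetical interface).
    """
    return (-in_domain_lm.logprob_per_word(sentence)
            + general_lm.logprob_per_word(sentence))

def select(corpus, in_domain_lm, general_lm, keep_ratio=0.2):
    """Keep the keep_ratio fraction of sentences closest to the domain."""
    scored = sorted(corpus, key=lambda s:
                    moore_lewis_score(s, in_domain_lm, general_lm))
    return scored[:int(len(scored) * keep_ratio)]
</pre>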


Word2Vector

  • Design a web spider.
  • Design a semantically related word tree.
  • First version, based on pattern matching, is done; see the pattern sketch after this list.
  • Filter with the query log.
  • Further refinement with the Baidu Baike hierarchy.
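
Pattern-based extraction of related words typically means Hearst-style surface patterns ("X such as A, B and C"). The patterns and function below are illustrative English stand-ins for whatever Chinese patterns the first version actually used:

<pre>
import re

# Illustrative Hearst-style patterns; a Chinese pipeline would use
# the equivalent Chinese cues instead.
PATTERNS = [
    re.compile(r"(\w[\w ]*?) such as ((?:\w+(?:, | and )?)+)"),
    re.compile(r"(\w[\w ]*?) including ((?:\w+(?:, | and )?)+)"),
]

def extract_relations(text):
    """Yield (hypernym, hyponym) pairs found by the surface patterns."""
    for pat in PATTERNS:
        for m in pat.finditer(text):
            hypernym = m.group(1).strip()
            for word in re.split(r", | and ", m.group(2)):
                if word:
                    yield hypernym, word.strip()

print(list(extract_relations(
    "instruments such as guitar, piano and violin")))
# [('instruments', 'guitar'), ('instruments', 'piano'),
#  ('instruments', 'violin')]
</pre>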


NN LM

  • Character-based NNLM (6,700 characters, 7-gram): training on 500M data is done; see the sketch after this list.
  • Inconsistent WER patterns were found on the Tencent test sets; probably need another test set to investigate.
  • Investigate MS RNN LM training.
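
A character-based 7-gram NNLM is a Bengio-style feedforward model: embed the six previous characters, concatenate, and predict the next character over the 6,700-character vocabulary. A minimal forward pass (all sizes other than the vocabulary and n-gram order are assumptions):

<pre>
import numpy as np

VOCAB, EMB, HID, CONTEXT = 6700, 64, 256, 6   # 7-gram => 6 history chars
rng = np.random.default_rng(0)
E  = rng.standard_normal((VOCAB, EMB)) * 0.01          # char embeddings
W1 = rng.standard_normal((CONTEXT * EMB, HID)) * 0.01  # hidden layer
W2 = rng.standard_normal((HID, VOCAB)) * 0.01          # output layer

def next_char_probs(history_ids):
    """P(next char | previous 6 chars) for one 7-gram context."""
    assert len(history_ids) == CONTEXT
    x = E[history_ids].reshape(-1)          # concatenate 6 embeddings
    h = np.tanh(x @ W1)
    logits = h @ W2
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = next_char_probs([17, 4, 902, 55, 3001, 12])  # arbitrary char ids
</pre>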