Difference between revisions of "2014-06-03"

From cslt Wiki
 
QA

FST-based matching

    • Word-based FST: 1-2 seconds with 1600 patterns. Huilan's implementation: <1 second.
    • THRAX toolkit for grammar to FST
  • Investigate determinization of G embedding
    • Refer to Kaldi new code

Latest revision as of 02:28, 3 June 2014 (Tue)

Resource Building

  • Release management has been started

Leftover questions

  • Asymmetric window: large improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Overfitting?
  • Multi-GPU training: error encountered
  • Multilanguage training
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training

AM development

Sparse DNN

  • GA-based block sparsity (+++++)

Noise training

  • All experiments completed.
  • Paper writing will be started this week

GFbank

  • Testing on the Tencent database is done; better performance than Fbank was observed.
  • Equal-loudness pre-filter added; slightly better performance was obtained.
  • Running Sinovoice 8k 1400 + 100 mixture training; 9 xEnt iterations completed.


Multilingual ASR

  • Multilingual LM decoding
  • Non-tag bug investigation with some digit-string recordings
  • Revert to hanzi numbers


English model


(state-gauss = 10000 100000, various LM, beam 13)

1. Shujutang 100h chi-eng 16k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
   wsj   |  23.86  |  20.95  |  20.90  |  20.84  |  20.81  |
   cmu   |  22.22  |    -    |    -    |    -    |  18.83  |
   giga  |  21.77  |    -    |    -    |    -    |  18.61  |
  armid  |  20.45  |    -    |    -    |    -    |    -    |


2. Shujutang 100h chi-eng 8k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
   wsj   |  26.27  |  23.63  |  23.14  |  22.93  |  23.00  |
   cmu   |  24.11  |    -    |    -    |    -    |  20.36  |
   giga  |  23.11  |    -    |    -    |    -    |  20.11  |
  armid  |    -    |    -    |    -    |    -    |    -    |


3. voxforge pure eng 16k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
   wsj   |  21.38  |  24.89  |  24.50  |  23.31  |  23.13  |
   cmu   |  24.00  |    -    |    -    |    -    |  21.33  |
   giga  |  18.75  |    -    |    -    |    -    |  22.45  |
  armid  |    -    |    -    |    -    |    -    |    -    |

4. fisher pure eng 8k:
Not finished yet.
  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
   wsj   |  40.65  |  36.16  |  35.94  |  35.88  |  35.80  |
   cmu   |  35.07  |    -    |    -    |    -    |  31.16  |
   giga  |  41.18  |    -    |    -    |    -    |  36.23  |
  armid  |    -    |    -    |    -    |    -    |    -    |
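
One way to read the four tables above is as the relative change from the xEnt column to the mpe_4 column. The sketch below assumes the figures are WER percentages (the notes do not label them) and uses only rows where both values are present. The function and dictionary names are illustrative, not part of any toolkit used here.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction in percent: 100 * (baseline - improved) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# (xEnt, mpe_4) pairs copied from the tables above.
# Assumption: all values are WER in percent.
RESULTS = {
    "shujutang_16k": {"wsj": (23.86, 20.81), "cmu": (22.22, 18.83), "giga": (21.77, 18.61)},
    "shujutang_8k": {"wsj": (26.27, 23.00), "cmu": (24.11, 20.36), "giga": (23.11, 20.11)},
    "voxforge_16k": {"wsj": (21.38, 23.13), "cmu": (24.00, 21.33), "giga": (18.75, 22.45)},
    "fisher_8k": {"wsj": (40.65, 35.80), "cmu": (35.07, 31.16), "giga": (41.18, 36.23)},
}


def summarize(results):
    """Map (task, LM) to the relative xEnt -> mpe_4 reduction, rounded to 2 places."""
    return {
        (task, lm): round(relative_reduction(xent, mpe4), 2)
        for task, rows in results.items()
        for lm, (xent, mpe4) in rows.items()
    }


if __name__ == "__main__":
    for (task, lm), rel in summarize(RESULTS).items():
        print(f"{task:14s} {lm:5s} {rel:+7.2f}%")
```

A negative value means the mpe_4 figure is higher than the xEnt figure for that row, which would be worth double-checking against the raw decoding logs.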


Denoising & Farfield ASR

  • Investigating DAE model
  • Kaldi-based MSE obj training toolkit preparation

VAD

  • DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74)
  • Need to test small-scale network (+)
  • 600-800 network
  • 100 X 4 + 2
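
For reference, an energy-based VAD of the kind used as the baseline above can be sketched as follows. This is a minimal illustration, not the implementation evaluated in these notes: the frame length, hop size, and the -30 dB threshold are all assumptions, and the input is assumed to be a list of float samples scaled to [-1, 1].

```python
import math


def frame_energies_db(samples, frame_len=400, hop=160):
    """Log energy (dB) per frame. 400/160 samples is 25 ms/10 ms at 16 kHz (assumed)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on digital silence.
        energies.append(10.0 * math.log10(power + 1e-10))
    return energies


def energy_vad(samples, threshold_db=-30.0, frame_len=400, hop=160):
    """Return one speech/non-speech flag per frame using a fixed energy threshold."""
    return [e > threshold_db for e in frame_energies_db(samples, frame_len, hop)]


if __name__ == "__main__":
    silence = [0.0] * 1600
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
    flags = energy_vad(silence + tone)
    print("".join("1" if f else "0" for f in flags))
```

A fixed threshold like this is sensitive to recording level and background noise, which is one plausible reason a trained DNN classifier does better.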

Scoring

  • Bug in stream mode fixed


Embedded decoder

  • word list graph test passed
  • wlist2LG toolkit checked in
  • Prepare to deliver Android compiler options (.mk)
  • Interface design should be completed in one day
  • Prepare HCLG for 20k LM; decoding in progress.


LM development

Domain specific LM

  • Retrieve both Baidu & microblog
  • PPL testing
  • Need to check into GitLab.
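
As a reminder of what the PPL test measures, perplexity is the exponentiated negative average log-probability the LM assigns to the test tokens. The sketch below assumes the per-token log-probabilities have already been obtained from the LM; the function name and the `base` parameter are illustrative and are not taken from any specific toolkit.

```python
import math


def perplexity(log_probs, base=math.e):
    """Perplexity = base ** (-(1/N) * sum of per-token log-probabilities).

    `base` must match the logarithm used to produce `log_probs`
    (e.g. 10 for log10 scores, math.e for natural logs).
    """
    if not log_probs:
        raise ValueError("need at least one token log-probability")
    avg = sum(log_probs) / len(log_probs)
    return base ** (-avg)


if __name__ == "__main__":
    # A model that is uniform over 4 words has perplexity 4 on any text.
    print(perplexity([math.log(0.25)] * 10))
```

Comparing the Baidu and microblog sources then reduces to computing this value for each candidate LM on the same held-out text.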

NN LM

  • Character-based NNLM (6700 chars, 7gram), 500M data training done.
  • Inconsistent WER patterns were found on the Tencent test sets
  • Probably need another test set for further investigation.
  • Investigate MS RNN LM training
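
To make the "7gram" setting of the character-based NNLM concrete, the sketch below builds training examples in which the previous 6 characters predict the next one. This reading of "7gram", the `<s>` padding symbol, and the function name are assumptions for illustration, not details confirmed by these notes.

```python
def char_ngram_examples(text, order=7, pad="<s>"):
    """Return (history, target) pairs with order-1 characters of history each."""
    chars = [pad] * (order - 1) + list(text)
    return [
        (tuple(chars[i:i + order - 1]), chars[i + order - 1])
        for i in range(len(chars) - order + 1)
    ]


if __name__ == "__main__":
    for history, target in char_ngram_examples("语音识别", order=3):
        print(history, "->", target)
```

With left padding, each character of the input yields exactly one example, so a 500M-character corpus would give on the order of 500M training examples under this scheme.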