2014-06-03
From cslt Wiki
Resource Building
- Release management has been started
Leftover questions
- Asymmetric window: great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Overfitting?
- Multi GPU training: Error encountered
- Multilanguage training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
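On the asymmetric-window item above: the notes do not say which window was used, but one common way to build an asymmetric analysis window (sketched here as an assumption, not the actual configuration) is to concatenate the rising half of a long Hann window with the falling half of a short one:

```python
import numpy as np

def asymmetric_window(rise_len, fall_len):
    """Asymmetric window: slow Hann rise over rise_len samples,
    fast Hann fall over fall_len samples (peak near index rise_len - 1)."""
    rise = np.hanning(2 * rise_len)[:rise_len]   # 0 -> ~1
    fall = np.hanning(2 * fall_len)[fall_len:]   # ~1 -> 0
    return np.concatenate([rise, fall])

w = asymmetric_window(200, 56)   # e.g. a 256-sample frame
```

Such a window emphasizes recent samples while keeping a long effective context, which is one plausible source of the train/test gap if the training and test channels differ.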
AM development
Sparse DNN
- GA-based block sparsity (+++++)
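The GA-based block-sparsity work presumably evolves a binary mask over weight blocks; the GA itself is not shown in these notes, but applying a candidate block mask to a weight matrix can be sketched as follows (names and block size are illustrative):

```python
import numpy as np

def apply_block_mask(W, mask, block):
    """Zero out block x block tiles of W where mask is 0.
    mask has shape (W.shape[0] // block, W.shape[1] // block)."""
    full = np.kron(mask, np.ones((block, block)))  # expand mask to element level
    return W * full

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
mask = np.array([[1, 0], [0, 1]])            # keep only the diagonal 4x4 blocks
Ws = apply_block_mask(W, mask, 4)
sparsity = 1.0 - np.count_nonzero(Ws) / Ws.size   # 0.5 for this mask
```

A GA would treat the flattened mask as the chromosome and score each candidate by WER (or frame accuracy) after masking.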
Noise training
- All experiments completed.
- Paper writing will start this week
GFbank
- Tests on the Tencent database are done; better performance observed than with Fbank
- Equal-loudness pre-filter added; slightly better performance obtained
- Running Sinovoice 8k 1400 + 100 mixture training; 9 xEnt iterations completed.
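GFbank features are built on a gammatone filterbank, whose center frequencies are conventionally spaced on the ERB-rate scale (Glasberg & Moore). A sketch of computing them for 8 kHz audio (channel count and frequency range are illustrative, not the values used in these experiments):

```python
import numpy as np

def erb_center_freqs(low_hz, high_hz, n):
    """n gammatone center frequencies equally spaced on the ERB-rate scale."""
    def hz_to_erb(f):   # ERB-rate scale (Glasberg & Moore, 1990)
        return 21.4 * np.log10(4.37e-3 * f + 1.0)
    def erb_to_hz(e):
        return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    erbs = np.linspace(hz_to_erb(low_hz), hz_to_erb(high_hz), n)
    return erb_to_hz(erbs)

cf = erb_center_freqs(50.0, 4000.0, 23)   # up to the Nyquist of 8 kHz audio
```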
Multilingual ASR
- Multilingual LM decoding
- Fixing the non-tag bug ???
English model
(state-gauss = 10000 100000, various LM, beam 13)

1. Shujutang 100h chi-eng 16k:

LM/AM | xEnt  | mpe_1 | mpe_2 | mpe_3 | mpe_4
------|-------|-------|-------|-------|------
wsj   | 23.86 | 20.95 | 20.90 | 20.84 | 20.81
cmu   | 22.22 | -     | -     | -     | 18.83
giga  | 21.77 | -     | -     | -     | 18.61
armid | 20.45 | -     | -     | -     | -

2. Shujutang 100h chi-eng 8k:

LM/AM | xEnt  | mpe_1 | mpe_2 | mpe_3 | mpe_4
------|-------|-------|-------|-------|------
wsj   | 26.27 | 23.63 | 23.14 | 22.93 | 23.00
cmu   | 24.11 | -     | -     | -     | 20.36
giga  | 23.11 | -     | -     | -     | 20.11
armid | -     | -     | -     | -     | -

3. voxforge pure eng 16k:

LM/AM | xEnt  | mpe_1 | mpe_2 | mpe_3 | mpe_4
------|-------|-------|-------|-------|------
wsj   | 21.38 | 24.89 | 24.50 | 23.31 | 23.13
cmu   | 24.00 | -     | -     | -     | 21.33
giga  | 18.75 | -     | -     | -     | 22.45
armid | -     | -     | -     | -     | -

4. fisher pure eng 8k (not finished yet):

LM/AM | xEnt  | mpe_1 | mpe_2 | mpe_3 | mpe_4
------|-------|-------|-------|-------|------
wsj   | 40.65 | 36.16 | 35.94 | 35.88 | 35.80
cmu   | 35.07 | -     | -     | -     | 31.16
giga  | 41.18 | -     | -     | -     | 36.23
armid | -     | -     | -     | -     | -
Denoising & Farfield ASR
- Investigating DAE model
- Kaldi-based MSE obj training toolkit preparation
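The DAE toolkit replaces the usual cross-entropy objective with MSE against clean-feature targets, whose output-layer gradient is simply the prediction error. A minimal numpy sketch (shapes and names illustrative, not the Kaldi implementation):

```python
import numpy as np

def mse_obj_and_grad(y, t):
    """MSE objective over a minibatch of frames and its gradient
    w.r.t. the network output y, given clean-feature targets t."""
    diff = y - t
    obj = 0.5 * np.mean(np.sum(diff ** 2, axis=1))  # average over frames
    grad = diff / y.shape[0]                         # d obj / d y
    return obj, grad

y = np.array([[1.0, 2.0], [3.0, 4.0]])   # noisy-input network outputs
t = np.array([[1.0, 1.0], [2.0, 4.0]])   # clean targets
obj, grad = mse_obj_and_grad(y, t)        # obj = 0.5 here
```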
VAD
- DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74)
- Need to test small scale network (+)
- 600-800 network
- 100 X 4 + 2
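The small-scale network noted above (100 X 4 + 2) reads as four hidden layers of 100 units with a 2-class speech/non-speech softmax output. A forward-pass sketch with random weights (input dimension and activations are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
dims = [40, 100, 100, 100, 100, 2]   # assumed 40-dim input features
Ws = [rng.standard_normal((i, o)) * 0.1 for i, o in zip(dims[:-1], dims[1:])]
bs = [np.zeros(o) for o in dims[1:]]

def vad_forward(x):
    """Sigmoid hidden layers, softmax over {non-speech, speech}."""
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    z = h @ Ws[-1] + bs[-1]
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

p = vad_forward(rng.standard_normal((5, 40)))   # 5 frames -> (5, 2) posteriors
```

At roughly 45k parameters this is small enough for real-time frame-level decisions, which is the point of testing the small-scale configuration.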
Scoring
- Bug for the stream mode fixed
Embedded decoder
- word list graph test passed
- wlist2LG toolkit checked in
- Prepare to deliver Android compiler options (.mk)
- Interface design should be completed in one day
- Prepared HCLG for the 20k LM; decoding in progress.
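The wlist2LG toolkit presumably compiles a flat word list into a grammar (G) acceptor before composition. One minimal form — a unigram loop with uniform weights, emitted in OpenFst text format — can be sketched as follows (this is an assumption about the toolkit's behavior, not its actual code):

```python
import math

def wlist_to_g_text(words):
    """Emit an OpenFst-text unigram loop acceptor over the word list:
    'src dest ilabel olabel weight' arc lines plus a final-state line."""
    w = -math.log(1.0 / len(words))                     # uniform -log prob
    lines = [f"0 0 {word} {word} {w:.6f}" for word in words]
    lines.append("0")                                   # state 0 is final
    return "\n".join(lines)

g = wlist_to_g_text(["yes", "no", "maybe"])
```

The resulting text would normally go through `fstcompile` with a word symbol table before composing with LC to build the decoding graph.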
LM development
Domain specific LM
- English lexicon done; building HCLG
- Re-build LM with the new lexicon
- Tested on Dianxin dev set
NN LM
- Character-based NNLM (6700 chars, 7-gram); training on 500M of data done.
- Inconsistent WER patterns were found on the Tencent test sets
- Probably need another test set for investigation.
- Investigate MS RNN LM training
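For a character-based 7-gram NNLM as above, each training example predicts the next character from the six preceding ones. A sketch of context extraction with sentence-start padding (names and padding convention are assumptions):

```python
def char_ngram_examples(sentence, order=7, bos="<s>"):
    """Yield (context, target) pairs: each target character is
    predicted from the previous order-1 characters."""
    chars = [bos] * (order - 1) + list(sentence)
    for i in range(order - 1, len(chars)):
        yield chars[i - order + 1:i], chars[i]

pairs = list(char_ngram_examples("你好世界"))   # 4 examples, 6-char contexts
```

With a 6700-character vocabulary the softmax layer stays small compared to a word-level NNLM, which is the usual motivation for the character-based variant.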
QA
FST-based matching
- Word-based FST matching takes 1-2 seconds with 1600 patterns; Huilan's implementation takes <1 second.
- THRAX toolkit for grammar to FST
- Investigate determinization of G embedding
- Refer to Kaldi new code
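One way to understand the timing comparison above: word-sequence patterns can be precompiled into a trie-shaped automaton so that all 1600 patterns are matched in a single pass over the input tokens. A minimal dict-based sketch (no THRAX/OpenFst dependency; illustrative only):

```python
def build_trie(patterns):
    """Compile word-sequence patterns into a nested-dict trie;
    the key None marks an accepting node carrying the pattern."""
    root = {}
    for pat in patterns:
        node = root
        for word in pat.split():
            node = node.setdefault(word, {})
        node[None] = pat
    return root

def match_all(trie, tokens):
    """Return every pattern occurring as a contiguous token subsequence."""
    hits = []
    for start in range(len(tokens)):
        node = trie
        for tok in tokens[start:]:
            if tok not in node:
                break
            node = node[tok]
            if None in node:
                hits.append(node[None])
    return hits

trie = build_trie(["how much", "how much is", "refund"])
hits = match_all(trie, "how much is the refund".split())
```

A proper FST version (e.g. compiled via THRAX and determinized) additionally shares suffixes and avoids the restart-per-position scan here.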