2014-06-13
== Resource Building ==

* Release management has been started.
== Leftover questions ==

* Asymmetric window: great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set (a windowing sketch follows this list).
* Multi-GPU training: error encountered.
* Multilingual training.
* Investigating LOUDS FST.
* CLG embedded decoder plus online compiler.
* DNN-GMM co-training.
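If the asymmetric window above refers to an asymmetric frame-splicing context at the DNN input (more past frames than future frames), the following is a minimal numpy sketch; <code>splice_frames</code> and the 10/5 context sizes are illustrative assumptions, not the configuration used in the experiment.

<pre>
import numpy as np

def splice_frames(feats, left=10, right=5):
    """Splice an asymmetric context window around every frame.

    feats: (T, D) per-frame acoustic features.  Returns a
    (T, (left + 1 + right) * D) matrix; edges are padded by
    repeating the first / last frame.
    """
    T = len(feats)
    padded = np.concatenate([np.repeat(feats[:1], left, axis=0),
                             feats,
                             np.repeat(feats[-1:], right, axis=0)])
    return np.stack([padded[t:t + left + 1 + right].ravel()
                     for t in range(T)])
</pre>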
== AM development ==

=== Sparse DNN ===

* GA-based block sparsity (++++++); a sketch of the idea follows this list.
* Paper revision is underway.
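As a rough illustration of the technique named above, not the method of the paper under revision: a genetic algorithm can search over binary masks with one gene per weight block, trading task loss against the fraction of blocks kept. The block size, GA hyperparameters, and <code>loss_fn</code> (a stand-in for evaluating the masked network) are all assumptions.

<pre>
import numpy as np

rng = np.random.default_rng(0)

def block_mask(genome, shape, block=8):
    """Expand a binary genome (one gene per block) to a full weight mask."""
    rows, cols = shape[0] // block, shape[1] // block
    return np.kron(genome.reshape(rows, cols), np.ones((block, block)))

def fitness(genome, W, loss_fn, block=8, sparsity_weight=0.1):
    """Reward low task loss and a small fraction of kept blocks."""
    return -loss_fn(W * block_mask(genome, W.shape, block)) \
           - sparsity_weight * genome.mean()

def evolve(W, loss_fn, block=8, pop=20, gens=50):
    """Evolve block masks for W (dims assumed divisible by `block`)."""
    n = (W.shape[0] // block) * (W.shape[1] // block)
    population = rng.integers(0, 2, size=(pop, n))
    for _ in range(gens):
        scores = [fitness(g, W, loss_fn, block) for g in population]
        parents = population[np.argsort(scores)[-(pop // 2):]]
        children = parents.copy()
        for c, mate in zip(children, rng.permutation(len(parents))):
            x = rng.integers(1, n)
            c[x:] = parents[mate][x:]          # one-point crossover
        flips = rng.random(children.shape) < 0.01
        children[flips] ^= 1                   # bit-flip mutation
        population = np.vstack([parents, children])
    best = max(population, key=lambda g: fitness(g, W, loss_fn, block))
    return W * block_mask(best, W.shape, block)
</pre>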
=== Noise training ===

* Paper writing will start this week.
=== GFbank ===

* Running the Sinovoice 8k 1400 + 100 mixture training.
* GFbank xEnt training completed 14 iterations (a feature-extraction sketch follows the table):

<pre>
                               Huawei disanpi   BJ mobile 8k   English data
FBank  non-stream (17 iter)    22.01%           26.63%         33.83%
FBank  non-stream (MPE1)       21.07%           22.91%         24.34%
GFbank stream     (18 iter)    22.26%           27.79%         35.10%
GFbank non-stream (16 iter)    22.45%           27.25%         34.64%
</pre>
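Assuming GFbank denotes Gammatone filterbank features (the FBank analogue computed with a gammatone filterbank on the ERB scale), a minimal extraction sketch is below; the filter count, frame sizes, and sample rate are illustrative, not the Sinovoice front-end configuration.

<pre>
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore) in Hz."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.064, order=4, b=1.019):
    """Truncated impulse response of a 4th-order gammatone filter."""
    t = np.arange(int(duration * fs)) / fs
    return t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t) \
           * np.cos(2 * np.pi * fc * t)

def gfbank(signal, fs=8000, n_filters=36, fmin=100.0,
           frame_len=200, frame_shift=80):
    """Log energies of gammatone channels, framed like FBank features."""
    # Center frequencies equally spaced on the ERB-rate scale.
    erb_lo = 21.4 * np.log10(4.37e-3 * fmin + 1)
    erb_hi = 21.4 * np.log10(4.37e-3 * (fs / 2) + 1)
    cfs = (10 ** (np.linspace(erb_lo, erb_hi, n_filters + 2)[1:-1]
                  / 21.4) - 1) / 4.37e-3
    feats = []
    for fc in cfs:
        band = np.convolve(signal, gammatone_ir(fc, fs), mode="same")
        # Frame the band-passed signal and take log frame energy.
        n_frames = 1 + (len(band) - frame_len) // frame_shift
        frames = np.stack([band[i * frame_shift:i * frame_shift + frame_len]
                           for i in range(n_frames)])
        feats.append(np.log((frames ** 2).mean(axis=1) + 1e-10))
    return np.stack(feats, axis=1)  # (n_frames, n_filters)
</pre>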
=== Multilingual ASR ===

* TAG-based modeling works with a smaller step-in factor.
* Non-tag tests should be conducted on both the Baidu and microblog data.
=== Denoising & Farfield ASR ===

* With artificial reverberation, 2 x 1200 seems the more appropriate configuration; however, great randomness was seen in the results.
* With utterance-level CMN, performance is better than with no CMN; strangely, with global CMN performance was reduced (a sketch of the two normalizations follows this list).
* Should experiment with a single-layer network with more hidden units.
* Record another far-field database; pre-processing is done.
* Need to obtain a baseline result with the new middle-far mic.
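For reference, the two normalizations being compared, as a minimal numpy sketch (function names are illustrative): utterance-level CMN subtracts each utterance's own cepstral mean, so it removes per-utterance channel offsets, while global CMN subtracts a single corpus-wide mean and cannot.

<pre>
import numpy as np

def cmn_utterance(feats):
    """Utterance-level CMN: subtract each utterance's own feature mean.

    feats: (T, D) features of a single utterance.
    """
    return feats - feats.mean(axis=0, keepdims=True)

def cmn_global(utterances):
    """Global CMN: subtract one mean estimated over the whole corpus."""
    g_mean = np.concatenate(utterances, axis=0).mean(axis=0, keepdims=True)
    return [u - g_mean for u in utterances]
</pre>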
=== VAD ===

* DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74); a sketch of the energy-based baseline follows this list.
* Need to test a small-scale network (+)
* 600-800 network test (+)
* 100 X 4 + 2 network training (+)
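A minimal sketch of the kind of energy-based baseline the DNN VAD is compared against; the frame sizes and threshold are assumptions, not the configuration behind the 45.74 figure. The DNN VAD replaces this thresholding rule with a per-frame speech/non-speech classifier.

<pre>
import numpy as np

def energy_vad(signal, fs=8000, frame_len=200, frame_shift=80,
               threshold_db=-35.0):
    """Frame-level energy VAD: mark a frame as speech when its log
    energy is within `threshold_db` of the utterance peak.

    Returns a boolean array, one decision per frame.
    """
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    frames = np.stack([signal[i * frame_shift:i * frame_shift + frame_len]
                       for i in range(n_frames)])
    log_e = 10 * np.log10((frames ** 2).mean(axis=1) + 1e-12)
    return log_e > log_e.max() + threshold_db
</pre>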
=== Scoring ===

* Collect more data with human scoring to train discriminative models.
=== Embedded decoder ===

<pre>
600 X 4 + 800 AM, beam 9:
       150k    20k    10k    5k
WER    15.96   -      -      -
RT     X       0.94   -      -
</pre>
== LM development ==

=== Domain specific LM ===

* Cross-entropy filtering is not better than key-based filtering (a data-selection sketch follows this list).
* It seems possible to reduce the PPL with the extra retrieved data source.
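A minimal sketch of cross-entropy (Moore-Lewis style) data selection, assuming that is what the filtering above refers to; the toy unigram LM stands in for the real LMs, and all names and the keep ratio are illustrative.

<pre>
import math
from collections import Counter

def train_unigram(corpus):
    """Tiny unigram LM with add-one smoothing (stand-in for a real n-gram LM)."""
    counts = Counter(w for sent in corpus for w in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)

def cross_entropy(lm, sentence):
    words = sentence.split()
    return -sum(math.log2(lm(w)) for w in words) / max(len(words), 1)

def select(candidates, in_domain, general, keep_ratio=0.2):
    """Moore-Lewis style selection: keep the sentences that look most
    in-domain relative to the general corpus."""
    lm_in, lm_gen = train_unigram(in_domain), train_unigram(general)
    scored = sorted(candidates,
                    key=lambda s: cross_entropy(lm_in, s)
                                  - cross_entropy(lm_gen, s))
    return scored[:int(len(scored) * keep_ratio)]
</pre>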
=== Word2Vector ===

* Design the web spider.
* Design the semantically related word tree:
** First version, based on pattern matching, is done (a pattern-matching sketch follows this list).
** Filter with the query log.
** Further refinement with the Baidu Baike hierarchy.
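A speculative sketch of pattern-based extraction for the word tree: Hearst-style "is-a" patterns are matched against crawled text and the resulting pairs are grouped into a tree. The patterns and names here are illustrative assumptions, not the actual rules used.

<pre>
import re

# Hypothetical hypernym (is-a) patterns; "X是一种Y" ("X is a kind of Y")
# is a typical Chinese example.
PATTERNS = [
    re.compile(r"(\w+)是一种(\w+)"),
    re.compile(r"(\w+) is a (?:kind|type) of (\w+)"),
]

def extract_isa_pairs(text):
    """Return (hyponym, hypernym) pairs matched by the patterns above."""
    pairs = []
    for pat in PATTERNS:
        pairs.extend(pat.findall(text))
    return pairs

def build_tree(pairs):
    """Group hyponyms under their hypernyms to form a shallow word tree."""
    tree = {}
    for child, parent in pairs:
        tree.setdefault(parent, set()).add(child)
    return tree
</pre>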
=== NN LM ===

* Character-based NNLM (6700 chars, 7-gram): training on the 500M data is done (a model sketch follows this list).
* Inconsistent WER patterns were found on the Tencent test sets.
** Probably need to use another test set for the investigation.
* Investigate MS RNN LM training.
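A minimal sketch of what a character-based 7-gram feedforward NNLM looks like (Bengio-style: embed the previous six characters, one hidden layer, softmax over characters); the sizes here are toy values, not the 6700-character production setup.

<pre>
import numpy as np

rng = np.random.default_rng(0)

class FeedforwardNNLM:
    """Bengio-style feedforward NNLM over characters.

    A 7-gram model conditions on the previous 6 characters; the
    vocab / embedding / hidden sizes below are illustrative only.
    """

    def __init__(self, vocab=100, order=7, emb=32, hidden=128):
        self.context = order - 1
        self.E = rng.normal(0, 0.1, (vocab, emb))    # char embeddings
        self.W1 = rng.normal(0, 0.1, (self.context * emb, hidden))
        self.W2 = rng.normal(0, 0.1, (hidden, vocab))

    def logprob(self, history, next_char):
        """log P(next_char | last n-1 chars).

        history: list of char ids, assumed at least `context` long.
        """
        x = self.E[history[-self.context:]].ravel()
        h = np.tanh(x @ self.W1)
        logits = h @ self.W2
        logits -= logits.max()                       # numerical stability
        return logits[next_char] - np.log(np.exp(logits).sum())
</pre>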