Difference between revisions of “2014-01-03”
From cslt Wiki
Latest revision as of 02:20, 3 January 2014 (Friday)
AM development
Sparse DNN
- Optimal Brain Damage (OBD).
- Online OBD is on hold.
- Started investigating OBD + L1 norm.
- Efficient computing
- Rearranging the matrix structure to compose zero blocks by some smart approaches, which should lead to better computing speed.
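As a rough illustration of the OBD criterion named above: OBD ranks each weight by a saliency of 0.5 * h_ii * w_i^2, where h_ii is the corresponding diagonal entry of the loss Hessian, and removes the least salient weights. The sketch below assumes the Hessian diagonal is already available. The function name `obd_prune`, the prune fraction, and the toy values are hypothetical and are not taken from the group's implementation.

```python
def obd_prune(weights, hessian_diag, prune_fraction):
    """Zero out the fraction of weights with the lowest OBD saliency."""
    # OBD saliency: estimated loss increase if the weight is removed.
    saliency = [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]
    n_prune = int(prune_fraction * len(weights))
    # Weight indices sorted from least to most salient.
    order = sorted(range(len(weights)), key=lambda i: saliency[i])
    to_prune = set(order[:n_prune])
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]

# Toy values for illustration only (flattened weight matrix).
weights = [0.5, -0.1, 2.0, 0.05]
hessian_diag = [1.0, 4.0, 0.1, 2.0]
sparse = obd_prune(weights, hessian_diag, prune_fraction=0.5)
print(sparse)
```

Note that a small weight is not automatically pruned first: a large Hessian entry can make it more salient than a larger weight in a flat direction of the loss.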
Efficient DNN training
- L1-L2 grid check: L1/L2 (< 1e-6) seems good for record1900 but worse on the other test sets.
- Asymmetric window: large improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Overfitting?
- Frame-skipping: skipping 1 frame speeds up decoding consistently while largely retaining accuracy. Skipping more frames leads to unacceptable performance degradation.
- Interpolation does not provide performance gain.
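The notes do not say how frame-skipping was implemented. One common way, assumed here, is to run the network only on every (skip+1)-th frame and reuse that output for the skipped frames. The names `skip_frame_scores` and `toy_score` are hypothetical, and the toy scoring function stands in for the DNN forward pass.

```python
def skip_frame_scores(frames, score_fn, skip=1):
    """Score every (skip+1)-th frame and reuse the result for skipped frames."""
    scores = []
    last = None
    for t, frame in enumerate(frames):
        if t % (skip + 1) == 0:
            last = score_fn(frame)  # the expensive forward pass
        scores.append(last)         # skipped frames copy the previous output
    return scores

# Toy stand-in for the acoustic model; records how often it is called.
calls = []
def toy_score(frame):
    calls.append(frame)
    return frame * 10

out = skip_frame_scores([1, 2, 3, 4, 5], toy_score, skip=1)
print(out, len(calls))
```

With `skip=1` the scoring function runs on roughly half of the frames, which is where the decoding speed-up would come from under this assumption.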
Optimal phoneset
- Analyzed the Tencent English phone set. Found some errors in CH/EN phone sharing.
- Developed a new sharing scheme and started training the new system.
- Started training with all-separated phones.
- Started training a mixed system with Chinglish data.
Engine optimization
- Investigating LOUDS FST. In progress.
LM development
NN LM
- Collecting a bigger lexicon: 40k words related to music, 56k words from an official dictionary.
- Working on NN LM based on word2vector.
Embedded development
- Liuchao's cellphone, Qualcomm Snapdragon Krait MSM8960 @ 1.5GHz, using 1 core
small nnet 100/600/600/600/600/1264 with MFCC input
- 4500 words:
  - construct LG: 0.41s
  - compose HCLG with det: 13.70s, 5.318 MB
  - compose HCLG without det: 6.61s, 5.488 MB
- 950 words:
  - construct LG: 0.15s
  - compose HCLG with det: 2.63s, 0.947 MB, decode RT 0.649
  - compose HCLG without det: 1.74s, 0.998 MB, decode RT 0.548
- For word lists or simple grammars, determinization leads to a small RT increase, but can improve HCLG compiling dramatically. This is particularly the case for embedded devices.
- The accuracy does not change with/without determinization.
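To make the comparison easier to read, the ratios between the two configurations can be computed directly from the measurements listed above. The figures are copied from those measurements. The variable names are only illustrative.

```python
# (compose time in seconds, HCLG size in MB), copied from the list above.
hclg = {
    4500: {"det": (13.70, 5.318), "no_det": (6.61, 5.488)},
    950: {"det": (2.63, 0.947), "no_det": (1.74, 0.998)},
}
# Decode real-time factor, reported for the 950-word graph only.
decode_rt_950 = {"det": 0.649, "no_det": 0.548}

ratios = {}
for words, m in hclg.items():
    time_ratio = m["det"][0] / m["no_det"][0]
    size_ratio = m["det"][1] / m["no_det"][1]
    ratios[words] = (time_ratio, size_ratio)
    print(f"{words} words: compose time x{time_ratio:.2f}, "
          f"size x{size_ratio:.3f} (with det relative to without)")

rt_ratio = decode_rt_950["det"] / decode_rt_950["no_det"]
print(f"950 words: decode RT x{rt_ratio:.2f} (with det relative to without)")
```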
Speech QA
- Use N-best to expand matching in QA. Better performance was obtained:
- 1-best matches 96/121
- 10-best matches 102/121
- Use N-best to recover errors in entity check. In progress.
- Use Pinyin to recover errors in entity check. Future work.
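A minimal sketch of the N-best expansion idea, assuming that a "match" means a recognition hypothesis is found exactly in a set of known questions (the notes do not define the matching rule). The function name, the example sentences, and the question set are hypothetical. Only the 96/121 and 102/121 counts come from the results above.

```python
def first_matching_hypothesis(nbest, known_questions):
    """Return (rank, hypothesis) for the best-ranked hypothesis that matches, else None."""
    for rank, hyp in enumerate(nbest, start=1):
        if hyp in known_questions:
            return rank, hyp
    return None

# Made-up example: the 1-best contains a recognition error, the 2nd-best matches.
known = {"play a song by the beatles", "what is the weather today"}
nbest = [
    "play a song by the beetles",
    "play a song by the beatles",
    "lay a song by the beatles",
]
result = first_matching_hypothesis(nbest, known)

# Match rates from the counts reported above.
one_best_rate = 96 / 121
ten_best_rate = 102 / 121
print(result, f"{one_best_rate:.1%}", f"{ten_best_rate:.1%}")
```

Using only `nbest[:1]` reproduces 1-best behavior, so the same function covers both settings.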