2013-12-27
From cslt Wiki
AM development
Sparse DNN
- Optimal Brain Damage (OBD).
- Online OBD is on hold.
- Investigation of OBD + L1 norm has started.
- Efficient computing
- Rearranging the matrix structure so that the zeros form blocks, leading to faster computation.
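As a minimal sketch of the OBD pruning step mentioned above, assuming the standard diagonal-Hessian saliency s_i = h_ii * w_i^2 / 2: weights with the smallest saliency are set to zero. The function name `obd_prune`, its parameters, and the way the Hessian diagonal is obtained are hypothetical; the report does not describe the actual implementation.

```python
import numpy as np

def obd_prune(weights, hessian_diag, prune_fraction):
    """Zero out the fraction of weights with the smallest OBD saliency.

    weights        -- weight matrix of one layer
    hessian_diag   -- estimate of the Hessian diagonal, same shape as weights
                      (assumed to be given; estimating it is not shown here)
    prune_fraction -- fraction of weights to remove, between 0 and 1
    Returns the pruned weights and a boolean mask (True = weight kept).
    """
    # OBD saliency under the diagonal approximation: 0.5 * h_ii * w_i^2
    saliency = 0.5 * hessian_diag * weights ** 2
    n_prune = int(prune_fraction * weights.size)

    mask = np.ones(weights.size, dtype=bool)
    if n_prune > 0:
        # Flat indices of the n_prune least salient weights
        drop = np.argsort(saliency, axis=None)[:n_prune]
        mask[drop] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask
```

The returned mask can be reapplied after each retraining step so that pruned weights stay at zero.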
Efficient DNN training
- Momentum-based training. m=0.2 gives the best WER, but the results are not consistent across test sets: for 1900, m=0.2 is the best; for online1 and online2, m=0 is the best.
- Asymmetric window: large improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Overfitting?
- Frame-skipping. Skipping 1 frame does not hurt performance much and consistently speeds up decoding. Skipping more frames leads to further performance degradation (not acceptable).
            mom0.05   mom0.1   mom0.3   mom0.4   mom0.5   mom0.6   mom0.8     fs_1     fs_2     fs_3
----------------------------------------------------------------------------------------------------
avg_time |     4500     4175     3380     3460     3448     3521     4212     3149     2692     2716
RT       |     1.52     1.44     1.12     1.14     1.14     1.16     1.38     1.04     0.90     0.92
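The frame-skipping idea can be sketched as follows, assuming that the network is evaluated only on every (skip+1)-th frame and that its output is copied to the skipped frames. This copy strategy, the function name `forward_with_frame_skip`, and the `dnn_forward` callable are assumptions for illustration; the report does not say how skipped frames are filled in.

```python
import numpy as np

def forward_with_frame_skip(features, dnn_forward, skip=1):
    """Run dnn_forward on a subset of frames and reuse outputs for the rest.

    features    -- array of shape (n_frames, feat_dim)
    dnn_forward -- hypothetical callable mapping (n, feat_dim) -> (n, out_dim)
    skip        -- number of frames skipped after each evaluated frame
    """
    n_frames = features.shape[0]
    step = skip + 1
    # Evaluate the network only on frames 0, step, 2*step, ...
    kept_out = dnn_forward(features[::step])
    # Copy each computed output to the frames that follow it, then trim
    return np.repeat(kept_out, step, axis=0)[:n_frames]
```

With skip=1 the network runs on roughly half of the frames, which is where the decoding speed-up would come from under this assumption.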
Optimal phoneset
- Experimented with 3 phone sets: Tencent, CSLT, and Puqiang (PQ).
CSLT:
map 8: %WER 25.76 [ 3768 / 14628, 131 ins, 436 del, 3201 sub ]
2044 7: %WER 22.63 [ 5259 / 23241, 396 ins, 615 del, 4248 sub ]
notetp3 7: %WER 15.76 [ 292 / 1853, 19 ins, 30 del, 243 sub ]
record1900 11: %WER 5.98 [ 711 / 11888, 37 ins, 270 del, 404 sub ]
general 7: %WER 36.21 [ 13622 / 37619, 543 ins, 1085 del, 11994 sub ]
online1 12: %WER 37.73 [ 10729 / 28433, 634 ins, 2229 del, 7866 sub ]
online2 13: %WER 28.95 [ 17112 / 59101, 1113 ins, 3015 del, 12984 sub ]
speedup 9: %WER 25.71 [ 1351 / 5255, 49 ins, 276 del, 1026 sub ]

Puqiang:
map 9: %WER 24.25 [ 3547 / 14628, 115 ins, 428 del, 3004 sub ]
2044 8: %WER 22.80 [ 5300 / 23241, 425 ins, 665 del, 4210 sub ]
notetp3 9: %WER 16.73 [ 310 / 1853, 34 ins, 28 del, 248 sub ]
record1900 11: %WER 5.88 [ 699 / 11888, 54 ins, 257 del, 388 sub ]
general 8: %WER 36.80 [ 13844 / 37619, 636 ins, 1102 del, 12106 sub ]
online1 14: %WER 37.77 [ 10739 / 28433, 592 ins, 2401 del, 7746 sub ]
online2 13: %WER 28.65 [ 16932 / 59101, 1136 ins, 2965 del, 12831 sub ]
speedup 9: %WER 26.32 [ 1383 / 5255, 66 ins, 273 del, 1044 sub ]

Tencent:
map 8: %WER 25.83 [ 3778 / 14628, 157 ins, 486 del, 3135 sub ]
2044 8: %WER 24.51 [ 5697 / 23241, 502 ins, 765 del, 4430 sub ]
notetp3 10: %WER 19.86 [ 368 / 1853, 36 ins, 45 del, 287 sub ]
record1900 12: %WER 7.96 [ 946 / 11888, 50 ins, 378 del, 518 sub ]
general 7: %WER 37.94 [ 14274 / 37619, 537 ins, 1270 del, 12467 sub ]
online1 12: %WER 36.36 [ 10337 / 28433, 495 ins, 2082 del, 7760 sub ]
online2 13: %WER 28.46 [ 16822 / 59101, 893 ins, 2940 del, 12989 sub ]
speedup 10: %WER 28.53 [ 1499 / 5255, 62 ins, 349 del, 1088 sub ]
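For reading the result lines above: the bracketed part gives total errors over reference words, followed by the insertion, deletion, and substitution counts, and %WER is the total divided by the reference word count. A small sketch that rechecks the CSLT `map` line (the helper name `wer_percent` is hypothetical):

```python
def wer_percent(ins, dels, subs, ref_words):
    """Return (total_errors, WER in percent) from the error counts."""
    errors = ins + dels + subs
    return errors, 100.0 * errors / ref_words

# CSLT, test set "map": 131 ins, 436 del, 3201 sub over 14628 reference words
errors, wer = wer_percent(131, 436, 3201, 14628)
print(errors, round(wer, 2))  # prints: 3768 25.76
```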
Engine optimization
- Investigating LOUDS FST. In progress.
LM development
NN LM
- Collecting more lexicon entries: 40k music-related words and 56k words from an official dictionary.
- Working on an NN LM based on word2vec.
Embedded development
- Trained a narrow and deep small-scale NN. Some errors occurred.
- Embedded stream mode is in progress.
- On-the-fly grammar compiler
- LG compile is fine
- CLG compile is fine
- HCLG compile is slow
- Working on a speed-up method.
Speech QA
- Use N-best hypotheses to expand matching in QA.
- 1-best matches 96/121
- 10-best matches 102/121
- Use N-best to recover errors in entity check
- Use Pinyin to recover errors in entity check
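The N-best matching above can be sketched as follows, assuming a match means that one of the top-N recognition hypotheses appears verbatim in the set of known questions. The exact-match criterion, the function name `nbest_match`, and the sample strings are hypothetical; the report does not specify how a match is decided.

```python
def nbest_match(nbest, known_questions, n):
    """Return the first of the top-n hypotheses found in known_questions.

    nbest           -- recognition hypotheses, best first
    known_questions -- set of question strings the QA system can answer
    n               -- how many hypotheses to consider (1 = 1-best only)
    Returns the matching hypothesis, or None if none of the top n match.
    """
    for hyp in nbest[:n]:
        if hyp in known_questions:
            return hyp
    return None

# Hypothetical example: the 1-best is wrong, the 2nd hypothesis matches
known = {"what is the weather today", "play some music"}
hyps = ["what is the whether today", "what is the weather today"]
print(nbest_match(hyps, known, 1))   # prints: None
print(nbest_match(hyps, known, 10))  # prints: what is the weather today
```

Considering more hypotheses can only keep or increase the number of matches, which is consistent with the rise from 96/121 to 102/121 reported above.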