Difference between revisions of "2013-12-27"
From cslt Wiki
Latest revision as of 02:31, 27 December 2013 (Fri)
AM development
Sparse DNN
- Optimal Brain Damage (OBD).
  - Online OBD is on hold.
  - Started investigating OBD + L1 norm.
- Efficient computing
  - Rearranging the matrix structure and composing zero blocks with some smart approaches, which leads to better computing speed.
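As a rough illustration of the OBD idea mentioned above, the sketch below ranks weights by the standard OBD saliency (half the diagonal Hessian entry times the squared weight) and zeroes the least salient fraction. The function name `obd_prune`, the toy weights, and the Hessian diagonal values are assumptions made for illustration; they are not taken from the experiments reported here.

```python
def obd_prune(weights, hess_diag, prune_fraction):
    """Zero out the lowest-saliency weights (OBD-style, diagonal Hessian)."""
    # OBD saliency: estimated loss increase when weight i is removed.
    saliency = [0.5 * h * w * w for w, h in zip(weights, hess_diag)]
    n_prune = int(len(weights) * prune_fraction)
    # Indices sorted from least to most salient.
    order = sorted(range(len(weights)), key=lambda i: saliency[i])
    to_prune = set(order[:n_prune])
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]


# Hypothetical toy values, not measured from any real network.
weights = [0.8, -0.05, 0.3, 0.01]
hess_diag = [1.0, 2.0, 0.5, 4.0]
pruned = obd_prune(weights, hess_diag, prune_fraction=0.5)
```

The zeros produced this way are scattered across the matrix, which is why a later rearrangement into zero blocks is needed before pruning turns into a speed gain.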
Efficient DNN training
- Momentum-based training. m=0.2 gives the best WER overall, but the results are not consistent: on the 1900 set, m=0.2 is the best; on online1 and online2, m=0 is the best.
- Asymmetric window: large improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Possibly overfitting?
- Frame-skipping. Skipping 1 frame consistently speeds up decoding while largely retaining accuracy. Skipping more frames leads to unacceptable performance degradation.
           mom0.05  mom0.1  mom0.3  mom0.4  mom0.5  mom0.6  mom0.8  fs_1  fs_2  fs_3
-------------------------------------------------------------------------------------
avg_time |    4500    4175    3380    3460    3448    3521    4212  3149  2692  2716
RT       |    1.52    1.44    1.12    1.14    1.14    1.16    1.38  1.04  0.90  0.92
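One common way to realize frame-skipping is to run the network only on every (skip+1)-th frame and reuse that output for the skipped frames. The sketch below shows this scheme; whether the experiments above used exactly this variant is an assumption, and `forward_with_frame_skip` and `toy_dnn` are hypothetical names introduced for illustration.

```python
def forward_with_frame_skip(frames, dnn_forward, skip):
    """Evaluate dnn_forward on every (skip+1)-th frame and copy its output
    to the skipped frames that follow."""
    outputs = []
    last = None
    for t, frame in enumerate(frames):
        if t % (skip + 1) == 0:
            last = dnn_forward(frame)  # real forward pass
        outputs.append(last)           # skipped frames reuse the last output
    return outputs


# Toy stand-in for a DNN forward pass; records which frames were evaluated.
calls = []

def toy_dnn(frame):
    calls.append(frame)
    return frame * 10


out = forward_with_frame_skip([1, 2, 3, 4, 5], toy_dnn, skip=1)
```

With `skip=1` the forward pass runs on roughly half of the frames, which is where the decoding speed-up comes from; larger `skip` values reuse increasingly stale outputs.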
Optimal phoneset
- Experimented with 3 phone sets: Tencent, CSLT, PQ.
- The CSLT and PQ sets are similar (initial-final based), with a minor difference on Ri. The Tencent set is phone-based.
- All three were tested on the same NN structure.
- CSLT and PQ obtain similar performance, and both are better than the Tencent set in most test cases.
- On online1 and online2, the Tencent set is slightly better.
- We therefore prefer a phone set based on initial-finals.
CSLT:
map 8: %WER 25.76 [ 3768 / 14628, 131 ins, 436 del, 3201 sub ]
2044 7: %WER 22.63 [ 5259 / 23241, 396 ins, 615 del, 4248 sub ]
notetp3 7: %WER 15.76 [ 292 / 1853, 19 ins, 30 del, 243 sub ]
record1900 11: %WER 5.98 [ 711 / 11888, 37 ins, 270 del, 404 sub ]
general 7: %WER 36.21 [ 13622 / 37619, 543 ins, 1085 del, 11994 sub ]
online1 12: %WER 37.73 [ 10729 / 28433, 634 ins, 2229 del, 7866 sub ]
online2 13: %WER 28.95 [ 17112 / 59101, 1113 ins, 3015 del, 12984 sub ]
speedup 9: %WER 25.71 [ 1351 / 5255, 49 ins, 276 del, 1026 sub ]

PQ:
map 9: %WER 24.25 [ 3547 / 14628, 115 ins, 428 del, 3004 sub ]
2044 8: %WER 22.80 [ 5300 / 23241, 425 ins, 665 del, 4210 sub ]
notetp3 9: %WER 16.73 [ 310 / 1853, 34 ins, 28 del, 248 sub ]
record1900 11: %WER 5.88 [ 699 / 11888, 54 ins, 257 del, 388 sub ]
general 8: %WER 36.80 [ 13844 / 37619, 636 ins, 1102 del, 12106 sub ]
online1 14: %WER 37.77 [ 10739 / 28433, 592 ins, 2401 del, 7746 sub ]
online2 13: %WER 28.65 [ 16932 / 59101, 1136 ins, 2965 del, 12831 sub ]
speedup 9: %WER 26.32 [ 1383 / 5255, 66 ins, 273 del, 1044 sub ]

Tencent:
map 8: %WER 25.83 [ 3778 / 14628, 157 ins, 486 del, 3135 sub ]
2044 8: %WER 24.51 [ 5697 / 23241, 502 ins, 765 del, 4430 sub ]
notetp3 10: %WER 19.86 [ 368 / 1853, 36 ins, 45 del, 287 sub ]
record1900 12: %WER 7.96 [ 946 / 11888, 50 ins, 378 del, 518 sub ]
general 7: %WER 37.94 [ 14274 / 37619, 537 ins, 1270 del, 12467 sub ]
online1 12: %WER 36.36 [ 10337 / 28433, 495 ins, 2082 del, 7760 sub ]
online2 13: %WER 28.46 [ 16822 / 59101, 893 ins, 2940 del, 12989 sub ]
speedup 10: %WER 28.53 [ 1499 / 5255, 62 ins, 349 del, 1088 sub ]
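Each result line above gives the error count over the reference word count, followed by the insertion, deletion, and substitution breakdown, so the reported WER can be rechecked as (ins + del + sub) / total. The sketch below parses one such line and recomputes the figure; the helper names `parse_wer_line` and `recompute_wer` are hypothetical and only meant for sanity-checking numbers like these.

```python
import re

# Matches the "%WER 25.76 [ 3768 / 14628, 131 ins, 436 del, 3201 sub ]" part.
LINE_RE = re.compile(
    r"%WER\s+([\d.]+)\s+\[\s*(\d+)\s*/\s*(\d+),"
    r"\s*(\d+)\s+ins,\s*(\d+)\s+del,\s*(\d+)\s+sub\s*\]"
)


def parse_wer_line(line):
    """Extract the reported WER and the error counts from one result line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a %WER line: " + line)
    errors, total, ins, dele, sub = (int(g) for g in m.groups()[1:])
    return {"reported": float(m.group(1)), "errors": errors,
            "total": total, "ins": ins, "del": dele, "sub": sub}


def recompute_wer(rec):
    """WER in percent: all error types over the number of reference words."""
    return 100.0 * (rec["ins"] + rec["del"] + rec["sub"]) / rec["total"]


rec = parse_wer_line("map 8: %WER 25.76 [ 3768 / 14628, 131 ins, 436 del, 3201 sub ]")
wer = recompute_wer(rec)
```

For the CSLT `map` line, the three error types sum to the reported 3768 errors, and the recomputed value agrees with the reported 25.76.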
Engine optimization
- Investigating LOUDS FST. In progress.
LM development
NN LM
- Collecting a bigger lexicon: 40k words related to music, 56k words from an official dictionary.
- Working on an NN LM based on word2vec.
Embedded development
- A narrow, deep, small-scale NN has been trained. Investigating some bugs.
- Embedded stream mode is in progress.
- On-the-fly grammar compiler
  - LG compile is fine.
  - CLG compile is fine.
  - HCLG compile is slow.
  - Working on a speed-up method.
Speech QA
- Use N-best to expand matching in QA. Better performance was obtained.
  - 1-best matches 96/121
  - 10-best matches 102/121
- Use N-best to recover errors in entity check. In progress.
- Use Pinyin to recover errors in entity check. Future work.
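The N-best matching result above (96/121 to 102/121, roughly 79.3% to 84.3%) can be pictured as follows: instead of matching only the top recognition hypothesis against the QA system, each of the top N hypotheses is tried in order until one matches. In this sketch, exact set membership stands in for whatever matcher the QA system actually uses, and `match_nbest` and the example sentences are hypothetical.

```python
def match_nbest(hypotheses, known_questions, n):
    """Return the first of the top-n hypotheses that matches a known question,
    or None if none of them matches."""
    for hyp in hypotheses[:n]:
        if hyp in known_questions:
            return hyp
    return None


# Hypothetical question set and recognizer N-best list (best hypothesis first).
known = {"what is the weather today", "play some music"}
nbest = ["play sum music", "play some music", "lay some music"]

top1 = match_nbest(nbest, known, n=1)
top3 = match_nbest(nbest, known, n=3)
```

Here the 1-best hypothesis contains a recognition error and fails to match, while the second hypothesis recovers the intended question.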