2013-12-27

AM development

Sparse DNN

Optimal Brain Damage (OBD).

Online OBD is on hold.
Investigation of OBD + L1 norm has started.
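
For reference, OBD ranks each weight by the saliency s_i = 0.5 * h_ii * w_i^2, where h_ii is the diagonal Hessian term for weight w_i, and removes the weights with the lowest saliency. The following is a minimal sketch of that pruning step only. The function names, the toy values, and the pruning fraction are illustrative assumptions, not the setup used in these experiments.

```python
def obd_saliencies(weights, hessian_diag):
    """OBD saliency of each weight: s_i = 0.5 * h_ii * w_i**2."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_lowest(weights, hessian_diag, fraction):
    """Return a copy of weights with the lowest-saliency fraction set to zero."""
    sal = obd_saliencies(weights, hessian_diag)
    n_prune = int(len(weights) * fraction)
    # Indices sorted from least to most salient.
    order = sorted(range(len(weights)), key=lambda i: sal[i])
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Toy example (hypothetical values): prune half of four weights.
weights = [0.5, -0.1, 2.0, 0.05]
hessian_diag = [1.0, 4.0, 0.1, 2.0]
print(prune_lowest(weights, hessian_diag, 0.5))
```

Note that a small weight is not always the first to go: what matters is the product of the squared weight and its curvature term.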

Efficient computing

Rearranging the matrix structure so that the zero entries form zero blocks, which should lead to better computing speed.
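
The benefit of zero blocks can be illustrated with a block-wise matrix-vector product that stores only the non-zero blocks and never touches the zero ones. This is a sketch under assumptions: the dictionary-of-blocks layout and the function name are hypothetical, and it does not show how the rearrangement itself is done.

```python
def block_sparse_matvec(blocks, n_block_rows, block_size, x):
    """Compute y = W x where W is stored as its non-zero square blocks.

    blocks maps (block_row, block_col) to a dense block given as a list
    of rows. Any block missing from the dict is treated as all zeros and
    costs nothing.
    """
    y = [0.0] * (n_block_rows * block_size)
    for (br, bc), block in blocks.items():
        for i, row in enumerate(block):
            acc = 0.0
            for j, w in enumerate(row):
                acc += w * x[bc * block_size + j]
            y[br * block_size + i] += acc
    return y


# Toy 4x4 matrix with two non-zero 2x2 blocks on the diagonal.
blocks = {
    (0, 0): [[1.0, 2.0], [3.0, 4.0]],
    (1, 1): [[1.0, 0.0], [0.0, 1.0]],
}
print(block_sparse_matvec(blocks, 2, 2, [1.0, 1.0, 2.0, 3.0]))
```

With half of the blocks zero, as in the toy case, the inner loop does half of the multiplications of a dense product.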

Efficient DNN training

Momentum-based training: m=0.2 performs the best on WER overall, but the results are not consistent across test sets. For 1900, m=0.2 is the best; for online1 and online2, m=0 is the best.
Asymmetric window: Great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Possibly overfitting?
Frame-skipping: Skipping 1 frame has little impact on performance and consistently speeds up decoding. Skipping more frames leads to further performance degradation (not acceptable).

            | mom0.05  mom0.1  mom0.3  mom0.4  mom0.5  mom0.6  mom0.8  fs_1  fs_2  fs_3
   ---------+----------------------------------------------------------------------------
   avg_time | 4500     4175    3380    3460    3448    3521    4212    3149  2692  2716
   RT       | 1.52     1.44    1.12    1.14    1.14    1.16    1.38    1.04  0.90  0.92
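
One common way to realize frame-skipping is to run the network only on every (k+1)-th frame and reuse its output for the k skipped frames. The sketch below assumes that scheme; the notes do not say how skipping is implemented here, and `forward` is a hypothetical stand-in for the DNN forward pass.

```python
def posteriors_with_frame_skip(frames, forward, skip):
    """Evaluate forward() on every (skip+1)-th frame only.

    The output of the last evaluated frame is copied to the skipped
    frames, so the decoder still receives one output per frame.
    """
    outputs = []
    last = None
    for t, frame in enumerate(frames):
        if t % (skip + 1) == 0:
            last = forward(frame)
        outputs.append(last)
    return outputs


# Toy example: count how often the (hypothetical) network is called.
calls = []

def toy_forward(frame):
    calls.append(frame)
    return frame * 10

print(posteriors_with_frame_skip([1, 2, 3, 4, 5], toy_forward, skip=1))
print(len(calls), "forward passes for 5 frames")
```

With skip=1 the number of forward passes is roughly halved, which would explain a speed-up at a small accuracy cost.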

Optimal phoneset

Experimented with 3 phone sets: Tencent, CSLT, PQ.
The CSLT and PQ sets are similar (both initial-final based), with a minor difference on Ri. The Tencent set is phone-based.
All three were tested on the same NN structure.
CSLT and PQ obtain similar performance, and both are better than the Tencent set in most test cases.
On online1 and online2, the Tencent set is a little better.
We therefore prefer a phone set based on initial-finals.

 

CSLT:

map 8: %WER 25.76 [ 3768 / 14628, 131 ins, 436 del, 3201 sub ]

2044 7: %WER 22.63 [ 5259 / 23241, 396 ins, 615 del, 4248 sub ]

notetp3 7: %WER 15.76 [ 292 / 1853, 19 ins, 30 del, 243 sub ]

record1900 11: %WER 5.98 [ 711 / 11888, 37 ins, 270 del, 404 sub ]

general 7: %WER 36.21 [ 13622 / 37619, 543 ins, 1085 del, 11994 sub ]

online1 12: %WER 37.73 [ 10729 / 28433, 634 ins, 2229 del, 7866 sub ]

online2 13: %WER 28.95 [ 17112 / 59101, 1113 ins, 3015 del, 12984 sub ]

speedup 9: %WER 25.71 [ 1351 / 5255, 49 ins, 276 del, 1026 sub ]

 

PQ:

map 9: %WER 24.25 [ 3547 / 14628, 115 ins, 428 del, 3004 sub ]

2044 8: %WER 22.80 [ 5300 / 23241, 425 ins, 665 del, 4210 sub ]

notetp3 9: %WER 16.73 [ 310 / 1853, 34 ins, 28 del, 248 sub ]

record1900 11: %WER 5.88 [ 699 / 11888, 54 ins, 257 del, 388 sub ]

general 8: %WER 36.80 [ 13844 / 37619, 636 ins, 1102 del, 12106 sub ]

online1 14: %WER 37.77 [ 10739 / 28433, 592 ins, 2401 del, 7746 sub ]

online2 13: %WER 28.65 [ 16932 / 59101, 1136 ins, 2965 del, 12831 sub ]

speedup 9: %WER 26.32 [ 1383 / 5255, 66 ins, 273 del, 1044 sub ]

 

Tencent:

map 8: %WER 25.83 [ 3778 / 14628, 157 ins, 486 del, 3135 sub ]

2044 8: %WER 24.51 [ 5697 / 23241, 502 ins, 765 del, 4430 sub ]

notetp3 10: %WER 19.86 [ 368 / 1853, 36 ins, 45 del, 287 sub ]

record1900 12: %WER 7.96 [ 946 / 11888, 50 ins, 378 del, 518 sub ]

general 7: %WER 37.94 [ 14274 / 37619, 537 ins, 1270 del, 12467 sub ]

online1 12: %WER 36.36 [ 10337 / 28433, 495 ins, 2082 del, 7760 sub ]

online2 13: %WER 28.46 [ 16822 / 59101, 893 ins, 2940 del, 12989 sub ]

speedup 10: %WER 28.53 [ 1499 / 5255, 62 ins, 349 del, 1088 sub ]
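
Each result line above gives the error count over the number of reference words, followed by the breakdown into insertions, deletions, and substitutions, so %WER = 100 * (ins + del + sub) / words. A small sketch for re-checking a line follows; the regular expression and the function name are illustrative assumptions, not part of the scoring tool.

```python
import re

# Matches the bracketed part of a result line such as
# "%WER 25.76 [ 3768 / 14628, 131 ins, 436 del, 3201 sub ]".
WER_LINE = re.compile(
    r"%WER\s+([\d.]+)\s+\[\s*(\d+)\s*/\s*(\d+),\s*(\d+) ins,"
    r"\s*(\d+) del,\s*(\d+) sub\s*\]"
)


def check_wer_line(line):
    """Return (reported WER, recomputed WER, whether the counts add up)."""
    m = WER_LINE.search(line)
    if m is None:
        raise ValueError("not a %WER line: " + line)
    reported = float(m.group(1))
    errors, words, ins, dele, sub = (int(m.group(i)) for i in range(2, 7))
    recomputed = round(100.0 * (ins + dele + sub) / words, 2)
    return reported, recomputed, errors == ins + dele + sub


line = "map 8: %WER 25.76 [ 3768 / 14628, 131 ins, 436 del, 3201 sub ]"
print(check_wer_line(line))
```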

Engine optimization

Investigating LOUDS FST. In progress.

LM development

NN LM

Collecting a bigger lexicon: 40k words related to music and 56k words from an official dictionary.
Working on an NN LM based on word2vec.

Embedded development

A narrow and deep small-scale NN has been trained. Investigating some bugs.
Embedded stream mode is in progress.
On-the-fly grammar compiler

LG compilation is fine.
CLG compilation is fine.
HCLG compilation is slow.
Working on a speed-up method.

Speech QA

Use N-best to expand matching in QA.

1-best matches 96/121
10-best matches 102/121
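
The idea behind the figures above can be sketched as follows: instead of looking up only the top recognition hypothesis, try the top N in rank order and accept the first one that matches a known question. Exact string matching, the function name, and the toy data are assumptions made for illustration; the actual matching rule used in the QA system is not described in these notes.

```python
def first_match(nbest, known_questions, n):
    """Return the first of the top-n hypotheses found in known_questions.

    nbest is ordered from best to worst recognition score. Returns None
    if none of the top-n hypotheses matches.
    """
    for hyp in nbest[:n]:
        if hyp in known_questions:
            return hyp
    return None


# Toy example (hypothetical data): the 1-best is misrecognized,
# but the second hypothesis matches a known question.
known = {"what is the weather today", "play some music"}
nbest = ["what is the whether today", "what is the weather today"]
print(first_match(nbest, known, 1))
print(first_match(nbest, known, 10))
```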

Use N-best to recover errors in entity check
Use Pinyin to recover errors in entity check
