2013-05-17
Data sharing
- LM count files still undelivered!
DNN progress
Experiments
- setups for mfcc/plp
Test Set | fMMI | s1/tri1 | s2/tri1 | s3/tri1 | s4/tri1 | s2/tri2 | s4/tri2 | cpu-based (like s4/tri1) | plp-s2/tri2 |
---|---|---|---|---|---|---|---|---|---|
map | 28.58 | 25.38 | 24.47 | 26.16 | 26.20 | 22.85 | 24.27 | 26.45 | 23.86 |
2044 | 24.79 | 23.58 | 22.82 | 23.84 | 24.13 | 21.45 | 22.76 | 24.66 | 22.68 |
notetp3 | 21.64 | 16.08 | 14.89 | 15.92 | 15.97 | 14.89 | 14.79 | 16.14 | 16.46 |
1900 | 8.19 | 8.55 | 8.43 | 8.66 | 8.90 | 7.30 | 7.91 | 8.23 | 7.68 |
general | 39.63 | 36.18 | 34.79 | 35.88 | 35.90 | 33.06 | 33.79 | 38.02 | 34.12 |
online1 | 35.19 | 34.68 | 33.90 | 33.45 | 33.38 | 32.93 | 32.43 | 33.00 | 33.60 |
online2 | 28.30 | 27.27 | 26.61 | 26.26 | 26.36 | 25.94 | 25.69 | 26.63 | 26.20 |
speedup | 28.45 | 24.97 | 24.40 | 24.55 | 25.42 | 23.04 | 23.67 | 27.17 | 23.62 |
Tencent exps
- Manually set the smaller weights of the NN weight matrices W to zero; with only the largest ~30% of the weights kept, no obvious degradation in system performance was observed (how much in numbers? it would be interesting to re-train the net after pruning the weights; a pruning sketch follows this list).
- Modified Kaldi's GPU interface to follow the structure of the HTK model and the HTK alignments; verified with no problems, and training on larger-scale data (1000 hours) has started. Network setup: input context extended by 5 frames on each side, 4 hidden layers of 2048 nodes each, 15000 output states. Alignments come from an MPE model; features are PLP with no transformation applied.
- The decoder still runs on the CLG structure; the acoustic model computation interface was modified to plug in the DNN model and verified with no problems; efficiency optimization has started (see the scoring sketch below).
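A minimal numpy sketch of the magnitude pruning described above, applied to one hidden-layer weight matrix of the reported 2048x2048 shape; the ~30% keep ratio comes from the note, while the function name and the random stand-in weights are purely illustrative.

```python
import numpy as np

def prune_by_magnitude(W, keep_ratio=0.3):
    """Zero out the smallest-magnitude entries of W, keeping `keep_ratio` of them."""
    threshold = np.quantile(np.abs(W), 1.0 - keep_ratio)
    mask = np.abs(W) >= threshold
    return W * mask, mask

# One 2048x2048 hidden-layer matrix, as in the training setup above;
# random values stand in for trained weights.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.05, size=(2048, 2048))
W_pruned, mask = prune_by_magnitude(W)
print(f"weights kept: {mask.mean():.1%}")  # ~30.0%
```

Re-training after pruning, as the comment suggests, would amount to multiplying each gradient update by `mask` so the zeroed weights stay zero.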
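On the decoder side, the standard hybrid DNN-HMM way to plug a network into an existing decoder is to convert state posteriors into scaled log-likelihoods by dividing out the state priors; the notes do not spell out the actual interface, so the shapes and names below are assumptions.

```python
import numpy as np

def posteriors_to_loglikes(posteriors, state_priors, floor=1e-8):
    """Hybrid scoring: log p(x|s) = log p(s|x) - log p(s) + const.

    posteriors:   (frames, 15000) softmax outputs of the DNN.
    state_priors: (15000,) state frequencies counted from the alignments.
    """
    return (np.log(np.maximum(posteriors, floor))
            - np.log(np.maximum(state_priors, floor)))
```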
Experiments to do:
- Compare different learning-rate scheduling strategies: exponential decay and newbob (first sketch below).
- Evaluate how different features perform on large data.
- Take the output of the last layer without the softmax and reduce its dimensionality to obtain BN features, similar to the IBM BN approach (if this is a linear dim-reduction, it might be worse than the BN...; second sketch below).
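For the first to-do item, a sketch of the two schedules under comparison; the decay factor and the newbob thresholds are common illustrative defaults, not values from the notes.

```python
def exponential_decay(lr0, epoch, decay=0.9):
    """Exponential decay: lr(t) = lr0 * decay**t."""
    return lr0 * decay ** epoch

def newbob(lr, prev_cv_acc, cv_acc, halving,
           start_halving_impr=0.5, end_halving_impr=0.1):
    """One newbob decision per epoch: start halving the learning rate once
    the cross-validation improvement drops below `start_halving_impr`, and
    stop training once it drops below `end_halving_impr` while halving.
    Returns (new_lr, halving, stop)."""
    impr = cv_acc - prev_cv_acc
    if halving and impr < end_halving_impr:
        return lr, halving, True
    if halving or impr < start_halving_impr:
        return lr * 0.5, True, False
    return lr, halving, False
```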
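For the last item, one reading of "skip the softmax and reduce dimensionality" is a linear projection of the pre-softmax activations, e.g. by PCA; this is only a guess at the intended setup, and the 40-dim target is an arbitrary illustrative choice.

```python
import numpy as np

def linear_bn_features(pre_softmax, bn_dim=40):
    """Reduce pre-softmax activations (frames x 15000) to bn_dim features
    with PCA, one linear realization of the BN idea discussed above."""
    centered = pre_softmax - pre_softmax.mean(axis=0)
    # Rows of vt are the principal directions of the activations.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:bn_dim].T
```

As the parenthetical comment warns, an unsupervised linear projection like this may lose discriminative information that a trained bottleneck layer would keep.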
GPU & CPU merge
- just started
Kaldi/HTK merge
- HTK2Kaldi: on hold.
- Kaldi2HTK: pdf error problem (Kaldi monophone: 30.91%; HDecode: 41.40%).
- Workaround: use the BN features to train the HTK models, so no Kaldi training is needed.
Embedded progress
- Status:
- the first embedded demo is done; a 1000-word vocabulary takes 3.2 MB of memory.
- the accuracy test is finished. The test data involves 3 speakers recorded in a car, speaking the Chongqing dialect, covering 1000 address names.
- training an acoustic model for Sphinx. The AN4 training process is done, but the test seems problematic.
#utt | ERR | RT |
---|---|---|
806 | 23.33 | 0.07 |
887 | 13.64 | 0.08 |
876 | 17.58 | 0.07 |
- To be done:
- finish the large-scale AM training