2013-05-24
Contents
Data sharing
- LM count files still undelivered!
DNN progress
Experiments
- sparse DNN
Zeroing small weight values (WER on the 1900 test set); a sketch of the pruning step follows the table:
threshold | 0 | 0.01 | 0.03 | 0.05 | 0.08 | 0.1
---|---|---|---|---|---|---
shrinkage | 0.0% | 4.6% | 13.5% | 21.8% | 33.4% | 40.5%
WER | 7.25% | 7.21% | 7.28% | 7.41% | 7.61% | 7.67%
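A minimal numpy sketch of the zeroing step (the function name and layer size are illustrative, not our actual code):

```python
import numpy as np

def sparsify(weights, threshold):
    # Zero every weight whose magnitude is below the threshold;
    # "shrinkage" is the fraction of weights zeroed out.
    mask = np.abs(weights) >= threshold
    shrinkage = 1.0 - mask.mean()
    return weights * mask, shrinkage

# Toy layer: prune at threshold 0.05 and report the shrinkage.
w = np.random.randn(512, 1024) * 0.1
w_sparse, shrinkage = sparsify(w, threshold=0.05)
print("shrinkage: %.1f%%" % (100 * shrinkage))
```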
- fixed-point DNN
Original float model: WER(1900) 7.25%
Quantization rule: val=-math.log(abs(vv)/1000.0)*20
Quantized model: WER(1900) 7.30% (round-trip sketch below)
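The rule maps weight magnitudes onto a logarithmic grid; rounding val to an integer gives the fixed-point code. A round-trip sketch, assuming zeros are stored as-is and the sign is kept separately (the notes only give the forward formula):

```python
import math

def quantize(vv):
    # Forward mapping from the rule above; assumes vv != 0.
    # Integer code plus a separate sign bit is an assumed encoding.
    val = -math.log(abs(vv) / 1000.0) * 20
    return int(round(val)), (1 if vv >= 0 else -1)

def dequantize(code):
    # Invert the log mapping to recover an approximate magnitude.
    val, sign = code
    return sign * 1000.0 * math.exp(-val / 20.0)

w = 0.137
print(w, "->", round(dequantize(quantize(w)), 4))  # ~0.136
```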
- fixed-point HCLG
ORG WER(1900) 7.25%
INT 50 WER(1900) 7.30%
INT 10 WER(1900) 7.12%
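One common reading of the INT settings, assumed here rather than confirmed by the notes, is that the graph arc weights are snapped to a fixed-point grid with the given scale factor:

```python
def quantize_arc_weight(w, scale):
    # Snap a float arc weight to the nearest multiple of 1/scale;
    # scale=50 gives a finer grid than scale=10.
    return round(w * scale) / scale

for scale in (50, 10):
    print(scale, quantize_arc_weight(3.14159, scale))  # 3.14, then 3.1
```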
Tencent exps
1: Training DNN models on 1000 hours of data, with 2 learning-rate experiments running in parallel: one with an exponentially decaying learning rate, the other using the newbob schedule (both sketched below). The experiments are nearly finished and should all be done by next week. After comparing the results, we will adopt the better learning-rate decay scheme and train DNN models on larger-scale data.
We are looking forward to the 1000-hour results.
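A minimal sketch of the two schedules under comparison (lr0, the decay factor, and the newbob halving threshold are illustrative constants, not the values used in the experiments):

```python
def exp_decay_lr(lr0, decay, epoch):
    # Exponentially decaying schedule: lr0 * decay^epoch.
    return lr0 * decay ** epoch

def newbob_lr(lr, prev_cv_acc, cv_acc, threshold=0.5):
    # Newbob-style schedule: keep the rate until the accuracy gain
    # on the cross-validation set falls below the threshold, then halve.
    return lr * 0.5 if cv_acc - prev_cv_acc < threshold else lr

print(exp_decay_lr(0.008, 0.9, epoch=5))                # 0.00472...
print(newbob_lr(0.008, prev_cv_acc=55.2, cv_acc=55.4))  # halved to 0.004
```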
2: On the decoder side we tried SSE, fixed-point arithmetic, and other acceleration strategies, but we still cannot bring the real-time factor below 1 under high concurrency. Applying low-rank matrix approximations directly at test time degrades performance considerably; to use the method at training time, the formulas still need to be derived.
We probably need to rely on the sparse-net solution plus fixed-point computing.
Work to be verified: 1: two pretraining strategies: RBM and discriminative pretraining.
2: After HMM-DNN training, realign with the HMM-DNN model, update the transition probabilities, and retrain the HMM-DNN; check the resulting performance.
3: The performance gain from HMM-DNN + sequential DT training.
4: Using the low-rank approach on the DNN training side.
(The low-rank approach is a bit strange to me: it is not directly related to a reasonable objective function, and the structure of the weight matrix has nothing to do with the objective.)
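For context, the usual low-rank trick is a truncated SVD of a trained weight matrix, replacing one m x n multiply with two thin ones; a minimal sketch with illustrative sizes:

```python
import numpy as np

def low_rank(W, k):
    # Rank-k approximation via truncated SVD: W ~ U_k @ V_k.
    # y = U_k @ (V_k @ x) costs k*(m+n) multiplies instead of m*n.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k, :]

W = np.random.randn(2048, 512)   # a trained layer, sizes illustrative
U_k, V_k = low_rank(W, k=64)     # 64*(2048+512) vs 2048*512 multiplies
x = np.random.randn(512)
err = np.linalg.norm(W @ x - U_k @ (V_k @ x)) / np.linalg.norm(W @ x)
print("relative error: %.3f" % err)
```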
GPU & CPU merge
- just started
Kaldi/HTK merge
- HTK2Kaldi: on hold.
- Kaldi2HTK: pdf error problem.
Kaldi monophone: 30.91%; HDecode: 41.40%
- Workaround: use the BN features to train HTK models, so no Kaldi training is needed.
Embedded progress
- Status:
- First embedded demo done; 1000 words take 3.2 MB of memory.
- Accuracy test finished. The test data cover 3 speakers of Chongqing dialect recorded in a car, reading 1000 address names.
- Training an acoustic model for Sphinx: the AN4 training run is done, but the test seems problematic.
Test Set | #utt | ERR (%) | RT |
---|---|---|---|
 | 806 | 23.33 | 0.07 |
 | 887 | 13.64 | 0.08 |
 | 876 | 17.58 | 0.07 |
- To be done
- finish the large-scale AM training