2013-05-24
Contents
Data sharing
- LM count files still undelivered!
DNN progress
Experiments
- sparse DNN
Zeroing small weight values (WER on the 1900 test set); a sketch of the pruning step follows the table:
threshold | 0 | 0.01 | 0.03 | 0.05 | 0.08 | 0.1
---|---|---|---|---|---|---
shrinkage | 0.0% | 4.6% | 13.5% | 21.8% | 33.4% | 40.5%
WER | 7.25% | 7.21% | 7.28% | 7.41% | 7.61% | 7.67%
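A minimal numpy sketch of the zeroing step (the function name and layer size are illustrative, not our actual code):

```python
import numpy as np

def sparsify(weights, threshold):
    # Zero every weight whose magnitude is below the threshold;
    # "shrinkage" is the fraction of weights zeroed out.
    mask = np.abs(weights) >= threshold
    shrinkage = 1.0 - mask.mean()
    return weights * mask, shrinkage

# Toy layer: prune at threshold 0.05 and report the shrinkage.
w = np.random.randn(512, 1024) * 0.1
w_sparse, shrinkage = sparsify(w, threshold=0.05)
print("shrinkage: %.1f%%" % (100 * shrinkage))
```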
- fixed-point DNN
Original float model: WER(1900) 7.25%
Quantization rule: val=-math.log(abs(vv)/1000.0)*20
Quantized model: WER(1900) 7.30% (round-trip sketch below)
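The rule maps weight magnitudes onto a logarithmic grid; rounding val to an integer gives the fixed-point code. A round-trip sketch, assuming zeros are stored as-is and the sign is kept separately (the notes only give the forward formula):

```python
import math

def quantize(vv):
    # Forward mapping from the rule above; assumes vv != 0.
    # Integer code plus a separate sign bit is an assumed encoding.
    val = -math.log(abs(vv) / 1000.0) * 20
    return int(round(val)), (1 if vv >= 0 else -1)

def dequantize(code):
    # Invert the log mapping to recover an approximate magnitude.
    val, sign = code
    return sign * 1000.0 * math.exp(-val / 20.0)

w = 0.137
print(w, "->", round(dequantize(quantize(w)), 4))  # ~0.136
```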
- fixed-point HCLG
ORG WER(1900) 7.25%
INT 50 WER(1900) 7.30%
INT 10 WER(1900) 7.12%
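One common reading of the INT settings, assumed here rather than confirmed by the notes, is that the graph arc weights are snapped to a fixed-point grid with the given scale factor:

```python
def quantize_arc_weight(w, scale):
    # Snap a float arc weight to the nearest multiple of 1/scale;
    # scale=50 gives a finer grid than scale=10.
    return round(w * scale) / scale

for scale in (50, 10):
    print(scale, quantize_arc_weight(3.14159, scale))  # 3.14, then 3.1
```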
Tencent exps
1: Training DNN models on 1000 hours of data, with 2 learning-rate experiments running in parallel: one with an exponentially decaying learning rate, the other using the newbob schedule (both sketched below). The experiments are nearly finished and should all be done by next week. After comparing the results, we will adopt the better learning-rate decay scheme and train DNN models on larger-scale data.
We are looking forward to the 1000-hour results.
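A minimal sketch of the two schedules under comparison (lr0, the decay factor, and the newbob halving threshold are illustrative constants, not the values used in the experiments):

```python
def exp_decay_lr(lr0, decay, epoch):
    # Exponentially decaying schedule: lr0 * decay^epoch.
    return lr0 * decay ** epoch

def newbob_lr(lr, prev_cv_acc, cv_acc, threshold=0.5):
    # Newbob-style schedule: keep the rate until the accuracy gain
    # on the cross-validation set falls below the threshold, then halve.
    return lr * 0.5 if cv_acc - prev_cv_acc < threshold else lr

print(exp_decay_lr(0.008, 0.9, epoch=5))                # 0.00472...
print(newbob_lr(0.008, prev_cv_acc=55.2, cv_acc=55.4))  # halved to 0.004
```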
2: On the decoder side we tried SSE, fixed-point arithmetic, and other acceleration strategies, but we still cannot bring the real-time factor below 1 under high concurrency. Applying low-rank matrix approximations directly at test time degrades performance considerably; to use the method at training time, the formulas still need to be derived.
We probably need to rely on the sparse-net solution plus fixed-point computing.
Work to be verified: 1: two pretraining strategies: RBM and discriminative pretraining.
2: After HMM-DNN training, realign with the HMM-DNN model, update the transition probabilities, and retrain the HMM-DNN; check the resulting performance.
3: The performance gain from HMM-DNN + sequential DT training.
4: Using the low-rank approach on the DNN training side.
(The low-rank approach is a bit strange to me: it is not directly related to a reasonable objective function, and the structure of the weight matrix has nothing to do with the objective.)
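For context, the usual low-rank trick is a truncated SVD of a trained weight matrix, replacing one m x n multiply with two thin ones; a minimal sketch with illustrative sizes:

```python
import numpy as np

def low_rank(W, k):
    # Rank-k approximation via truncated SVD: W ~ U_k @ V_k.
    # y = U_k @ (V_k @ x) costs k*(m+n) multiplies instead of m*n.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k, :]

W = np.random.randn(2048, 512)   # a trained layer, sizes illustrative
U_k, V_k = low_rank(W, k=64)     # 64*(2048+512) vs 2048*512 multiplies
x = np.random.randn(512)
err = np.linalg.norm(W @ x - U_k @ (V_k @ x)) / np.linalg.norm(W @ x)
print("relative error: %.3f" % err)
```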
GPU & CPU merge
- just started
Kaldi/HTK merge
- HTK2Kaldi: on hold.
- Kaldi2HTK: pdf error problem.
Kaldi monophone: 30.91%; HDecode: 41.40%
- Workaround: use the BN features to train HTK models, so no Kaldi training is needed.
Embedded progress
- Status:
- First embedded demo done; 1000 words take 3.2 MB of memory.
- Accuracy test finished. The test data cover 3 speakers of Chongqing dialect recorded in a car, reading 1000 address names.
- Training an acoustic model for Sphinx: the AN4 training run is done, but the test seems problematic.
Test Set | #utt | ERR (%) | RT |
---|---|---|---|
 | 806 | 23.33 | 0.07 |
 | 887 | 13.64 | 0.08 |
 | 876 | 17.58 | 0.07 |
- To be done
- finish the large-scale AM training