2013-10-25
From cslt Wiki
Data sharing
- LM count files still undelivered!
AM development
Sparse DNN
- Optimal Brain Damage (OBD).
- Initial observations show that OBD-based cutting gives better training and testing accuracy than direct cutting.
- Sticky training is in progress.
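The difference between the two cutting schemes can be sketched as follows. This is a minimal illustration, not the code used in these experiments: it assumes "direct cutting" means removing the weights with the smallest magnitude, and it uses the standard OBD saliency 0.5 * h_kk * w_k^2 with a diagonal Hessian approximation. The function names and the toy weight and Hessian values are hypothetical.

```python
def magnitude_prune(weights, n_cut):
    """Direct cutting (assumed): zero the n_cut weights with the smallest |w|."""
    order = sorted(range(len(weights)), key=lambda k: abs(weights[k]))
    cut = set(order[:n_cut])
    return [0.0 if k in cut else w for k, w in enumerate(weights)]


def obd_prune(weights, hessian_diag, n_cut):
    """OBD cutting: zero the n_cut weights with the smallest saliency.

    Saliency of weight k is 0.5 * h_kk * w_k**2, where h_kk is the
    k-th diagonal entry of the Hessian of the training loss.
    """
    saliency = [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]
    order = sorted(range(len(weights)), key=lambda k: saliency[k])
    cut = set(order[:n_cut])
    return [0.0 if k in cut else w for k, w in enumerate(weights)]


# Toy values (hypothetical): the first weight is small but sits in a
# high-curvature direction, so OBD keeps it while direct cutting drops it.
weights = [0.1, -0.5, 0.8, -0.05]
hessian_diag = [50.0, 0.1, 1.0, 2.0]

direct = magnitude_prune(weights, 2)
obd = obd_prune(weights, hessian_diag, 2)
```

In this toy case the two schemes disagree on which weights to remove, which is the situation where OBD would be expected to help.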
Noisy training
- The 863 clean test shows consistent performance.
- Investigation shows that, with the current fb-based noise adding, car noise corrupts the speech patterns more severely than white noise does. This leads to worse performance with car noise than with white noise. Note that this observation is valid only for fb-based noise adding.
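As a rough illustration of what noise adding in the feature domain can look like, the sketch below mixes speech and noise at a target SNR. It rests on assumptions that are not stated in this report: that "fb-based" refers to log filterbank energies, that the mixing is done in the linear power domain, and that the SNR is computed over the whole utterance. The function name is hypothetical.

```python
import math


def add_noise_fbank(speech_logfb, noise_logfb, snr_db):
    """Mix log filterbank energies of speech and noise at a target SNR (dB).

    Both inputs are lists of frames, each frame a list of log energies,
    and both must have the same shape.
    """
    # Go back to linear power before adding.
    speech_pow = [[math.exp(v) for v in frame] for frame in speech_logfb]
    noise_pow = [[math.exp(v) for v in frame] for frame in noise_logfb]

    p_speech = sum(sum(frame) for frame in speech_pow)
    p_noise = sum(sum(frame) for frame in noise_pow)

    # Choose gain so that p_speech / (gain * p_noise) == 10 ** (snr_db / 10).
    gain = p_speech / (p_noise * 10.0 ** (snr_db / 10.0))

    return [[math.log(s + gain * n) for s, n in zip(s_frame, n_frame)]
            for s_frame, n_frame in zip(speech_pow, noise_pow)]


# One frame, two bands, unit power everywhere (toy values).
speech = [[0.0, 0.0]]
noise = [[0.0, 0.0]]
mixed_0db = add_noise_fbank(speech, noise, 0.0)
mixed_10db = add_noise_fbank(speech, noise, 10.0)
```

Because the gain is a single scalar for the utterance, a noise whose energy is concentrated in a few bands affects those bands far more than a flat noise does at the same overall SNR.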
Tencent exps
N/A
LM development
NN LM
- The lattice rescore toolkit is done. The efficiency problem is solved by DP.
- The perplexity of the 10-NN is worse than that of the 1-NN.
- Initial lattice rescoring shows that the old n-gram score must be removed before the new score is inserted. The 1-NN works better than the 10-NN.
- The CSLM does not improve performance so far. This conflicts with the perplexity (ppl) results.
- Investigate better ways to combine multiple NNs. Investigate rescoring n-best results instead of lattices.
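For reference when comparing the NNs, the sketch below shows how perplexity is computed from per-word probabilities and how several models can be combined by linear interpolation. Linear interpolation is only one common baseline, used here as an assumption; it is not necessarily the combination method that will be adopted. The function names and toy probabilities are hypothetical.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence from its per-word probabilities."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


def interpolate(prob_streams, weights):
    """Linearly interpolate per-word probabilities from several LMs.

    prob_streams[m][i] is the probability model m assigns to word i;
    weights[m] is the interpolation weight of model m (weights sum to 1).
    """
    return [sum(w * p for w, p in zip(weights, probs))
            for probs in zip(*prob_streams)]


# Two toy models scoring the same two-word sequence.
model_a = [0.2, 0.4]
model_b = [0.4, 0.2]
combined = interpolate([model_a, model_b], [0.5, 0.5])

ppl_a = perplexity(model_a)
ppl_combined = perplexity(combined)
```

The same interpolated probabilities could be used to rescore n-best lists, where each hypothesis is a single word sequence and no lattice bookkeeping is needed.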
lattice-rescore: training data: 500M, dict: 11w (110K words), test: tencent

id          cslm(logp)  cslm(-logp)  n-gram  n-gram(replace)  cslm(-old+cslm)  cslm((0-1024))  cslm((0-1024),(1024-2048))
map         53.86       38.44        34.40   37.38            35.38            34.94           34.91
2044        47.81       32.20        27.18   31.18            28.53            27.65           27.87
notetp3     43.01       24.61        18.78   22.34            20.18            19.64           19.81
record1900  32.14       20.42        12.89   20.10            13.97            13.52           13.81
general     62.86       49.17        43.23   47.50            45.55            44.28           44.62
online1     62.79       46.76        39.49   46.40            40.26            39.91           40.01
online2     58.20       39.21        32.15   38.75            33.04            32.61           32.65
speedup     52.50       38.74        31.47   37.72            34.01            32.65           32.81

note:
- cslm(logp): replace lattice value1 with the cslm lnp
- cslm(-logp): replace lattice value1 with the cslm -lnp
- n-gram(replace): replace lattice value1 with the n-gram -lnp
- cslm(-old+cslm): replace lattice value1 with value1 - old_ngram + cslm
- cslm((0-1024)): replace lattice value1 with value1 - old_ngram + cslm (slist: 0-1024, mach_num=1)
- cslm((0-1024),(1024-2048)): replace lattice value1 with value1 - old_ngram + cslm (slist: 0-1024, 1024-2048, mach_num=2)
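The "-old_ngram+cslm" update in the notes can be sketched for a single lattice arc as below. This is an illustration under assumptions: value1 is taken to be a cost, that is a negative natural-log probability, which is consistent with cslm(-logp) scoring far better than cslm(logp) in the table. The function name and the toy numbers are hypothetical, and the real toolkit's interface is not shown here.

```python
import math


def rescore_arc(value1, old_ngram_prob, cslm_prob):
    """Swap the n-gram part of an arc cost for the CSLM cost.

    value1 is the arc's existing cost (assumed to be -ln p, possibly
    including other components besides the n-gram score). The old n-gram
    cost is subtracted and the CSLM cost is added in its place.
    """
    old_cost = -math.log(old_ngram_prob)
    new_cost = -math.log(cslm_prob)
    return value1 - old_cost + new_cost


# Toy arc: total cost 5.0, of which 3.0 came from the n-gram LM;
# the CSLM assigns the word a cost of 2.0.
new_value1 = rescore_arc(5.0, math.exp(-3.0), math.exp(-2.0))
```

Under this reading, any part of value1 that did not come from the n-gram model is preserved, whereas replacing value1 outright discards it.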
QA LM
- Tencent word segmentation system ready.
- Collecting data for Q-LM training.