2013-10-25
Data sharing
- LM count files still undelivered!
DNN progress
Sparse DNN
- Optimal Brain Damage (OBD).
- The initial observation is that OBD cutting outperforms direct cutting in both training and test accuracy (see the pruning sketch after this list).
- Sticky training is in progress.
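For reference, a minimal sketch of the two pruning schemes being compared, not the actual training code: direct cutting removes the smallest-magnitude weights, while OBD cutting removes the weights with the smallest saliency 0.5 * h_ii * w_i^2 computed from a diagonal Hessian estimate. The function names and the random placeholder Hessian are illustrative assumptions; sticky training would presumably then continue updating only the surviving weights while the zeroed ones stay fixed at zero.

  import numpy as np

  def prune_direct(W, keep_ratio):
      """Direct cutting: zero the weights with the smallest absolute value."""
      k = int(W.size * keep_ratio)
      thresh = np.sort(np.abs(W), axis=None)[-k]
      mask = np.abs(W) >= thresh
      return W * mask, mask

  def prune_obd(W, hessian_diag, keep_ratio):
      """OBD cutting: zero the weights with the smallest saliency 0.5 * h_ii * w_i^2."""
      saliency = 0.5 * hessian_diag * W ** 2
      k = int(W.size * keep_ratio)
      thresh = np.sort(saliency, axis=None)[-k]
      mask = saliency >= thresh
      return W * mask, mask

  # Toy usage: random weights and a placeholder diagonal Hessian; in practice the
  # diagonal Hessian would be accumulated by an extra backprop pass over the data.
  rng = np.random.default_rng(0)
  W = rng.normal(size=(512, 1024))
  H = rng.uniform(0.1, 1.0, size=W.shape)
  W_pruned, mask = prune_obd(W, H, keep_ratio=0.3)
  print("kept fraction:", mask.mean())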
Tencent exps
N/A
Noisy training
- The 863 clean test shows consistent performance.
- Investigation shows that with the current fb-based noise adding, car noise corrupts the speech patterns more severely than white noise does, which leads to worse performance with car noise than with white noise. Note that this observation holds only for fb-based noise adding (a noise-adding sketch follows).
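Below is a minimal sketch of what fb-based noise adding could look like, assuming "fb" refers to log filterbank energy features and that mixing happens in the linear power domain at a target SNR. The actual recipe used in these experiments may differ, and add_noise_fbank is a hypothetical name.

  import numpy as np

  def add_noise_fbank(speech_logfb, noise_logfb, target_snr_db):
      """Mix noise into speech in the (linear) filterbank power domain at a target SNR."""
      speech_pow = np.exp(speech_logfb)
      noise_pow = np.exp(noise_logfb)

      # Per-utterance energies used to derive the noise scaling factor.
      speech_energy = speech_pow.sum()
      noise_energy = noise_pow.sum()

      # Scale noise power so that 10*log10(speech_energy / (scale*noise_energy)) == target SNR.
      scale = speech_energy / (noise_energy * 10.0 ** (target_snr_db / 10.0))
      noisy_pow = speech_pow + scale * noise_pow

      return np.log(noisy_pow)

  # Toy usage with random "features": 100 frames x 40 filterbank bins.
  rng = np.random.default_rng(1)
  clean = rng.normal(size=(100, 40))
  car_noise = rng.normal(size=(100, 40))
  noisy = add_noise_fbank(clean, car_noise, target_snr_db=10.0)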
Continuous LM
- The lattice rescoring toolkit is done. The efficiency problem is solved with dynamic programming (DP).
- The perplexity of 10-NN is worse than that of 1-NN.
- Initial lattice rescoring shows that the old n-gram score needs to be removed before the new score is inserted (see the sketch after the results table below). The 1-NN works better than the 10-NN.
- CSLM has not improved the recognition performance so far, which conflicts with the perplexity results.
- To do: investigate better ways to combine multiple NNs, and investigate rescoring n-best lists instead of lattices.
Lattice rescoring results (training data: 500M, dict: 11w, test: tencent):

  id           cslm(logp)  cslm(-logp)  n-gram  n-gram(replace)  cslm(-old+cslm)  cslm((0-1024))  cslm((0-1024),(1024-2048))
  map          53.86       38.44        34.40   37.38            35.38            34.94           34.91
  2044         47.81       32.20        27.18   31.18            28.53            27.65           27.87
  notetp3      43.01       24.61        18.78   22.34            20.18            19.64           19.81
  record1900   32.14       20.42        12.89   20.10            13.97            13.52           13.81
  general      62.86       49.17        43.23   47.50            45.55            44.28           44.62
  online1      62.79       46.76        39.49   46.40            40.26            39.91           40.01
  online2      58.20       39.21        32.15   38.75            33.04            32.61           32.65
  speedup      52.50       38.74        31.47   37.72            34.01            32.65           32.81

Notes:
- cslm(logp): replace lattice value1 with the CSLM lnP
- cslm(-logp): replace lattice value1 with the CSLM -lnP
- n-gram(replace): replace lattice value1 with the n-gram -lnP
- cslm(-old+cslm): replace lattice value1 with -old_ngram + cslm
- cslm((0-1024)): replace lattice value1 with -old_ngram + cslm (slist: 0-1024, mach_num=1)
- cslm((0-1024),(1024-2048)): replace lattice value1 with -old_ngram + cslm (slist: 0-1024, 1024-2048, mach_num=2)
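As a concrete reading of the "-old_ngram + cslm" rule above (remove the old n-gram score, insert the new one), here is a toy sketch on a simplified arc list. It assumes value1 is a cost, i.e. a negative log-probability; LM history handling and lattice expansion are omitted, and rescore_arc, rescore_lattice and the stub LMs are hypothetical names, not the toolkit's API.

  import math

  def rescore_arc(value1, old_lnp, cslm_lnp):
      """Remove the old n-gram contribution and insert the CSLM one.

      With value1 treated as a cost (negative log-probability), the rule is:
          new_value1 = value1 - (-old_lnp) + (-cslm_lnp)
      """
      return value1 + old_lnp - cslm_lnp

  def rescore_lattice(arcs, old_lm, cslm):
      """Apply the replacement rule to every arc of a toy lattice."""
      rescored = []
      for src, dst, word, value1, acoustic in arcs:
          old_lnp = old_lm(word)   # ln P_ngram(word | history)
          cslm_lnp = cslm(word)    # ln P_cslm(word | history)
          rescored.append((src, dst, word, rescore_arc(value1, old_lnp, cslm_lnp), acoustic))
      return rescored

  # Toy usage with stub LMs that return fixed log-probabilities.
  arcs = [(0, 1, "hello", 4.6, 10.5), (1, 2, "world", 3.9, 9.8)]
  print(rescore_lattice(arcs, old_lm=lambda w: math.log(0.01), cslm=lambda w: math.log(0.02)))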
QA LM
- Tencent word segmentation system ready.
- Collecting data for Q-LM training.