2013-10-25

Data sharing

LM count files still undelivered!

AM development

Sparse DNN

Optimal Brain Damage (OBD).

Initial results show that OBD cutting gives better training and test accuracy than direct cutting.
Sticky training is in progress.
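
To make the direct/OBD comparison concrete, the following is a minimal sketch, not the code used in these experiments. It assumes that "direct cutting" means removing the weights with the smallest magnitude, and it uses the standard OBD saliency 0.5 * h_ii * w_i^2 with a diagonal Hessian approximation. The function names and the `sparsity` parameter are hypothetical.

```python
import numpy as np

def _keep_largest(scores, sparsity):
    """Boolean mask that drops the `sparsity` fraction of entries with the lowest score."""
    n_cut = int(round(sparsity * scores.size))
    mask = np.ones(scores.size, dtype=bool)
    if n_cut > 0:
        # Flattened indices of the n_cut smallest scores.
        cut_idx = np.argsort(scores, axis=None, kind="stable")[:n_cut]
        mask[cut_idx] = False
    return mask.reshape(scores.shape)

def direct_mask(weights, sparsity):
    """Assumed 'direct cutting': rank weights by absolute value only."""
    return _keep_largest(np.abs(weights), sparsity)

def obd_mask(weights, hessian_diag, sparsity):
    """OBD cutting: rank weights by saliency 0.5 * h_ii * w_i**2."""
    saliency = 0.5 * hessian_diag * weights ** 2
    return _keep_largest(saliency, sparsity)

# Toy example (made-up numbers): a small weight with large curvature
# survives OBD cutting but is removed by direct cutting.
w = np.array([0.1, -2.0, 0.5, 1.0])
h = np.array([100.0, 0.001, 1.0, 1.0])
print(direct_mask(w, 0.5))
print(obd_mask(w, h, 0.5))
```

The two masks differ whenever a weight's curvature and magnitude disagree about its importance, which is the case OBD is designed to handle.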

Noisy training

The 863 clean test shows consistent performance.
Investigation shows that, with the current fb-based noise adding, car noise corrupts the speech patterns more severely than white noise, which leads to worse performance with car noise than with white noise. Note that this observation holds only for the fb-based noise adding.

Tencent exps

N/A

LM development

NN LM

The lattice rescoring toolkit is done. The efficiency problem is solved by dynamic programming (DP).
The perplexity of the 10-NN is worse than that of the 1-NN.
Initial lattice rescoring shows that rescoring needs to remove the old n-gram score and then insert the new score. The 1st NN works better than the 10-NN.
The CSLM has not improved performance so far. This conflicts with the perplexity (ppl) results.
Next steps: investigate better ways to combine multiple NNs, and investigate rescoring n-best results instead of lattices.
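
As one possible starting point for the two open items, the sketch below combines several LM scores by linear interpolation in the probability domain and then re-ranks an n-best list. It is an assumption-laden illustration, not the toolkit described above: the function names, the tuple layout of a hypothesis, and `lm_scale` are all hypothetical.

```python
import math

def interpolate_logprob(logprobs, weights):
    """Return ln(sum_k weights[k] * exp(logprobs[k])), computed stably."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    m = max(logprobs)
    return m + math.log(sum(w * math.exp(lp - m) for lp, w in zip(logprobs, weights)))

def rescore_nbest(hyps, lm_scale=1.0):
    """Sort hypotheses by total cost = acoustic_cost - lm_scale * lm_logprob.

    Each hypothesis is a (text, acoustic_cost, lm_logprob) tuple (hypothetical layout).
    """
    return sorted(hyps, key=lambda h: h[1] - lm_scale * h[2])

# Toy example with made-up scores: two LMs assign p=0.2 and p=0.4,
# so the equally weighted mixture assigns p=0.3.
mixed = interpolate_logprob([math.log(0.2), math.log(0.4)], [0.5, 0.5])
nbest = [("a b", 10.0, -6.0), ("a c", 10.5, -4.0)]
best_text = rescore_nbest(nbest)[0][0]
```

Interpolating probabilities rather than averaging log-probabilities keeps the combined score a proper distribution; whether that is the better choice for the NNs here is exactly what remains to be investigated.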

Lattice rescoring: training data: 500M, dict: 110k words (11w), test: Tencent

id           cslm(logp)  cslm(-logp)  n-gram  n-gram(replace)  cslm(-old+cslm)  cslm((0-1024))  cslm((0-1024),(1024-2048))
map:         53.86       38.44        34.40   37.38            35.38            34.94           34.91
2044:        47.81       32.20        27.18   31.18            28.53            27.65           27.87
notetp3:     43.01       24.61        18.78   22.34            20.18            19.64           19.81
record1900:  32.14       20.42        12.89   20.10            13.97            13.52           13.81
general:     62.86       49.17        43.23   47.50            45.55            44.28           44.62
online1:     62.79       46.76        39.49   46.40            40.26            39.91           40.01
online2:     58.20       39.21        32.15   38.75            33.04            32.61           32.65
speedup:     52.50       38.74        31.47   37.72            34.01            32.65           32.81

note: cslm(logp): replace lattice value1 with cslm lnp
      cslm(-logp): replace lattice value1 with cslm -lnp
      n-gram(replace): replace lattice value1 with n-gram -lnp
      cslm(-old+cslm): replace lattice value1 with -old_ngram+cslm
      cslm((0-1024)): replace lattice value1 with -old_ngram+cslm (slist:0-1024, mach_num=1)
      cslm((0-1024),(1024-2048)): replace lattice value1 with -old_ngram+cslm (slist:0-1024,1024-2048, mach_num=2)
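
The -old_ngram+cslm variants in the note above can be read as a per-arc update of the lattice value1. The sketch below shows that arithmetic under two assumptions: value1 is a cost (negative natural-log probability) that contains the old n-gram cost plus other terms, and both LM scores are available as natural-log probabilities. The function name and arguments are hypothetical and are not part of the rescoring toolkit.

```python
import math

def rescore_value1(value1, old_ngram_logp, cslm_logp):
    """Remove the old n-gram cost from value1 and add the CSLM cost.

    value1         : arc cost that already includes -old_ngram_logp (assumption)
    old_ngram_logp : ln p from the original n-gram LM for this arc
    cslm_logp      : ln p from the CSLM for the same word and history
    """
    old_cost = -old_ngram_logp
    new_cost = -cslm_logp
    return value1 - old_cost + new_cost

# Toy example with made-up numbers: value1 = 5.0, of which 2.0 is the
# old n-gram cost; the CSLM cost is 1.5, so the new value1 is 4.5.
new_value1 = rescore_value1(5.0, old_ngram_logp=-2.0, cslm_logp=-1.5)
```

Terms in value1 that do not come from the n-gram LM are left untouched by the subtraction, which is the difference between this update and the plain replacement used in the cslm(-logp) column.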

QA LM

Tencent word segmentation system ready.
Collecting data for Q-LM training.
