2013-10-25
From cslt Wiki
Revision as of 01:46, 25 October 2013
Data sharing
- LM count files still undelivered!
DNN progress
Sparse DNN
- Optimal Brain Damage (OBD) pruning.
- Initial observations show that OBD cutting yields better training and test accuracy than direct cutting (see the sketch after this list).
- Sticky training is in progress.
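A minimal sketch of the two cutting criteria, assuming "direct cutting" means magnitude-based pruning; the OBD saliency s_i = 0.5 * h_ii * w_i^2 follows LeCun et al.'s original formulation, with the diagonal Hessian h_ii taken as given (in practice it is accumulated over training data):

<pre>
import numpy as np

def prune_mask_direct(w, sparsity):
    """Direct cutting: zero out the smallest-magnitude weights."""
    k = int(sparsity * w.size)
    thresh = np.sort(np.abs(w), axis=None)[k]
    return np.abs(w) >= thresh

def prune_mask_obd(w, h_diag, sparsity):
    """OBD cutting: zero out the weights with the smallest saliency
    s_i = 0.5 * h_ii * w_i^2, the second-order estimate of the loss
    increase caused by removing weight i."""
    saliency = 0.5 * h_diag * w ** 2
    k = int(sparsity * w.size)
    thresh = np.sort(saliency, axis=None)[k]
    return saliency >= thresh

# Toy usage with a random layer and a placeholder diagonal Hessian.
rng = np.random.default_rng(0)
w = rng.normal(size=(512, 1024))
h = rng.uniform(0.0, 2.0, size=w.shape)  # stands in for the real h_ii
w_direct = w * prune_mask_direct(w, sparsity=0.5)
w_obd = w * prune_mask_obd(w, h, sparsity=0.5)
</pre>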
Noisy training
- The 863 clean test shows consistent performance.
- Investigation shows that, with the current fb-based noise adding, car noise corrupts the speech patterns more severely than white noise does, which leads to worse performance with car noise. Note that this observation holds only for the fb-based noise adding (a sketch follows).
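A sketch of what the fb-based noise adding could look like, assuming it means mixing noise into clean speech in the linear filterbank-energy domain at a target SNR; the function name and the SNR convention are illustrative, not the actual tool:

<pre>
import numpy as np

def add_noise_fbank(speech_fb, noise_fb, snr_db):
    """Mix noise into speech at snr_db. Inputs are log filterbank
    features (frames x bins); the noise is trimmed to the speech
    length, assuming it is at least as long."""
    s = np.exp(speech_fb)                  # back to linear energies
    n = np.exp(noise_fb[: len(speech_fb)])
    # Scale the noise so the overall energy ratio matches the target SNR.
    alpha = (s.sum() / n.sum()) / (10.0 ** (snr_db / 10.0))
    return np.log(s + alpha * n)

# e.g. noisy_fb = add_noise_fbank(clean_fb, car_noise_fb, snr_db=10)
</pre>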
Tencent exps
N/A
LM development
Continuous LM
- The lattice rescoring toolkit is done. The efficiency problem is solved by dynamic programming (see the caching sketch after this list).
- The perplexity of the 10-NN combination is worse than that of a single NN (1-NN).
- Initial lattice rescoring shows that rescoring needs to remove the old n-gram score and then insert the new score. The single NN works better than the 10-NN combination.
- The CSLM has not improved recognition performance so far, which conflicts with the perplexity results.
- To do: investigate better ways to combine multiple NNs, and investigate rescoring n-best lists instead of lattices.
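A minimal sketch of the DP idea, assuming the speed-up comes from memoizing NN LM scores by truncated history, so that each distinct (history, word) pair triggers only one forward pass no matter how many lattice paths share it; nnlm_score is a stand-in for the real CSLM call:

<pre>
from functools import lru_cache
import math

HISTORY_LEN = 4  # truncate histories so cache keys stay manageable

def nnlm_score(history, word):
    # Stand-in for the expensive CSLM forward pass (fake log-prob here).
    return -math.log(2.0 + len(history) + len(word))

@lru_cache(maxsize=None)
def cached_score(history, word):
    return nnlm_score(history, word)

def rescore_path(words):
    """Sum NN LM scores along one lattice path. Paths through a lattice
    share prefixes, so most (history, word) lookups hit the cache."""
    total, history = 0.0, ()
    for w in words:
        total += cached_score(history, w)
        history = (history + (w,))[-HISTORY_LEN:]
    return total
</pre>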
Lattice rescoring (training data: 500M, dict: 110k, test: Tencent):

<pre>
id           cslm(logp)  cslm(-logp)  n-gram  n-gram(replace)  cslm(-old+cslm)  cslm((0-1024))  cslm((0-1024),(1024-2048))
map          53.86       38.44        34.40   37.38            35.38            34.94           34.91
2044         47.81       32.20        27.18   31.18            28.53            27.65           27.87
notetp3      43.01       24.61        18.78   22.34            20.18            19.64           19.81
record1900   32.14       20.42        12.89   20.10            13.97            13.52           13.81
general      62.86       49.17        43.23   47.50            45.55            44.28           44.62
online1      62.79       46.76        39.49   46.40            40.26            39.91           40.01
online2      58.20       39.21        32.15   38.75            33.04            32.61           32.65
speedup      52.50       38.74        31.47   37.72            34.01            32.65           32.81

Notes:
  cslm(logp)                  -- replace lattice value1 with the CSLM lnp
  cslm(-logp)                 -- replace lattice value1 with the CSLM -lnp
  n-gram(replace)             -- replace lattice value1 with the n-gram -lnp
  cslm(-old+cslm)             -- replace lattice value1 with -old_ngram + cslm
  cslm((0-1024))              -- replace lattice value1 with -old_ngram + cslm (slist: 0-1024, mach_num=1)
  cslm((0-1024),(1024-2048))  -- replace lattice value1 with -old_ngram + cslm (slist: 0-1024, 1024-2048, mach_num=2)
</pre>
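A sketch of the replacement schemes the notes describe, assuming a lattice arc carries its LM score in value1 (as -lnp) and that the old n-gram contribution is known per arc; the Arc structure and sign conventions are illustrative:

<pre>
from dataclasses import dataclass

@dataclass
class Arc:
    word: str
    value1: float     # LM score carried on the lattice arc (-lnp)
    old_ngram: float  # -lnp the original n-gram LM assigned to this arc

def rescore_replace(arc, cslm):
    # 'replace' schemes: overwrite value1 with the new score outright.
    arc.value1 = cslm

def rescore_old_plus_cslm(arc, cslm):
    # '-old+cslm' scheme: remove the old n-gram contribution from
    # value1, then insert the CSLM score; this keeps any non-LM part
    # of value1 and outperformed the plain replacement in the table above.
    arc.value1 = arc.value1 - arc.old_ngram + cslm
</pre>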
QA LM
- Tencent word segmentation system ready.
- Collecting data for Q-LM training.