Difference between revisions of "2013-11-15"

Latest revision as of 06:48, 18 November 2013 (Mon)

Data sharing

LM count files still undelivered!

AM development

Sparse DNN

Optimal Brain Damage(OBD).

Basic OBD done, with the ICASSP paper submitted.
Online OBD running

Try 3 configurations: batch size=256, 13000 (10 prunings), whole data.
Current results show that performance follows the order: Acc(whole data) > Acc(256) > Acc(13000).
Investigate an intermediate update schedule, e.g., updating twice per iteration.
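
For reference, the pruning step in OBD can be sketched as follows. This is a minimal illustration, not the code used in these experiments: it assumes the textbook OBD saliency, 0.5 * h_kk * w_k^2, computed from a diagonal Hessian approximation, and the names `obd_prune`, `hessian_diag`, and `prune_fraction` are hypothetical. The toy weights are made up.

```python
def obd_saliencies(weights, hessian_diag):
    """OBD saliency per weight: 0.5 * h_kk * w_k^2 (diagonal Hessian assumed)."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def obd_prune(weights, hessian_diag, prune_fraction):
    """Zero out the given fraction of weights with the lowest saliency."""
    saliency = obd_saliencies(weights, hessian_diag)
    n_prune = int(prune_fraction * len(weights))
    # Indices sorted from least to most salient.
    order = sorted(range(len(weights)), key=lambda k: saliency[k])
    pruned = set(order[:n_prune])
    return [0.0 if k in pruned else w for k, w in enumerate(weights)]


# Toy example (made-up numbers): prune half of four weights.
weights = [0.5, -0.01, 2.0, 0.1]
hessian_diag = [1.0, 1.0, 0.5, 4.0]
sparse_weights = obd_prune(weights, hessian_diag, prune_fraction=0.5)
```

In the online setting described above, the Hessian diagonal would presumably be estimated on a batch (256 frames, 13000 frames, or the whole data) before each pruning; that estimation step is not shown here.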

Noisy training

An ICASSP paper submitted.

Simulated Annealing training.

Rejected with small noises.
Even when using only clean speech, it is still rejected. This is a bit strange.

Noise concentrated training

Using pure noise (no silence, narrow SNR band). Most of the results are expected.
Need to check the case with car-noise 20/25 dB training and white-noise 20 dB test.

Noise-adding modification

Need to re-implement the noise adding so that it is applied before the fbank computation.
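
A minimal sketch of what adding noise before the fbank computation could look like: the noise is scaled to a target SNR and mixed into the waveform sample by sample, and feature extraction would then run on the mixed signal. The function name `mix_at_snr` and the toy signals are assumptions for illustration, not the group's implementation.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is `snr_db`, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        raise ValueError("noise has zero power")
    # Gain such that speech_power / (gain^2 * noise_power) == 10^(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals: a sine wave as "speech" and Gaussian samples as "noise".
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 0.01 * t) for t in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
noisy = mix_at_snr(speech, noise, snr_db=20.0)
```

The mixed waveform `noisy` would be passed to the fbank front end in place of the clean signal.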

Tencent exps

N/A

LM development

NN LM

Results show better performance with NN rescoring.

scal   2044    map     notetp3  record1900  general  online1  online2  speedup
0.5    28.69   34.52   20.56    14.53       45.52    41.3     34.48    33.53
0.6    28.3    34.28   20.67    14.05       45.34    40.73    33.81    32.71
0.7    27.84   33.81   20.18    13.74       45.13    40.29    33.17    31.86
0.8    27.58   33.87   19.16    13.53       44.92    40       32.82    31.74
0.9    27.86   33.92   19.05    13.41       44.9     39.65    32.5     31.89
0.95   27.79   34.07   19.05    13.56       44.83    39.76    32.41    31.68
0.96   27.9    34.1    18.83    13.53       44.83    39.79    32.43    31.68
0.97   27.94   34.15   18.83    13.47       44.82    39.78    32.44    31.89
0.99   28.02   34.2    19       13.49       44.86    39.82    32.47    32.01
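
The notes do not define `scal`. The sketch below assumes, purely for illustration, that it is a weight interpolating the NN LM score with the n-gram LM score when rescoring an n-best list. The function `rescore_nbest`, the score layout, and the toy hypotheses are all hypothetical.

```python
def rescore_nbest(nbest, scale):
    """Pick the best hypothesis after interpolating n-gram and NN LM scores.

    nbest: list of (text, am_logprob, ngram_logprob, nn_logprob) tuples.
    scale: assumed weight on the NN LM score; (1 - scale) goes to the n-gram LM.
    """
    def combined(hyp):
        _, am, ngram, nn = hyp
        return am + (1.0 - scale) * ngram + scale * nn

    return max(nbest, key=combined)[0]


# Toy n-best list with made-up log-probabilities.
nbest = [
    ("recognize speech", -10.0, -4.0, -3.0),
    ("wreck a nice beach", -9.5, -4.2, -6.0),
]
best_ngram_only = rescore_nbest(nbest, scale=0.0)
best_with_nn = rescore_nbest(nbest, scale=0.8)
```

With these toy numbers, the top hypothesis changes once the NN LM score gets most of the weight, which is the kind of effect a sweep over `scal` would be probing.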

QA LM

QA model training is done. Tested on the Sogou Q text.

Data	lexicon	size	size2	PPL	PPL2
Q (10G)	15w	1.5G	800M	301.64	317.19
QA(100G)	11w	4.5G	1G	287.134	315.695
QA(100G)	8w8	4.5G	1G	559.029	626.146
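
As a reminder of what the PPL columns measure, perplexity is the exponential of the negative average per-word log-probability on the test text. A minimal sketch, assuming natural-log probabilities are available per word (the function name `perplexity` and the toy input are illustrative only):

```python
import math


def perplexity(word_logprobs):
    """Perplexity from per-word natural-log probabilities."""
    if not word_logprobs:
        raise ValueError("need at least one word")
    avg_neg_logprob = -sum(word_logprobs) / len(word_logprobs)
    return math.exp(avg_neg_logprob)


# Toy check: if every word has probability 1/4, the perplexity is 4.
toy_ppl = perplexity([math.log(0.25)] * 8)
```

LM toolkits often report log10 probabilities instead; in that case the base of the exponent changes accordingly.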

@@ Line 9: / Line 9: @@
 * Optimal Brain Damage(OBD).
-* Online OBD.
+# Basic OBD done, with the ICASSP paper submitted.
+# Online OBD running
-* Try 3 configurations: batch size=256, 13000 (10 prunings), whole data. The current results show that the the performance order is: whole data > 256 > 13000.
+:* Try 3 configurations: batch size=256, 13000 (10 prunings), whole data.
+:* The current results show that the the performance follows the order: Acc(whole data) > Acc(256) > Acc(13000).
+:* Investigate some in-the-middle update, e.g., update twice for each iteration.
 ===Noisy training ===
+* An ICASSP paper submitted.
 * Simulated Annealing training.
-* Rejected with small noises. With clean training rejected after annealing.
+:* Rejected with small noises.
+:* Using just the clean speech, it still rejected. This a bit strange.
 * Noise concentrated training
+:* Using pure noise (no silence, narrow SNR band). Most of the results are expected.
+:* Need to check the case with  car-noise 20/25 db training and white noise 20 db test.
+* Noise-adding modification
+:* Need to re-implement the noise-adding. Make it before the fbank computation.
 === Tencent exps ===
@@ Line 30: / Line 37: @@
 ==LM development==
-===NN LM ===
+===NN LM===
+#  Results show better performance with NN rescoring.
+<pre>
+      map    notetp3   record1900  general  online1  online2 speedup
+scal=  0.5	28.69	34.52	20.56	   14.53	 45.52	41.3	34.48	33.53
+scal = 0.6	28.3	34.28	20.67	   14.05	 45.34	40.73	33.81	32.71
+scal = 0.7	27.84	33.81	20.18	   13.74	 45.13	40.29	33.17	31.86
+scal = 0.8	27.58	33.87	19.16	   13.53	 44.92	  40	32.82	31.74
+scal = 0.9	27.86	33.92	19.05	   13.41	 44.9	39.65	32.5	31.89
+scal = 0.95	27.79	34.07	19.05	   13.56	 44.83	39.76	32.41	31.68
+scal = 0.96	27.9	34.1	18.83	   13.53	 44.83	39.79	32.43	31.68
+scal = 0.97	27.94	34.15	18.83	   13.47	 44.82	39.78	32.44	31.89
+scal = 0.99	28.02	34.2	19	   13.49	 44.86	39.82	32.47	32.01
+</pre>
+===QA LM ===
-===QA LM===
+The QA model training done.  Test on the Sogou Q text.
-#  Tencent word segmentation system ready.
+{| class="wikitable"
-#  Collecting data for Q-LM training.
+! Data !! lexicon !! size !! size2 !! PPL !! PPL2
+|-
+|Q (10G)||15w  ||1.5G ||800M|| 301.64  || 317.19
+|-
+|QA(100G)||11w ||4.5G ||1G  || 287.134 || 315.695
+|-
+|QA(100G)||8w8 ||4.5G ||1G  || 559.029 || 626.146
+|-
+|}
