=Environment setting=

* Raid212/Raid215/Disk212 done

=Corpora=

* PICC data labeling (200h) is done.
* In total, 1121h of telephone speech is now ready (470h + 346h + 105h BJ mobile + 200h PICC).
* 16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recording data.
* LM training text needs to be prepared within 2 days.
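
A quick sanity check on the hour bookkeeping above (a minimal sketch; the dictionary keys are informal labels for the bullets, not official corpus names):

<pre>
# Sanity-check the corpus hour totals quoted above.
telephone_8k = {"470h set": 470, "346h set": 346, "BJ mobile": 105, "PICC": 200}
online_16k = {"DataTang online": 978, "online mobile": 656, "recording": 4300}

assert sum(telephone_8k.values()) == 1121            # matches the stated 1121h total
print(f"16k total: {sum(online_16k.values())}h")     # 5934h, i.e. "6000h" is rounded up
</pre>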

=Acoustic modeling=

==Telephone model training==

===1000h Training===

* Training recipe prepared
* Expected to finish in 7 days


===PICC dedicated training===

Results (WER%):
<pre>
Baseline (470+300h): 45.03
+ PICC 188h incremental training (th=0.9): 41.89
+ PICC 188h incremental training (th=0.8): 41.64
+ PICC 188h labelled training: 34.78
+ PICC 188h labelled training + PICC text LM: 29.18
</pre>
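
The incremental-training rows above use automatic transcripts filtered by recognizer confidence. A minimal sketch of that selection step, assuming utterance-level confidence scores in [0, 1] (field and function names are hypothetical; the actual recipe is not shown on this page):

<pre>
from typing import NamedTuple, List

class Utt(NamedTuple):
    utt_id: str
    hyp_text: str      # recognizer 1-best transcript, used as the training label
    confidence: float  # utterance-level confidence in [0, 1]

def select_for_incremental_training(utts: List[Utt], threshold: float = 0.9) -> List[Utt]:
    """Keep only utterances whose automatic transcript is confident enough."""
    return [u for u in utts if u.confidence >= threshold]
</pre>

Lowering the threshold from 0.9 to 0.8 admits more (but noisier) data; per the table above, that traded slightly better here (41.64 vs 41.89), while manual labels helped far more (34.78).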

==6000 hour 16k training==

===Training progress===

* Ran DNN MPE to iteration 5.
* Recipe (outlined in the sketch after this list):
:* 100h MPE training
:* 1700h MPE alignment/lattice generation
:* 1700h MPE training
:* 1 week to complete 3 MPE iterations
* MPE2 result: 1e-9: 10.67% (8.61%); 1e-10: 10.34% (8.27%)
* MPE3 result: 1e-9: 10.48% (8.43%); 1e-10: 10.12% (8.05%)
* MPE4 result: 1e-9: 10.34% (8.31%); 1e-10: 10.03% (7.97%)
* MPE5 result: (pending)
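
A rough outline of how the staged recipe fits together (the helpers below are hypothetical stubs standing in for the real alignment/lattice/MPE steps; this shows only the staging logic, not the actual scripts):

<pre>
# Hypothetical stubs standing in for the real alignment/lattice/MPE steps.
def align(model, data): return f"ali({data})"
def make_denominator_lattices(model, data): return f"lat({data})"
def mpe_train(model, data, ali=None, lat=None): return f"mpe[{model}|{data}]"

def run_mpe_recipe(xent_model, num_iters=3):
    model = mpe_train(xent_model, "100h")            # stage 1: MPE warm-up on 100h
    ali = align(model, "1700h")                      # stage 2: 1700h alignments
    lat = make_denominator_lattices(model, "1700h")  #          and denominator lattices
    for _ in range(num_iters):                       # stage 3: 1700h MPE iterations
        model = mpe_train(model, "1700h", ali=ali, lat=lat)  # ~1 week for 3 iterations
    return model

print(run_mpe_recipe("xent"))
</pre>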

===Training Analysis===

* Shared-tree GMM model training completed; WER% is similar to the non-shared model.
* Selected 100h of online data and trained two systems: (1) a di-syllable system, (2) a jt-phone system. Results (WER%):
<pre>
        di-syl     jt-ph
GMM       -        20.86%
Xent    15.42%     14.78%
MPE1    14.46%     14.23%
MPE2    14.22%     14.09%
MPE3    14.26%     13.80%
MPE4    14.24%     13.68%
</pre>
* HTK training on the same database:
:* HLDA: 18.22%
:* HLDA+MPE: 17.40%
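
All numbers in this section are word error rates; for reference, a minimal WER computation over space-separated tokens (standard Levenshtein distance, not tied to any system above):

<pre>
def wer(ref: str, hyp: str) -> float:
    """Word error rate: edit distance between token sequences / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edits to turn the first i reference words into the first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[len(r)][len(h)] / len(r)

print(f"{wer('a b c d', 'a x c'):.2f}%")  # 1 substitution + 1 deletion over 4 words = 50.00%
</pre>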

===Hubei telecom===

* Hubei telecom data (127h): retrieved 60k sentences at confidence threshold 0.9, amounting to ~50% of the data (the same selection scheme as in the PICC incremental training above). Results (WER%):
<pre>
xEnt org:      -              wer_15: 29.05
MPE iter1:  wer_14: 29.23;    wer_15: 29.38
MPE iter2:  wer_14: 29.05;    wer_15: 29.11
MPE iter3:  wer_14: 29.32;    wer_15: 29.28
MPE iter4:  wer_14: 29.29;    wer_15: 29.28
</pre>
* Retrieved 30k sentences at confidence threshold 0.95 (~25% of the data), plus the original 770h data:
<pre>
xEnt org:      -              wer_15: 29.05
MPE iter1:     -              wer_15: 29.36
</pre>

=Language modeling=

* Need to transfer the training text.

=DNN Decoder=

==Online decoder==

* CMN code delivered; integration is done.
* CMN pipe code delivered; model adaptation is ongoing (see the sketch below).
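
For context on what an online CMN component of this kind does, a minimal sketch of streaming cepstral mean normalization with a prior-weighted running mean, updated block by block (block size, feature dimension, and prior weighting here are illustrative assumptions, not the delivered implementation):

<pre>
import numpy as np

class OnlineCMN:
    """Subtract a running cepstral mean, initialized from a global prior."""

    def __init__(self, prior_mean: np.ndarray, prior_weight: float = 50.0):
        # The prior counts as `prior_weight` already-seen frames.
        self.sum = prior_weight * prior_mean.astype(np.float64)
        self.count = prior_weight

    def process_block(self, feats: np.ndarray) -> np.ndarray:
        """Normalize one block (n_frames x n_dims) with the mean seen so far,
        then fold the block into the running statistics."""
        out = feats - self.sum / self.count
        self.sum += feats.sum(axis=0)
        self.count += feats.shape[0]
        return out

# Usage: with 10ms frames, a 500ms block is 50 frames of (say) 13-dim cepstra.
cmn = OnlineCMN(prior_mean=np.zeros(13))
normalized = cmn.process_block(np.random.randn(50, 13))
</pre>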