DNN training

Environment setting

Schedule cluster poweroff on 3/06, construct RAID-0 on train212
The new RAID-0 used three new disks on train212
Change NFS names: disk1 -> /nfs/disk212, RAID disks -> /nfs/raid212, disk2 -> /nfs/raid215

Corpora

PICC data (200h) are being labeled, ready in one week.
105h data from BJ mobile
127h Hubei telecom
In total, 1121h (470 + 346 + 105 + 200) of telephone speech will be ready soon.
16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recording data
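
As a quick check of the hour counts quoted above, the sums can be reproduced with a few lines of Python. The variable names are illustrative only; the 470h and 346h terms are not named in the notes, so they are left unlabeled here.

```python
# Hours copied from the notes above; names are illustrative, not official corpus IDs.
telephone_hours = [470, 346, 105, 200]  # 105h = BJ mobile, 200h = PICC
wideband_hours = {
    "DataTang online": 978,
    "online mobile": 656,
    "recording": 4300,
}

telephone_total = sum(telephone_hours)
wideband_total = sum(wideband_hours.values())

print(telephone_total)  # 1121
print(wideband_total)   # 5934
```

The three 16k components add up to 5934h, so "6000h" reads as a rounded figure.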

Telephone model training

470 + 300h + BJ mobile 105h training

Training condition	No noise	Noise in LM	Opt noise	Noise LM + opt noise
No noise	30.61%	-	-	-
Noise phone added	31.88%	30.76%	31.27%	31.07%

BJ mobile incremental training

(1) Original 470 + 300 model: 30.24% WER

MPE2	MPE3	MPE3+iLM	MPE4+iLM
27.01%	26.72%	25.09%	24.53%

6000 hour 16k training

Training progress

Ran CE DNN to iteration 11 (8400 states, 80000 pdf)
Test WER went down to 12.49% (Sinovoice result: 10.49%).

Model	WER	RT
small LM, it 4, -5/-9	15.80	1.18
large LM, it 4, -5/-9	15.30	1.50
large LM, it 4, -6/-9	15.36	1.30
large LM, it 4, -7/-9	15.25	1.30
large LM, it 5, -5/-9	14.17	1.10
large LM, it 5, -5/-10	13.77	1.29
large LM, it 6, -5/-9	13.64	1.12
large LM, it 6, -5/-10	13.25	1.33
large LM, it 7, -5/-9	13.29	1.12
large LM, it 7, -5/-10	12.87	1.17
large LM, it 8, -5/-9	13.09	-
large LM, it 8, -5/-10	12.69	-
large LM, it 9, -5/-9	12.87	-
large LM, it 9, -5/-10	12.55	-
large LM, it 10, -5/-9	12.83	1.51
large LM, it 10, -5/-10	12.48	1.65
large LM, it 11, -5/-9	12.87	1.61
large LM, it 11, -5/-10	12.46	1.28
large LM, it 12, -5/-9	12.91	1.61
large LM, it 12, -5/-10	12.49	1.28
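
To read the trend out of the table above, the sketch below picks the lowest-WER row and measures how much the -5/-10 setting gains over -5/-9 at each iteration. Only the large-LM rows from iteration 5 onward are copied in; the dictionary layout is an assumption made for illustration.

```python
# (iteration, setting) -> WER in percent, copied from the large-LM rows above.
wer = {
    (5, "-5/-9"): 14.17, (5, "-5/-10"): 13.77,
    (6, "-5/-9"): 13.64, (6, "-5/-10"): 13.25,
    (7, "-5/-9"): 13.29, (7, "-5/-10"): 12.87,
    (8, "-5/-9"): 13.09, (8, "-5/-10"): 12.69,
    (9, "-5/-9"): 12.87, (9, "-5/-10"): 12.55,
    (10, "-5/-9"): 12.83, (10, "-5/-10"): 12.48,
    (11, "-5/-9"): 12.87, (11, "-5/-10"): 12.46,
    (12, "-5/-9"): 12.91, (12, "-5/-10"): 12.49,
}

# Row with the lowest WER overall.
best_key = min(wer, key=wer.get)
best_wer = wer[best_key]

# Absolute WER gain of -5/-10 over -5/-9 at each iteration.
gaps = {
    it: round(wer[(it, "-5/-9")] - wer[(it, "-5/-10")], 2)
    for it in range(5, 13)
}

print(best_key, best_wer)
for it, gap in gaps.items():
    print(it, gap)
```

On these numbers the best row is iteration 11 with -5/-10, and the WER is nearly flat from iteration 10 to 12.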

xEnt training is done

Training Analysis

Shared-tree GMM model training completed; WER is similar to the non-shared model.
Selected 100h online data, trained two systems: (1) di-syllable system (2) jt-phone system

        di-syl      jt-ph
Xent    15.42%      14.78%       
MPE1    14.46%      14.23%
MPE2    14.22%      14.09%
MPE3    14.26%      13.80%
MPE4    14.24%      13.68%
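
The comparison above can also be expressed as relative WER reduction, which makes the gain from MPE and the gap between the two unit sets easier to compare. This is a minimal sketch; `relative_reduction` is a helper name introduced here, not part of any toolkit.

```python
def relative_reduction(baseline, improved):
    """Relative WER reduction in percent, going from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline

# WER values (percent) copied from the table above.
di_syl = {"Xent": 15.42, "MPE4": 14.24}
jt_ph = {"Xent": 14.78, "MPE4": 13.68}

# Gain from discriminative training within each system.
di_syl_gain = relative_reduction(di_syl["Xent"], di_syl["MPE4"])
jt_ph_gain = relative_reduction(jt_ph["Xent"], jt_ph["MPE4"])

# Gain of jt-phone over di-syllable after MPE4.
unit_gain = relative_reduction(di_syl["MPE4"], jt_ph["MPE4"])

print(f"di-syl Xent->MPE4: {di_syl_gain:.2f}%")
print(f"jt-ph  Xent->MPE4: {jt_ph_gain:.2f}%")
print(f"jt-ph vs di-syl at MPE4: {unit_gain:.2f}%")
```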

Hybrid training

Recipe

100h MPE training
1700h MPE alignment/lattice
1700h MPE training

1 week to complete 3 MPE iterations
MPE2 result: 1e-9: 10.67% (8.61%), 1e-10: 10.34% (8.27%)

Auto Transcription

PICC

PICC development set decoding obtained 45% WER.
PICC auto-trans incremental DT training completed

Threshold  WER
org:     45.03%
0.9:     41.89%
0.8:     41.64%

The current training data at threshold 0.8 comprise 80k sentences, amounting to about 60h of data.
Sample 60h of labelled data to enrich the training.
Prepare to compare unsupervised incremental training with supervised training.
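
The confidence-based selection described above can be sketched as a simple filter over decoded utterances. Everything here is hypothetical: the `Hypothesis` record, its field names, the toy data, and the choice of `>=` for the comparison are assumptions, since the notes do not say how confidence is stored or compared.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    utt_id: str        # utterance identifier (hypothetical field)
    text: str          # automatic transcription
    confidence: float  # sentence-level confidence in [0, 1]
    duration_s: float  # audio duration in seconds

def select_confident(hyps, threshold):
    """Keep hypotheses whose confidence reaches the threshold.

    Returns the kept hypotheses and their total duration in hours.
    """
    kept = [h for h in hyps if h.confidence >= threshold]
    hours = sum(h.duration_s for h in kept) / 3600.0
    return kept, hours

# Toy data for illustration only; not taken from the PICC set.
toy = [
    Hypothesis("u1", "a", 0.95, 3600.0),
    Hypothesis("u2", "b", 0.85, 1800.0),
    Hypothesis("u3", "c", 0.50, 1800.0),
]
kept, hours = select_confident(toy, 0.8)
print(len(kept), hours)  # 2 1.5
```

Raising the threshold trades data volume for transcription quality, which is the trade-off the 0.9 and 0.8 rows above explore.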

Hubei telecom

Hubei telecom data (127h): retrieved 60k sentences with confidence threshold 0.9, amounting to 50%

System	wer_14	wer_15
xEnt org	-	29.05
MPE iter1	29.23	29.38
MPE iter2	29.05	29.11
MPE iter3	29.32	29.28
MPE iter4	29.29	29.28

Retrieved 30k sentences with confidence threshold 0.95, amounting to 25%, plus the original 770h data

System	wer_14	wer_15
xEnt org	-	29.05
MPE iter1	-	29.36

DNN Decoder

Online decoder

Tests of various CMN implementations

200ms/500ms frame block adaptation
10ms frame block adaptation: totally wrong


CMN code delivery
Online model adaptation

prior weight	-1	1	5	10	20	50	100
200ms	28.29	37.53	35.50	34.08	32.90	32.30	32.77
500ms	28.29	31.28	30.83	30.22	29.50	29.32	29.36
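
One way to picture block-wise CMN adaptation with a prior weight is the sketch below. It is an assumption, not the decoder's actual code: the prior mean is treated as `prior_weight` pseudo-frames mixed into a running mean that is refreshed once per block, a 10ms frame shift is assumed (so 200ms = 20 frames and 500ms = 50 frames), and a negative weight is read as "adaptation off", guessed from the identical 28.29 in the -1 column of both rows.

```python
def frames_per_block(block_ms, frame_shift_ms=10):
    """Number of frames in one adaptation block (10ms shift is an assumption)."""
    return block_ms // frame_shift_ms

def online_cmn(frames, prior_mean, prior_weight, block_size):
    """Subtract a block-wise updated mean from each frame.

    frames: list of feature vectors (lists of floats).
    prior_mean: global mean used as the prior.
    prior_weight: pseudo-frame count for the prior; negative disables adaptation.
    block_size: number of frames per adaptation block.
    """
    dim = len(prior_mean)
    running_sum = [0.0] * dim
    seen = 0
    normalized = []
    for start in range(0, len(frames), block_size):
        block = frames[start:start + block_size]
        # Accumulate statistics of the new block before normalizing it.
        for frame in block:
            for d in range(dim):
                running_sum[d] += frame[d]
        seen += len(block)
        if prior_weight < 0:
            mean = list(prior_mean)  # adaptation off: prior mean only
        else:
            mean = [
                (prior_weight * prior_mean[d] + running_sum[d]) / (prior_weight + seen)
                for d in range(dim)
            ]
        for frame in block:
            normalized.append([frame[d] - mean[d] for d in range(dim)])
    return normalized

# Toy 1-dimensional example with one block of two frames.
toy_frames = [[1.0], [3.0]]
print(online_cmn(toy_frames, [0.0], 2, 2))  # [[0.0], [2.0]]
```

Under this reading, a larger prior weight keeps the estimate close to the global mean for longer, and a longer block gives each update more frames to average over.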

Sinovoice-2014-03-04
