Environment setting

Raid212/Raid215/Disk212 done

Corpora

PICC data labeling (200h) is done.
In total, 1121h (470 + 346 + 105 BJ mobile + 200 PICC) of telephone speech is now ready.
16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recording data
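
The hour totals above can be checked with a few lines of Python. This is only a bookkeeping sketch: the dictionary keys are labels chosen here for illustration, and the 470h and 346h telephone subsets are left generic because the report does not name them. The listed 16k components add up to 5934h, so "6000h" appears to be a rounded figure.

```python
# Hours of telephone speech, as listed in the report.
# Key names are illustrative labels, not names used by the project.
telephone_hours = {
    "subset_470h": 470,
    "subset_346h": 346,
    "bj_mobile": 105,
    "picc": 200,
}

# Hours of 16k data, as listed in the report.
wideband_hours = {
    "datatang_online": 978,
    "online_mobile": 656,
    "recording": 4300,
}

telephone_total = sum(telephone_hours.values())
wideband_total = sum(wideband_hours.values())

print(f"telephone speech: {telephone_total}h")
print(f"16k data: {wideband_total}h")
```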

Acoustic modeling

Telephone model training

1000h Training

Training recipe prepared.
Training started.

BJ mobile incremental training

(1) Original 470 + 300 model: 30.24% WER

MPE2      MPE3         MPE3+iLM       MPE4+iLM
27.01%     26.72%       25.09%         24.53%

PICC dedicated training

Baseline (470+300h): 45.03
+ PICC 105h incremental training (th=0.9): 41.89
+ PICC 105h incremental training (th=0.8): 41.64
+ PICC 105h labelled training: 34.78
+ PICC 105h labelled training + PICC text LM: 29.18
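
To compare these configurations on a common scale, the relative reduction against the baseline can be computed as below. This is a minimal sketch that assumes the figures above are word error rates in percent (the report does not label the unit); the function name and dictionary keys are hypothetical.

```python
def relative_wer_reduction(baseline: float, system: float) -> float:
    """Return the relative error reduction in percent:
    100 * (baseline - system) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline - system) / baseline


# Figures copied from the PICC results above (assumed to be WER in percent).
baseline_wer = 45.03
picc_results = {
    "incremental th=0.9": 41.89,
    "incremental th=0.8": 41.64,
    "labelled": 34.78,
    "labelled + PICC text LM": 29.18,
}

for name, wer in picc_results.items():
    reduction = relative_wer_reduction(baseline_wer, wer)
    print(f"{name}: {wer:.2f} ({reduction:.1f}% relative reduction)")
```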

6000 hour 16k training

Training progress

Ran DNN MPE to iteration 5.
Recipe

100h MPE training
1700h MPE alignment/lattice
1700h MPE training

1 week to complete 3 MPE iterations
MPE2 result: 1e-9: 10.67% (8.61%), 1e-10: 10.34% (8.27%)
MPE3 result: 1e-9: 10.48% (8.43%), 1e-10: 10.12% (8.05%)
MPE4 result: 1e-9: 10.34% (8.31%), 1e-10: 10.03% (7.97%)
MPE5 result:

Training Analysis

Shared-tree GMM model training completed; the WER is similar to that of the non-shared model.
Selected 100h online data, trained two systems: (1) di-syllable system (2) jt-phone system

        di-syl      jt-ph
GMM:      -         20.86%
Xent    15.42%      14.78%       
MPE1    14.46%      14.23%
MPE2    14.22%      14.09%
MPE3    14.26%      13.80%
MPE4    14.24%      13.68%

HTK training on the same database

HLDA: 18.22
HLDA+MPE: 14.40

Hubei telecom

Hubei telecom data (127h): retrieved 60k sentences with confidence threshold 0.9, amounting to 50% of the data.

            wer_14      wer_15
xEnt org      -         29.05
MPE iter1   29.23       29.38
MPE iter2   29.05       29.11
MPE iter3   29.32       29.28
MPE iter4   29.29       29.28

Retrieved 30k sentences with confidence threshold 0.95, amounting to 25% of the data, and added them to the original 770h data.

            wer_14      wer_15
xEnt org      -         29.05
MPE iter1     -         29.36
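
The confidence-based selection used in both Hubei experiments can be sketched as follows. This is an illustration under assumptions, not the actual pipeline: the `Utterance` record, the function names, the sample scores, and the use of `>=` at the threshold are all hypothetical, since the report states only the thresholds (0.9 and 0.95) and the retained fractions.

```python
from typing import List, NamedTuple, Sequence


class Utterance(NamedTuple):
    """Hypothetical record for one automatically transcribed sentence."""
    utt_id: str
    transcript: str
    confidence: float  # sentence-level confidence in [0, 1]


def select_by_confidence(utts: Sequence[Utterance], threshold: float) -> List[Utterance]:
    """Keep the utterances whose confidence is at or above the threshold."""
    return [u for u in utts if u.confidence >= threshold]


def retained_fraction(utts: Sequence[Utterance], threshold: float) -> float:
    """Fraction of utterances kept at the given threshold (0.0 for empty input)."""
    if not utts:
        return 0.0
    return len(select_by_confidence(utts, threshold)) / len(utts)


# Made-up sample data, for illustration only.
decoded = [
    Utterance("utt1", "transcript one", 0.97),
    Utterance("utt2", "transcript two", 0.91),
    Utterance("utt3", "transcript three", 0.85),
    Utterance("utt4", "transcript four", 0.62),
]

for th in (0.9, 0.95):
    kept = select_by_confidence(decoded, th)
    print(f"threshold={th}: kept {len(kept)} of {len(decoded)} "
          f"({100 * retained_fraction(decoded, th):.0f}%)")
```

A higher threshold keeps fewer but more reliable transcripts, which matches the drop from 50% to 25% of the data when moving from 0.9 to 0.95.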

Sinovoice-2014-03-11

