Sinovoice-2014-02-17
DNN training
Environment setting
- The 2nd GPU machine is ready; its 3T * 4 RAID-0 array is fast enough for training I/O.
- The new machine has been added to the SGE environment.
Corpora
- The 120h of Beijing Mobile speech data are ready.
- The PICC data (200h) are being labeled and should be ready in two weeks.
- In total, 1100h of telephone speech will be ready soon.
470 hour 8k training
- Training on 470h + 300h + the 120h of Beijing Mobile data.
- Re-train the whole model set (GMM + DNN) with a noise model involved:
  - Train the noise model by treating noise as a special phone.
  - Noise needs special handling when constructing L (the lexicon FST); see the sketch after this list.
- At 7.2h per iteration, the xEnt training should finish within a week (168h / 7.2h ≈ 23 iterations).
- Run incremental DT training on the CSLT cluster, mapping noise to the silence phone.
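Below is a minimal sketch of the two noise-handling steps above, assuming a Kaldi-style lexicon.txt and hypothetical `<NOISE>`/`NSN` symbols; the actual symbols and file names used in this setup are not recorded here.

```python
# Sketch only: the <NOISE>/NSN symbols and file names are assumptions.

def add_noise_entry(lexicon_in, lexicon_out,
                    noise_word="<NOISE>", noise_phone="NSN"):
    """Give noise its own lexicon entry so it becomes a 'special phone'
    when L (the lexicon FST) is compiled."""
    with open(lexicon_in) as fin:
        entries = fin.read().splitlines()
    entries.append(f"{noise_word} {noise_phone}")
    with open(lexicon_out, "w") as fout:
        fout.write("\n".join(sorted(set(entries))) + "\n")

def map_noise_to_silence(transcript, noise_word="<NOISE>",
                         silence_word="<SIL>"):
    """For the incremental DT run: map noise tokens to the silence phone."""
    return " ".join(silence_word if w == noise_word else w
                    for w in transcript.split())

if __name__ == "__main__":
    add_noise_entry("lexicon.txt", "lexicon_noise.txt")
    print(map_noise_to_silence("HELLO <NOISE> WORLD"))  # HELLO <SIL> WORLD
```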
6000 hour 16k training
- CE DNN training has reached iteration 8 (8400 states, 80000 pdfs).
- The test WER has come down to 12.69% (Sinovoice's result: 10.70%).
Model | WER (%) | RT |
---|---|---|
small LM, it 4, -5/-9 | 15.80 | 1.18 |
large LM, it 4, -5/-9 | 15.30 | 1.50 |
large LM, it 4, -6/-9 | 15.36 | 1.30 |
large LM, it 4, -7/-9 | 15.25 | 1.30 |
large LM, it 5, -5/-9 | 14.17 | 1.10 |
large LM, it 5, -5/-10 | 13.77 | 1.29 |
large LM, it 6, -5/-9 | 13.64 | - |
large LM, it 6, -5/-10 | 13.25 | - |
large LM, it 7, -5/-9 | 13.29 | - |
large LM, it 7, -5/-10 | 12.87 | - |
large LM, it 8, -5/-9 | 13.09 | - |
large LM, it 8, -5/-10 | 12.69 | - |

("it N" is the CE training iteration; RT is the real-time factor.)
- A new round of training with trees shared across tone variations has been kicked off and has reached the DNN training stage.
- Still need to test the new GMM model and compare it against Xiaoming's original settings.
Adaptation
- Adaptation with 10, 20, and 30 sentences has been conducted.
- 30 sentences reach reasonable performance (WER from 14.6% to 11.2%).
- Hidden-layer adaptation works better than input- and output-layer adaptation.
- Cross-entropy regularization with P=0.3 works reasonably well; see the sketch after this list.
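A minimal sketch of the cross-entropy regularized adaptation target, assuming P is the weight that interpolates the hard adaptation labels with posteriors from the unadapted (speaker-independent) model; all names below are illustrative rather than from the project code.

```python
import numpy as np

def ce_regularized_targets(hard_labels, si_posteriors, P=0.3):
    """Interpolate one-hot adaptation labels with the SI model's
    posteriors; P is the regularization weight (P=0.3 above)."""
    num_classes = si_posteriors.shape[1]
    one_hot = np.eye(num_classes)[hard_labels]
    return (1.0 - P) * one_hot + P * si_posteriors

def cross_entropy(targets, adapted_posteriors, eps=1e-10):
    """Mean per-frame CE between the interpolated targets and the
    adapted model's output."""
    return -np.mean(np.sum(targets * np.log(adapted_posteriors + eps),
                           axis=1))

# Toy usage: 3 frames, 4 classes, flat posteriors for illustration.
labels = np.array([0, 2, 1])
si = np.full((3, 4), 0.25)
adapted = np.full((3, 4), 0.25)
print(cross_entropy(ce_regularized_targets(labels, si), adapted))
```

With P=0.3, 70% of each target comes from the hard label and 30% from the SI model, which keeps the adapted network from drifting too far when only 10-30 sentences are available.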
Auto Transcription
- Decoding the PICC development set gave 45% WER.
- Decoding of the PICC training set (200h) is done, with confidence scores generated.
- With the confidence threshold set to 0.9, the training data shrink from 230k sentences to 40k (see the filtering sketch after this list).
- Next: run discriminative training on the filtered 40k sentences and test on the development set.
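A minimal sketch of the confidence filtering step, assuming a score file with one `utt-id confidence` pair per line; the file names and format are assumptions.

```python
def filter_by_confidence(conf_file, out_file, threshold=0.9):
    """Keep only utterances whose decoding confidence reaches the
    threshold (here reducing ~230k auto-transcribed sentences to ~40k)."""
    kept = 0
    with open(conf_file) as fin, open(out_file, "w") as fout:
        for line in fin:
            utt_id, conf = line.split()
            if float(conf) >= threshold:
                fout.write(utt_id + "\n")
                kept += 1
    return kept

if __name__ == "__main__":
    n = filter_by_confidence("utt_confidence.txt", "kept_utts.txt", 0.9)
    print(f"kept {n} utterances")
```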
DNN Decoder
- Faster decoder:
  - The RT of the latest decoder on train203 is 0.144 with HCLG and 0.148 with CLG.
- Online decoder:
  - Interface design is complete.
  - The CMN strategy is settled: (1) train a global CMN model first; (2) apply the model directly in decoding; (3) the DNN model may need slight adaptation to the normalized features. A sketch follows below.
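A minimal sketch of the three-step CMN strategy, assuming per-utterance feature matrices of shape (frames, dims); function names are illustrative.

```python
import numpy as np

def estimate_global_cmn(training_utts):
    """Step (1): estimate one global mean vector over all training frames."""
    frames = np.concatenate(training_utts, axis=0)  # (total_frames, dim)
    return frames.mean(axis=0)

def apply_global_cmn(feats, global_mean):
    """Step (2): subtract the global mean directly at decode time; no
    per-utterance statistics are needed, which suits online decoding."""
    return feats - global_mean

# Toy usage with random 13-dim features.
rng = np.random.default_rng(0)
train = [rng.normal(size=(100, 13)), rng.normal(size=(80, 13))]
mean = estimate_global_cmn(train)
normalized = apply_global_cmn(rng.normal(size=(50, 13)), mean)
# Step (3) would lightly fine-tune the DNN on such normalized features.
```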