2014-06-13

Resource Building

  • Release management has been started

Leftover questions

  • Asymmetric window: large improvement on the training set (WER 34% to 24%), but the gain is lost on the test set (a window sketch follows this list).
  • Multi-GPU training: an error was encountered.
  • Multilingual training
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
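
As referenced in the first bullet, a minimal sketch of one way to build an asymmetric analysis window, assuming it is formed from two half-Hann segments of unequal length; the actual window shape and lengths used in the experiments are not recorded here, so all parameters below are hypothetical.

  import numpy as np

  def asymmetric_window(n_left, n_right):
      """Rising half-Hann of n_left samples followed by a falling
      half-Hann of n_right samples."""
      left = np.hanning(2 * n_left)[:n_left]      # rises 0 -> ~1
      right = np.hanning(2 * n_right)[n_right:]   # falls ~1 -> 0
      return np.concatenate([left, right])

  # Example: 25 ms frame at 16 kHz (400 samples) with the peak placed
  # late, so most of the window weight covers past context.
  win = asymmetric_window(n_left=300, n_right=100)
  frame = np.random.randn(400)           # stand-in for one speech frame
  spectrum = np.fft.rfft(frame * win)    # windowed FFT, as in FBank extraction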

AM development

Sparse DNN

  • GA-based block sparsity (++++++); a sketch of the mask search follows this list.
  • Paper revision is under way.
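
A minimal sketch of GA-based block sparsity as named above: a genetic algorithm searches over binary block masks for a weight matrix. The fitness used here (a sparsity reward minus the Frobenius pruning distortion) is a stand-in; the actual fitness, block size, and GA settings are not recorded here.

  import numpy as np

  rng = np.random.default_rng(0)
  W = rng.standard_normal((64, 64))        # stand-in dense weight matrix
  B = 8                                    # block size -> an 8 x 8 mask grid
  GRID = (W.shape[0] // B, W.shape[1] // B)

  def expand(mask):                        # block mask -> per-element mask
      return np.kron(mask, np.ones((B, B)))

  def fitness(mask, lam=0.3):
      err = np.linalg.norm(W - W * expand(mask))   # pruning distortion
      return lam * (1.0 - mask.mean()) * np.linalg.norm(W) - err

  def evolve(pop_size=30, gens=50, p_mut=0.05):
      pop = rng.random((pop_size,) + GRID) < 0.5   # random initial masks
      for _ in range(gens):
          pop = pop[np.argsort([fitness(m) for m in pop])[::-1]]
          children = []
          while len(children) < pop_size // 2:
              a, b = pop[rng.integers(0, pop_size // 2, 2)]   # parents from top half
              cut = rng.integers(1, GRID[0])
              child = np.vstack([a[:cut], b[cut:]])           # one-point crossover
              children.append(child ^ (rng.random(GRID) < p_mut))  # bit-flip mutation
          pop[pop_size // 2:] = children
      return max(pop, key=fitness)

  W_sparse = W * expand(evolve())          # keep only the surviving blocks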

Noise training

  • Paper writing will start this week.


GFbank

  • Running the Sinovoice 8k 1400 + 100 mixture training.
  • GFbank xEnt training (14 iterations) completed; results:

                                          Huawei disanpi   BJ mobile   8k English data
    FBank non-stream (17 iterations)      22.01%           26.63%      33.83%
    FBank non-stream (MPE1)               21.07%           22.91%      24.34%
    GFbank stream (18 iterations)         22.26%           27.79%      35.10%
    GFbank non-stream (16 iterations)     22.45%           27.25%      34.64%
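
For reference, a minimal sketch of GFbank-style feature extraction, assuming GFbank means per-frame log energies of a gammatone filterbank; the filter count, spacing, and frame settings below are illustrative, not the configuration used in these experiments.

  import numpy as np

  def erb(f):                              # equivalent rectangular bandwidth
      return 24.7 * (4.37 * f / 1000.0 + 1.0)

  def gammatone_ir(fc, fs, dur=0.025, order=4):
      t = np.arange(int(dur * fs)) / fs
      b = 1.019 * erb(fc)
      return t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

  def gfbank(signal, fs=8000, n_filters=24, frame=200, hop=80):
      # log-spaced center frequencies (a stand-in for ERB spacing)
      fcs = np.geomspace(100, 0.9 * fs / 2, n_filters)
      outs = np.stack([np.convolve(signal, gammatone_ir(fc, fs), mode="same")
                       for fc in fcs])
      n_frames = 1 + (len(signal) - frame) // hop
      feats = np.empty((n_frames, n_filters))
      for i in range(n_frames):
          seg = outs[:, i * hop: i * hop + frame]
          feats[i] = np.log(np.sum(seg ** 2, axis=1) + 1e-10)  # log band energy
      return feats

  feats = gfbank(np.random.randn(8000))    # 1 s at 8 kHz -> (n_frames, 24)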

Multilingual ASR

  • TAG-based modeling is OK with a smaller step-in factor.
  • A non-TAG test should be conducted on both the Baidu and microblog data.


Denoising & Farfield ASR

  • With artificial reverberation, 2 x 1200 seems a more appropriate configuration; however, results vary considerably across runs.
  • With utterance-level CMN, performance seems better than with no CMN; strangely, global CMN reduced performance (a CMN sketch follows this list).
  • Should experiment with a single-layer network with more hidden units.
  • Recording another far-field database; pre-processing is done.
  • Need to obtain a baseline result with the new middle-far microphone.
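
As referenced above, a minimal sketch of the two CMN (cepstral mean normalization) variants being compared: utterance-level CMN subtracts each utterance's own feature mean, while global CMN subtracts one mean estimated over the whole corpus.

  import numpy as np

  def cmn_utterance(feats):
      # subtract the utterance's own mean, per feature dimension
      return feats - feats.mean(axis=0, keepdims=True)

  def cmn_global(utts):
      # subtract one mean estimated over all utterances
      mean = np.concatenate(utts, axis=0).mean(axis=0)
      return [u - mean for u in utts]

  utts = [np.random.randn(np.random.randint(200, 400), 40) for _ in range(3)]
  per_utt = [cmn_utterance(u) for u in utts]
  global_norm = cmn_global(utts)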


VAD

  • DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74); an energy-VAD sketch follows this list.
  • Need to test small scale network (+)
  • 600-800 network test (+)
  • 100 X 4 + 2 network training (+)
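
As referenced in the first bullet, a minimal sketch of an energy-based VAD baseline: frames whose log energy exceeds a fixed margin above the noise floor are marked as speech. The margin rule is an assumption for illustration; the actual baseline's threshold scheme is not recorded here.

  import numpy as np

  def energy_vad(signal, frame=200, hop=80, margin_db=9.0):
      n = 1 + (len(signal) - frame) // hop
      e = np.array([10 * np.log10(np.sum(signal[i*hop:i*hop+frame] ** 2) + 1e-12)
                    for i in range(n)])
      thresh = e.min() + margin_db         # fixed margin above the quietest frame
      return e > thresh                    # True = speech frame

  # quiet / loud / quiet segments stand in for silence-speech-silence
  sig = np.random.randn(16000) * np.repeat([0.05, 1.0, 0.05], 5334)[:16000]
  decisions = energy_vad(sig)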

Scoring

  • Collect more human-scored data to train discriminative models.


Embedded decoder

600 x 4 + 800 AM, beam 9:

        150k    20k     10k     5k
WER     15.96   -       -       -
RT      X       0.94    -       -
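
For clarity, the RT row is the real-time factor: decoding time divided by audio duration, so RT < 1 means decoding runs faster than real time. The numbers below are hypothetical, chosen only to reproduce the reported 0.94.

  decode_seconds = 94.0                    # hypothetical decoding time
  audio_seconds = 100.0                    # hypothetical audio duration
  rt = decode_seconds / audio_seconds      # 0.94, as reported for the 20k LM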

LM development

Domain specific LM

  • Cross-entropy filtering is not better than key-based filtering (a filtering sketch follows this list).
  • It seems possible to reduce the PPL with the extra retrieved data source.
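
As referenced in the first bullet, a minimal sketch of cross-entropy data filtering in the Moore-Lewis style: keep candidate sentences whose in-domain cross-entropy minus general-domain cross-entropy falls below a threshold. The add-one-smoothed unigram scorer and the toy corpora are stand-ins for the real language models and data.

  import math
  from collections import Counter

  def unigram_lm(corpus):
      counts = Counter(w for sent in corpus for w in sent.split())
      total, vocab = sum(counts.values()), len(counts) + 1
      return lambda w: (counts[w] + 1) / (total + vocab)   # add-one smoothing

  def cross_entropy(sent, lm):
      words = sent.split()
      return -sum(math.log2(lm(w)) for w in words) / max(len(words), 1)

  p_in = unigram_lm(["call my mother", "dial the last number"])
  p_gen = unigram_lm(["the cat sat on the mat", "stocks fell sharply today"])

  candidates = ["call the office", "the mat sat still"]
  selected = [s for s in candidates
              if cross_entropy(s, p_in) - cross_entropy(s, p_gen) < 0]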


Word2Vector

  • Design a web spider.
  • Design a semantically related word tree.
  • First version based on pattern matching done (see the sketch after this list).
  • Filter with query logs.
  • Further refinement with the Baidu Baike hierarchy.
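
As referenced above, a minimal sketch of the pattern-match approach: harvest candidate related words with "X such as A, B and C"-style (Hearst) patterns. The single English pattern and sample sentence are illustrative; the real system would presumably use Chinese patterns over crawled text.

  import re

  PATTERN = re.compile(r"(\w+)s such as ((?:\w+(?:, | and )?)+)")

  def extract_pairs(text):
      pairs = []
      for hyper, members in PATTERN.findall(text):
          for m in re.split(r", | and ", members):
              pairs.append((hyper, m))       # (category, member) candidates
      return pairs

  text = "fruits such as apples, pears and oranges are cheap"
  print(extract_pairs(text))   # [('fruit', 'apples'), ('fruit', 'pears'), ('fruit', 'oranges')]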


NN LM

  • Character-based NNLM (6700 chars, 7-gram): training on 500M data done (a model sketch follows this list).
  • Inconsistent WER patterns were found on the Tencent test sets; probably need another test set to investigate.
  • Investigate MS RNN LM training.
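
As referenced in the first bullet, a minimal sketch of a character-based n-gram NNLM in the spirit of the 7-gram model above: the six preceding characters are embedded, concatenated, and passed through one hidden layer to a softmax over the character vocabulary. Sizes are illustrative (the real model used about 6700 characters); training is omitted.

  import numpy as np

  rng = np.random.default_rng(0)
  V, D, H, CTX = 100, 16, 64, 6            # vocab, embedding, hidden, context sizes

  E = rng.standard_normal((V, D)) * 0.1    # character embeddings
  W1 = rng.standard_normal((CTX * D, H)) * 0.1
  W2 = rng.standard_normal((H, V)) * 0.1

  def log_prob_next(context_ids):
      """log P(next char | previous CTX chars) under the untrained model."""
      x = E[context_ids].reshape(-1)       # concatenate the 6 embeddings
      h = np.tanh(x @ W1)
      logits = h @ W2
      logits -= logits.max()               # numerically stable softmax
      return logits - np.log(np.exp(logits).sum())

  ctx = rng.integers(0, V, CTX)            # six character ids
  lp = log_prob_next(ctx)                  # length-V vector of log probabilities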