2014-06-27
From cslt Wiki
Resource Building
Leftover questions
- Asymmetric window: large improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set.
- Multi-GPU training: error encountered
- Multilanguage training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
AM development
Sparse DNN
- GA-based block sparsity (++++++++)
Noise training
- Paper writing ongoing
GFbank
- Now running Sinovoice 8k 1400 + 100 mixture training.
- FBank/GFbank, stream/non-stream MPE completed:
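--------------------------------------------------------------------
                         | Huawei disanpi  BJ mobile  8k English data
--------------------------------------------------------------------
FBank non-stream (MPE4)  | 20.44%          22.28%     24.36%
--------------------------------------------------------------------
FBank stream (MPE1)      | 20.17%          22.50%     21.63%
--------------------------------------------------------------------
GFbank stream (MPE4)     | 20.69%          22.84%     24.45%
--------------------------------------------------------------------
GFbank non-stream (MPE)  | -               -          -
--------------------------------------------------------------------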
Multilingual ASR
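--------------------------------------------------------------------
                         | HW 30h (HW TR LM not involved)  HW 30h (HW TR LM involved)
--------------------------------------------------------------------
FBank non-stream (MPE4)  | 22.23                           21.38
--------------------------------------------------------------------
FBank stream (monolang)  | 21.64                           20.72
--------------------------------------------------------------------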
Denoising & Farfield ASR
- Correlation-based alignment is done. This is necessary because the recording device may introduce an artificial delay.
- How about the output CMVN test?
- deliver the recording to /nfs/disk/perm/data/corpora/reverberant
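The correlation-based alignment mentioned above can be illustrated with a minimal sketch. It assumes the clean source signal and the re-recorded signal are available as 1-D NumPy arrays at the same sampling rate; the function names `estimate_delay` and `align` are hypothetical and are not taken from the actual tooling used for these recordings.

```python
import numpy as np

def estimate_delay(reference, recorded):
    """Estimate how many samples `recorded` lags behind `reference`.

    A positive result means the recording starts late (device delay).
    """
    ref = reference - np.mean(reference)
    rec = recorded - np.mean(recorded)
    # Full cross-correlation: output index i corresponds to lag i - (len(ref) - 1).
    xcorr = np.correlate(rec, ref, mode="full")
    return int(np.argmax(xcorr)) - (len(ref) - 1)

def align(reference, recorded):
    """Trim both signals so that they start at the same point in time."""
    lag = estimate_delay(reference, recorded)
    if lag > 0:
        recorded = recorded[lag:]      # drop the leading delay in the recording
    elif lag < 0:
        reference = reference[-lag:]   # recording started early: trim the reference
    n = min(len(reference), len(recorded))
    return reference[:n], recorded[:n], lag
```

`np.correlate` computes the correlation directly, which is slow for long recordings; an FFT-based correlation would be the usual substitute in that case.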
Original model:
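xEnt model:
--------------------------------------------------------------------
          | middle-field  far-field
--------------------------------------------------------------------
dev93     | 74.79         96.68
--------------------------------------------------------------------
eval92    | 63.42         94.75
--------------------------------------------------------------------
MPE model:
MPE adaptation:
--------------------------------------------------------------------
          | middle-field  far-field
--------------------------------------------------------------------
dev93     | 63.71         94.84
--------------------------------------------------------------------
eval92    | 52.67         90.45
--------------------------------------------------------------------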
VAD
- DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74)
- 100 x n (n <= 3) hidden units with 2 output units seem sufficient for VAD
- report forms
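To make the network size above concrete, here is a minimal sketch of a frame-level VAD network with n hidden layers of 100 units and a 2-way softmax output. This reads "100 x n" as n layers of 100 units, which is an assumption. The input dimension (40), the sigmoid activation, and the random untrained weights are also assumptions for illustration, as are the function names; the actual features and training recipe are not given in these notes.

```python
import numpy as np

def init_vad_dnn(input_dim, n_hidden_layers=2, hidden_units=100, seed=0):
    """Create random (untrained) weights for input -> 100 x n -> 2 outputs."""
    rng = np.random.default_rng(seed)
    sizes = [input_dim] + [hidden_units] * n_hidden_layers + [2]
    return [(0.1 * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def vad_posteriors(params, frames):
    """Return per-frame posteriors over the 2 output units."""
    h = frames
    for W, b in params[:-1]:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))    # sigmoid hidden layer (assumed)
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)
```

With a 40-dimensional input and n = 3, this network has 40*100 + 2*100*100 + 100*2 weights plus biases, which is very small next to a typical acoustic model.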
Scoring
- refine the model with AMIDA database. Local minimum observed.
- i-vector-based speaker detection seems fine, reaching 96% with 100 speakers
Embedded decoder
AM: 600x4+800 xent9 model:

bigLM 1e-9
--------------------------------------------------------------------
voc size  | 150k    20k     10k     5k
--------------------------------------------------------------------
graph size| 9.1M    7.2M    5.5M
--------------------------------------------------------------------
Acc       | 15.96
--------------------------------------------------------------------
RT:       |
--------------------------------------------------------------------

bigLM 1e-7
--------------------------------------------------------------------
voc size  | 150k    20k     10k     5k
--------------------------------------------------------------------
graph size| 111     78      61      44
--------------------------------------------------------------------
Acc       | 19.94   23.35   25.92   29.35
--------------------------------------------------------------------
RT:       | 1.69    1.06    1.07    0.98
--------------------------------------------------------------------

HCLG 1e-6
--------------------------------------------------------------------
voc size  | 150k    20k     10k     5k
--------------------------------------------------------------------
graph size| 98      49      34      24
--------------------------------------------------------------------
Acc       | 22.49   25.51   27.71   30.71
--------------------------------------------------------------------
RT:       | 0.89    0.70    0.68    0.64
--------------------------------------------------------------------

HCLG 1e-5
--------------------------------------------------------------------
voc size  | 150k    20k     10k     5k
--------------------------------------------------------------------
graph size| 21      6.9     5.5     4.1
--------------------------------------------------------------------
Acc       | 26.60   29.14   31.02   33.37
--------------------------------------------------------------------
RT:       | 0.68    0.61    0.58    0.56
--------------------------------------------------------------------
LM development
Domain specific LM
- Baidu Zhidao + Weibo extraction done with various thresholds
- The extracted text appears to help to some extent, but the major change seems to come from pre-processing.
- Check the proportion of tags in the HW 30h data!
Word2Vector
W2V based doc classification
- Full Gaussian based doc vector
- represent each doc with a Gaussian distribution of the word vectors it involved.
- using k-nn to conduct classification
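--------------------------------------------------------------------
             | mean Euclidean distance  KL distance  diagonal KL  baseline (NB with mean)
--------------------------------------------------------------------
Acc (50dim)  | 81.84                    79.65        -            69.7
--------------------------------------------------------------------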
- svm-based classification
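--------------------------------------------------------------------
                     | mean Euclidean distance  KL distance  diagonal KL  LDA
--------------------------------------------------------------------
2-class Acc (50dim)  | -                        -            -            -
--------------------------------------------------------------------
8-class Acc (50dim)  | -                        -            -            -
--------------------------------------------------------------------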
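The Gaussian document representation and the k-NN classification described above can be sketched as follows. Each document is summarised by the mean and covariance of its word vectors, and documents are compared by the distance between means, the full-covariance KL divergence, or the diagonal-covariance KL divergence. All function names here are hypothetical. The small ridge added to the covariance and the use of the asymmetric KL(query || training doc) are assumptions made for this sketch, not details confirmed by the experiments.

```python
import numpy as np
from collections import Counter

def doc_gaussian(word_vectors, ridge=1e-3):
    """Summarise a document as (mean, covariance) of its word vectors."""
    x = np.atleast_2d(np.asarray(word_vectors, dtype=float))
    mean = x.mean(axis=0)
    # The ridge keeps the covariance invertible for short documents (assumption).
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True)) + ridge * np.eye(x.shape[1])
    return mean, cov

def mean_distance(p, q):
    """Euclidean distance between the two document means."""
    return float(np.linalg.norm(p[0] - q[0]))

def gaussian_kl(p, q):
    """KL(p || q) between two full-covariance Gaussians."""
    (m0, s0), (m1, s1) = p, q
    d = m0.shape[0]
    s1_inv = np.linalg.inv(s1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(s0)
    _, logdet1 = np.linalg.slogdet(s1)
    return float(0.5 * (np.trace(s1_inv @ s0) + diff @ s1_inv @ diff
                        - d + logdet1 - logdet0))

def diagonal_kl(p, q):
    """KL(p || q) using only the diagonal of each covariance."""
    (m0, s0), (m1, s1) = p, q
    v0, v1 = np.diag(s0), np.diag(s1)
    return float(0.5 * np.sum(v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + np.log(v1 / v0)))

def knn_classify(query, train_docs, train_labels, distance, k=3):
    """Label `query` by majority vote among its k nearest training documents."""
    dists = [distance(query, doc) for doc in train_docs]
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Passing `mean_distance`, `gaussian_kl`, or `diagonal_kl` as the `distance` argument switches between the three settings compared in the k-NN table.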
Semantic word tree
- Version v2.0 released (filter with query log)
- Please deliver to /nfs/disk/perm/data/corpora/semanticTree (Xingchao)
- Version v3.0 under way: further refinement with the Baidu Baike hierarchy
NN LM
- Character-based NNLM (6700 chars, 7-gram): training on 500M data done.
- Inconsistent WER patterns were found on the Tencent test sets
- Probably need another test set for the investigation.
- Investigate MS RNN LM training