2014-06-20
Resource Building
- Release management combing (review) done.
Leftover questions
- Asymmetric window: large improvement on the training set (WER 34% → 24%), but the improvement is lost on the test set.
- Multi-GPU training: error encountered.
- Multilingual training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
AM development
Sparse DNN
- GA-based block sparsity (+++++++); a toy GA sketch follows this list.
- Paper revision done.
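As a rough sketch of the GA-based block-sparsity idea (assumptions: weights are pruned in fixed blocks, and the proxy fitness below, which rewards keeping high-norm blocks near a target density, stands in for the real fitness, i.e. recognition accuracy of the masked DNN):

import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, block_norms, target_density=0.3):
    # Proxy fitness (assumption): prefer masks that keep blocks with
    # large weight norms while staying near the target block density.
    kept = float((mask * block_norms).sum())
    return kept - 10.0 * abs(mask.mean() - target_density)

def ga_block_sparsity(block_norms, pop=30, gens=50, p_mut=0.02):
    # Evolve binary keep/prune masks over the weight blocks of a layer.
    n = block_norms.size
    population = rng.integers(0, 2, (pop, n))
    for _ in range(gens):
        scores = [fitness(m, block_norms) for m in population]
        order = np.argsort(scores)[::-1]
        parents = population[order[: pop // 2]]          # truncation selection
        children = []
        for _ in range(pop - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = int(rng.integers(1, n))                # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child[rng.random(n) < p_mut] ^= 1            # bit-flip mutation
            children.append(child)
        population = np.vstack([parents] + children)
    scores = [fitness(m, block_norms) for m in population]
    return population[int(np.argmax(scores))]            # 1 = keep, 0 = prune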
Noise training
- Paper writing will start this week.
GFbank
- Running Sinovoice 8k 1400 + 100 mixture training.
- GFbank xEnt iteration 14 completed:
                          Huawei 3rd batch   BJ mobile 8k   English data
FBank non-stream (MPE4)   20.44%             22.28%         24.36%
GFbank stream (MPE4)      -                  -              -
GFbank non-stream (MPE)   -                  -              -
Multilingual ASR
                          HW 30h (HW TR LM not involved)   HW 30h (HW TR LM involved)
FBank non-stream (MPE4)   22.23                            21.38
FBank stream (monolang)   21.64                            20.72
GFbank stream (MPE4)      -                                -
GFbank non-stream (MPE)   -                                -
Denoising & Farfield ASR
- Replay may cause a time delay; this should be solved by cross-correlation detection (see the sketch after the tables below).
- Single-layer network with more hidden units: failed.
- The problem appears to be the large magnitude of the output data.
- New recordings (one close to the mic, one far-field at 2 meters)
Original model (xEnt):

         middle-field   far-field
dev93    74.79          96.68
eval92   63.42          94.75

MPE model, after MPE adaptation:

         middle-field   far-field
dev93    63.71          94.84
eval92   52.67          90.45
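A minimal sketch of the cross-correlation detection mentioned above (assuming numpy; the signal names and sample-rate argument are illustrative):

import numpy as np

def estimate_delay(reference, recording, sample_rate):
    # Cross-correlate the replayed recording against the reference signal;
    # the lag of the correlation peak estimates the replay time delay.
    corr = np.correlate(recording, reference, mode="full")
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return lag / sample_rate    # seconds; positive means the recording lags

The estimated delay can then be used to align the recording before further processing.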
VAD
- DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74)
- Hidden layers of 100 × n units (n ≤ 3) with 2 output units seem sufficient for VAD (see the sketch below)
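A minimal sketch of a VAD network of this size (assumptions: numpy, sigmoid hidden units, and a softmax output; the weight matrices would come from actual training):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def vad_posteriors(frames, weights, biases):
    # Small MLP for VAD: up to three 100-unit hidden layers followed by
    # a 2-unit softmax output (speech vs. non-speech), scored per frame.
    h = frames                                  # (num_frames, feat_dim)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = sigmoid(h @ W + b)                  # 100-unit hidden layers
    z = h @ weights[-1] + biases[-1]            # 2-unit output layer
    z -= z.max(axis=1, keepdims=True)           # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)     # columns: [speech, non-speech]

A frame would then be labeled speech when its speech posterior exceeds a threshold.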
Scoring
- Collect more data with human scoring to train discriminative models
Embedded decoder
FSA size:

threshold   1e-5   1e-6   1e-7         1e-8   1e-9
5k          480k   5.5M   44M          -      1.1G
10k         731k   7M     61M
20k         1.2M   8.8M   78M (301M)
600 × 4 + 800 AM, beam 9:

        150k    20k    10k   5k
WER     15.96   -      -     -
RT      X       0.94   -     -
LM development
Domain specific LM
- Baidu Zhidao + Weibo extraction done with various thresholds.
- The extracted text gives some improvement, but the major change seems to come from pre-processing.
- Check the proportion of tags in the HW 30h data.
Word2Vector
W2V-based doc classification
- Full-Gaussian doc vectors:
- Represent each document by a Gaussian distribution over the word vectors it contains.
- Classify with k-NN (see the sketch after the table below).
               mean Euclidean distance   KL distance   baseline (NB with mean)
Acc (50 dim)   81.84                     79.65         69.7
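A minimal sketch of this pipeline (assuming numpy; function names are illustrative). Each document becomes a full-covariance Gaussian over its word vectors, distances are symmetrized KL divergences, and k-NN votes on the label:

import numpy as np

def doc_gaussian(word_vecs, eps=1e-3):
    # Represent a document by the mean and (regularized) full covariance
    # of the word vectors of the words it contains.
    mu = word_vecs.mean(axis=0)
    cov = np.cov(word_vecs, rowvar=False) + eps * np.eye(word_vecs.shape[1])
    return mu, cov

def kl_gauss(p, q):
    # KL(N_p || N_q) between two full-covariance Gaussians.
    mu0, S0 = p
    mu1, S1 = q
    d = mu0.size
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff
                  - d + logdet1 - logdet0)

def knn_classify(query, train_gaussians, labels, k=5):
    # k-NN with symmetrized KL divergence as the document distance.
    dists = [kl_gauss(query, g) + kl_gauss(g, query) for g in train_gaussians]
    top = np.argsort(dists)[:k]
    values, counts = np.unique([labels[i] for i in top], return_counts=True)
    return values[np.argmax(counts)]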
Semantic word tree
- First version based on pattern matching done (a toy illustration follows this list)
- Filter with query log
- Further refinement with Baidu Baike hierarchy
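A toy illustration of the pattern-matching step (the patterns and example below are assumptions for illustration, not the actual rule set):

import re

# Hearst-style "X is-a Y" patterns for growing the word tree (illustrative).
PATTERNS = [
    re.compile(r"(\w+)是一种(\w+)"),    # "X is a kind of Y"
    re.compile(r"(\w+)属于(\w+)"),      # "X belongs to Y"
]

def extract_isa_pairs(text):
    # Return (hyponym, hypernym) candidate pairs matched by the patterns.
    pairs = []
    for pat in PATTERNS:
        pairs.extend(pat.findall(text))
    return pairs

# Example: extract_isa_pairs("苹果是一种水果") -> [("苹果", "水果")]

Candidate pairs would then be filtered against the query log and the Baidu Baike hierarchy, as noted above.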
NN LM
- Character-based NNLM (6,700 characters, 7-gram): training on 500M of data done (a forward-pass sketch follows this list).
- Inconsistent WER patterns were found on the Tencent test sets;
- probably need another test set for the investigation.
- Investigate MS RNN LM training
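A minimal sketch of a character-based 7-gram feed-forward NNLM forward pass (vocabulary and context sizes follow the note above; embedding and hidden sizes are illustrative assumptions):

import numpy as np

V, CTX, EMB, HID = 6700, 6, 100, 400    # 7-gram => 6 history characters

rng = np.random.default_rng(0)
E = rng.normal(0, 0.01, (V, EMB))       # character embedding table
W1 = rng.normal(0, 0.01, (CTX * EMB, HID)); b1 = np.zeros(HID)
W2 = rng.normal(0, 0.01, (HID, V));         b2 = np.zeros(V)

def next_char_probs(history):
    # history: ids of the 6 preceding characters; returns
    # P(next char | history) via one hidden tanh layer + softmax.
    x = E[history].reshape(-1)          # concatenated context embeddings
    h = np.tanh(x @ W1 + b1)
    z = h @ W2 + b2
    z -= z.max()                        # numerically stable softmax
    p = np.exp(z)
    return p / p.sum()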