2014-04-11
Resource Building
- Current text resources have been re-arranged and listed
Leftover questions
- Asymmetric window: large improvement on the training set (WER from 34% to 24%), but the gain is lost on the test set. Overfitting? (See the splicing sketch after this list.)
- Multi-GPU training: an error was encountered
- Multilingual training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
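As a reference for the asymmetric-window item above, here is a minimal sketch of asymmetric frame splicing for the DNN input. The 10-left/2-right context and feature sizes are illustrative assumptions, not the actual configuration:

```python
import numpy as np

def splice(feats, left=10, right=2):
    """Asymmetric context window: stack `left` past frames and
    `right` future frames around each current frame.
    feats: (T, D) array of per-frame features."""
    T, D = feats.shape
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    return np.stack([padded[t:t + left + right + 1].reshape(-1)
                     for t in range(T)])

# Example: 100 frames of 40-dim Fbank -> (100, 40 * (10 + 2 + 1))
x = splice(np.random.randn(100, 40))
print(x.shape)  # (100, 520)
```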
AM development
Sparse DNN
- GA-based block sparsity (a minimal sketch follows this list)
- Found a paper from 2000 with similar ideas
- Trying to get a student working on high-performance computing to do the optimization
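A minimal sketch of the GA-based block-sparsity idea. The formulation is an assumption: a binary mask over weight blocks evolves under a toy fitness that trades retained weight norm against sparsity; the real fitness would come from recognition accuracy:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))   # one DNN weight matrix (toy stand-in)
B = 16                                # block size: a 32x32 grid of blocks
grid = (W.shape[0] // B, W.shape[1] // B)

def apply_mask(W, mask):
    """Zero the weight blocks whose mask bit is 0."""
    return W * np.kron(mask, np.ones((B, B)))

def fitness(mask):
    # Toy proxy: retain weight norm while rewarding sparsity;
    # a real GA would score masks by held-out recognition accuracy.
    kept = np.linalg.norm(apply_mask(W, mask)) / np.linalg.norm(W)
    return kept + 0.5 * (1.0 - mask.mean())

pop = rng.integers(0, 2, size=(40,) + grid)            # random initial masks
for gen in range(50):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-20:]]            # truncation selection
    a = parents[rng.integers(0, 20, size=20)]
    b = parents[rng.integers(0, 20, size=20)]
    kids = np.where(rng.random(a.shape) < 0.5, a, b)   # uniform crossover
    kids ^= (rng.random(kids.shape) < 0.02)            # bit-flip mutation
    pop = np.concatenate([parents, kids])

best = max(pop, key=fitness)
print("fraction of blocks kept:", best.mean())
```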
Noise training
- More experiments with the no-noise condition
- More experiments with additional noise types
AMR compression re-training
- 1700h MPE adaptation done
- 1700h stream-mode adaptation has run into a problem at MPE1
GFbank
- Significant improvement found with GFBank
- Significant improvement found with FBank + GFBank
Denoising & Farfield ASR
- Recording done
- Preparing to construct the baseline
VAD
- Code is ready; still need to work out speech/non-speech smoothing (a minimal sketch follows)
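A minimal sketch of frame-decision smoothing with a hangover scheme. The thresholds and counts are illustrative assumptions, not the actual VAD parameters:

```python
def smooth_vad(frames, on_count=5, hangover=30):
    """Smooth raw per-frame speech/non-speech decisions.
    frames: iterable of 0/1 raw VAD decisions.
    on_count: consecutive speech frames needed to enter speech.
    hangover: non-speech frames tolerated before leaving speech."""
    out, in_speech, run, silence = [], False, 0, 0
    for f in frames:
        if not in_speech:
            run = run + 1 if f else 0
            if run >= on_count:          # enough evidence: enter speech
                in_speech, silence = True, 0
        else:
            silence = 0 if f else silence + 1
            if silence > hangover:       # long silence: leave speech
                in_speech, run = False, 0
        out.append(1 if in_speech else 0)
    return out

# Example: isolated blips are suppressed, short pauses are bridged.
raw = [0, 1, 0, 0] + [1] * 8 + [0] * 10 + [1] * 6 + [0] * 40
print(smooth_vad(raw, on_count=3, hangover=15))
```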
Farfield recognition
Scoring
- The MLP-based g-score is done
- The t-score based on linear regression improves performance
Word to Vector
- LDA baseline (Sogou 1700*9 training set) done
- Word-vector classification is much better than the LDA system (detailed results below)
Word vector setup:
- Dictionary: 15w (~150k entries); training data: ren_ming_ri_bao (People's Daily, 5 GB); window = 5
- Setting 1: vector size 50, training time 30 min, 12 threads
- Setting 2: vector size 10, training time 10 min, 12 threads
- Data: class_num = 9, document_num = 9*2000, train_num = 9*1600, test_num = 9*200, dev_num = 9*200

Per-class results, train set:

Class (label)          lda_inf    lda_inf_10  w2v_filter_filter  w2v_filter_filter_10
C000008 (Finance)      0.845      0.8149      0.7463             0.7608
C000010 (IT)           0.2756     0.0887      0.713              0.4323
C000013 (Health)       0.698      0.628       0.657              0.57394
C000014 (Sports)       0.9502     0.9641      0.9106             0.865
C000016 (Travel)       0.63499    0.5739      0.68659            0.549
C000020 (Education)    0.32       0.105       0.54               0.335
C000022 (Recruitment)  0.8080     0.707363    0.74638            0.577
C000023 (Culture)      0.3505     0.2334      0.692              0.6129
C000024 (Military)     0.864      0.8628      0.84518            0.78099
Total                  0.6385     0.553167    0.72638            0.609769

Per-class results, test set:

Class (label)          lda_inf    lda_inf_10  w2v_filter_filter  w2v_filter_filter_10
C000008 (Finance)      0.8706     0.776       0.6865             0.791
C000010 (IT)           0.26368    0.1044      0.7263             0.4079
C000013 (Health)       0.6965     0.6467      0.6716             0.56218
C000014 (Sports)       0.8009     0.9054      0.84577            0.74129
C000016 (Travel)       0.582      0.62189     0.7462             0.62189
C000020 (Education)    0.2537     0.1144      0.46268            0.22885
C000022 (Recruitment)  0.72139    0.56218     0.6567             0.562
C000023 (Culture)      0.3184     0.24378     0.7114             0.6766
C000024 (Military)     0.82587    0.796       0.8905             0.84079
Total                  0.59259    0.530127    0.71088            0.603648

Notes:
- w2v_filter: stop words removed when training the word vectors
- w2v_filter_filter: stop words removed both when training the word vectors and from the documents
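A minimal sketch of the word-vector classification pipeline. The assumed approach (average the word vectors of each document and train a linear classifier) and the gensim/scikit-learn usage are illustrative, not the actual setup; the toy corpus stands in for the segmented Sogou documents:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

docs = [["股市", "上涨", "银行"], ["比赛", "进球", "球队"],
        ["股票", "基金", "银行"], ["球员", "比赛", "冠军"]]
labels = ["Finance", "Sports", "Finance", "Sports"]

# Train word vectors (size 50, window 5, 12 threads, as in the notes).
w2v = Word2Vec(sentences=docs, vector_size=50, window=5,
               workers=12, min_count=1)

def doc_vec(doc):
    """Represent a document as the mean of its word vectors."""
    vecs = [w2v.wv[w] for w in doc if w in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.wv.vector_size)

X = np.stack([doc_vec(d) for d in docs])
clf = LogisticRegression().fit(X, labels)
print(clf.predict([doc_vec(["银行", "股票"])]))   # -> ['Finance']
```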
LM development
NN LM
- Character-based NNLM (6,700 characters, 7-gram): training on 500M of data done (a minimal forward-pass sketch follows this list)
- The non-boundary character LM is better than the boundary character LM
- Investigating MS RNN LM training
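A minimal forward-pass sketch of a feed-forward character NNLM of the kind described above: six history characters predict the next one for a 7-gram model. The layer sizes and random weights are illustrative; only the 6,700-character vocabulary comes from the notes:

```python
import numpy as np

V, E, H, N = 6700, 64, 256, 7          # vocab, embedding, hidden, n-gram order
rng = np.random.default_rng(0)
C = rng.standard_normal((V, E)) * 0.01       # character embedding table
W1 = rng.standard_normal(((N - 1) * E, H)) * 0.01
W2 = rng.standard_normal((H, V)) * 0.01

def nnlm_probs(history):
    """P(next char | previous N-1 chars) for a feed-forward NNLM.
    history: list of N-1 character ids."""
    x = C[history].reshape(-1)               # concatenate the 6 embeddings
    h = np.tanh(x @ W1)                      # hidden layer
    z = h @ W2
    z -= z.max()                             # numerically stable softmax
    p = np.exp(z)
    return p / p.sum()

p = nnlm_probs([5, 17, 3, 99, 42, 7])        # 6-char history, 7-gram model
print(p.shape, p.sum())                      # (6700,) 1.0
```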
QA
FST-based matching
- Word-based FST matching takes 1-2 seconds with 1,600 patterns, while Huilan's implementation takes <1 second; the gap is not yet understood (a matching sketch follows this list)
- The char-FST implementation is done
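A minimal sketch of pattern matching with a deterministic acceptor; a plain trie stands in here for the actual FST, and the pattern contents are made up:

```python
def compile_patterns(patterns):
    """Compile token-sequence patterns into a trie (deterministic acceptor)."""
    trie = {}
    for pat_id, tokens in enumerate(patterns):
        node = trie
        for tok in tokens:
            node = node.setdefault(tok, {})
        node["<final>"] = pat_id           # mark an accepting state
    return trie

def match(trie, tokens):
    """Return ids of patterns that match a prefix of `tokens`."""
    hits, node = [], trie
    for tok in tokens:
        if "<final>" in node:
            hits.append(node["<final>"])
        node = node.get(tok)
        if node is None:
            return hits
    if "<final>" in node:
        hits.append(node["<final>"])
    return hits

pats = [["what", "is", "the", "weather"], ["play", "music"]]
fst = compile_patterns(pats)
print(match(fst, ["what", "is", "the", "weather", "today"]))  # [0]
```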
Speech QA
- Investigating determinization of the G embedding