2014-04-04
Resource Building
- Current text resources have been re-arranged and listed
Leftover questions
- Asymmetric window: great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Overfitting? (A windowing sketch follows this list.)
- Multi-GPU training: error encountered
- Multi-language training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
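A minimal sketch of an asymmetric analysis window for the asymmetric-window item above; the Hann-based construction and the 300/100-sample split are assumptions, since the exact shape used in the experiments is not recorded here.

 import numpy as np
 
 def asymmetric_window(n_left=300, n_right=100):
     # Slow Hann rise over n_left samples, fast Hann fall over n_right samples.
     left = np.hanning(2 * n_left)[:n_left]
     right = np.hanning(2 * n_right)[n_right:]
     return np.concatenate([left, right])
 
 # Apply to one 400-sample frame (25 ms at 16 kHz) before the FFT.
 frame = np.random.randn(400)
 windowed = asymmetric_window(300, 100) * frame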
AM development
Sparse DNN
- GA-based block sparsity (a search sketch follows this list)
- Found a paper from 2000 with similar ideas
- Trying to get a student working on high-performance computing to do the optimization
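A toy sketch of the GA-based block-sparsity idea above: a genetic algorithm searches a binary keep/drop mask over weight blocks. The toy matrix and the fitness function (retained weight energy plus a sparsity penalty) are stand-ins; the real objective would be held-out frame accuracy or WER.

 import numpy as np
 
 rng = np.random.default_rng(0)
 W = rng.standard_normal((64, 64))   # toy weight matrix
 B = 8                               # block size
 nb = W.shape[0] // B                # blocks per side
 block_energy = np.array([[np.sum(W[i*B:(i+1)*B, j*B:(j+1)*B] ** 2)
                           for j in range(nb)] for i in range(nb)])
 target_keep = 0.5                   # keep 50% of the blocks
 
 def fitness(mask):
     # Reward retained energy, penalise deviation from the target block sparsity.
     return (block_energy * mask).sum() - 100.0 * abs(mask.mean() - target_keep)
 
 def evolve(pop_size=40, generations=60, p_mut=0.02):
     pop = rng.integers(0, 2, size=(pop_size, nb, nb))
     for _ in range(generations):
         scores = np.array([fitness(ind) for ind in pop])
         parents = pop[np.argsort(scores)[::-1][:pop_size // 2]]   # truncation selection
         children = []
         for _ in range(pop_size - len(parents)):
             a, b = parents[rng.integers(len(parents), size=2)]
             cut = rng.integers(1, nb * nb)                        # one-point crossover
             child = np.where(np.arange(nb * nb).reshape(nb, nb) < cut, a, b)
             child ^= (rng.random((nb, nb)) < p_mut)               # bit-flip mutation
             children.append(child)
         pop = np.concatenate([parents, np.array(children)])
     return pop[np.argmax([fitness(ind) for ind in pop])]
 
 mask = evolve()
 W_sparse = W * np.kron(mask, np.ones((B, B)))   # zero out the dropped blocks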
Noise training
- More experiments with no-noise (clean) data
- More experiments with additional noise types (a noise-mixing sketch follows this list)
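A generic sketch of mixing a noise signal into clean speech at a target SNR, for the noise-training items above; the actual noise types and SNR range used in these experiments are not given on this page.

 import numpy as np
 
 def mix_at_snr(speech, noise, snr_db):
     # Scale the noise so the speech-to-noise power ratio equals snr_db, then add it.
     noise = noise[:len(speech)]
     p_speech = np.mean(speech ** 2)
     p_noise = np.mean(noise ** 2) + 1e-12
     scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
     return speech + scale * noise
 
 # Example: corrupt one second of clean audio with white noise at 10 dB SNR.
 clean = np.random.randn(16000)
 noisy = mix_at_snr(clean, np.random.randn(16000), snr_db=10)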
AMR compression re-training
- 1700h MPE adaptation; %WER per iteration on the amr and wav test sets:
  - iter1: amr %WER 13.40 [ 6398 / 47753, 252 ins, 829 del, 5317 sub ]; wav %WER 11.19 [ 5343 / 47753, 178 ins, 710 del, 4455 sub ]
  - iter2: amr %WER 13.31 [ 6358 / 47753, 255 ins, 798 del, 5305 sub ]; wav %WER 11.33 [ 5409 / 47753, 180 ins, 732 del, 4497 sub ]
  - iter3: amr %WER 13.25 [ 6326 / 47753, 230 ins, 823 del, 5273 sub ]; wav %WER 11.43 [ 5460 / 47753, 199 ins, 709 del, 4552 sub ]
  - iter4: amr %WER 13.17 [ 6289 / 47753, 225 ins, 833 del, 5231 sub ]; wav %WER 11.44 [ 5461 / 47753, 200 ins, 693 del, 4568 sub ]
  - iter5: amr %WER 13.17 [ 6291 / 47753, 254 ins, 769 del, 5268 sub ]; wav %WER 11.46 [ 5471 / 47753, 200 ins, 696 del, 4575 sub ]
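Each %WER figure above is (insertions + deletions + substitutions) divided by the number of reference words, with the bracketed numbers giving the breakdown; a quick check of the iter1 amr line:

 ins, dele, sub, total = 252, 829, 5317, 47753
 errors = ins + dele + sub                            # 6398, the bracketed error count
 print("%WER {:.2f}".format(100.0 * errors / total))  # -> %WER 13.40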
GFbank
- Found errors attributed to the speaker2utterance mapping
Denoising & Farfield ASR
- First round of recording failed
- Will record far-field waveforms next week
VAD
- Source code prepared
- Preparing the DNN pipeline (a frame-level VAD sketch follows this list)
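A minimal frame-level DNN VAD sketch for the pipeline item above; the synthetic features and labels, the network size, the scikit-learn MLPClassifier, and the moving-average smoothing are all assumptions rather than the lab's actual setup.

 import numpy as np
 from sklearn.neural_network import MLPClassifier
 
 # Synthetic per-frame features: 0 = non-speech frames, 1 = speech frames.
 rng = np.random.default_rng(1)
 feats = np.vstack([rng.normal(0, 1, (500, 40)), rng.normal(2, 1, (500, 40))])
 labels = np.repeat([0, 1], 500)
 
 vad = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=200, random_state=0)
 vad.fit(feats, labels)
 
 # At test time: per-frame speech posteriors, smoothed before thresholding.
 post = vad.predict_proba(feats[:100])[:, 1]
 decisions = np.convolve(post, np.ones(11) / 11, mode="same") > 0.5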
Word to Vector
- LDA baseline (Sogou 1700*9 training set); a classification sketch follows this list
  - Training done
  - Training the classifier
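A small sketch of an LDA-plus-classifier baseline like the one above, reading "1700*9" as nine categories of 1700 documents each; the in-line toy corpus and the logistic-regression classifier are stand-ins for the Sogou data and whatever classifier is actually being trained.

 from sklearn.feature_extraction.text import CountVectorizer
 from sklearn.decomposition import LatentDirichletAllocation
 from sklearn.linear_model import LogisticRegression
 
 # Toy corpus standing in for the Sogou training set (labels = category ids).
 docs = ["stock market rises", "team wins the match", "new phone released"]
 labels = [0, 1, 2]
 
 counts = CountVectorizer().fit_transform(docs)
 lda = LatentDirichletAllocation(n_components=9, random_state=0)
 topic_feats = lda.fit_transform(counts)   # document-topic proportions as features
 clf = LogisticRegression(max_iter=1000).fit(topic_feats, labels)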
LM development
NN LM
- Character-based NNLM (6700 characters, 7-gram): training on 500M of data done (an architecture sketch follows this list)
- Boundary-involved (word-boundary-aware) character NNLM training done
- Word boundaries seem less important than character history
- Investigating MS RNN LM training
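A sketch of a Bengio-style feed-forward n-gram NNLM over characters, matching the "6700 chars, 7gram" description above (six previous characters predict the next one); the embedding and hidden sizes are assumptions, and a boundary-involved variant could, for example, append word-boundary indicators to the context.

 import torch
 import torch.nn as nn
 
 VOCAB, CONTEXT, EMB, HIDDEN = 6700, 6, 128, 512   # sizes partly assumed
 
 class CharNGramNNLM(nn.Module):
     def __init__(self):
         super().__init__()
         self.emb = nn.Embedding(VOCAB, EMB)
         self.hidden = nn.Linear(CONTEXT * EMB, HIDDEN)
         self.out = nn.Linear(HIDDEN, VOCAB)
 
     def forward(self, ctx):                  # ctx: (batch, CONTEXT) character ids
         h = torch.tanh(self.hidden(self.emb(ctx).flatten(1)))
         return self.out(h)                   # unnormalised next-character scores
 
 model = CharNGramNNLM()
 logits = model(torch.randint(0, VOCAB, (32, CONTEXT)))
 loss = nn.functional.cross_entropy(logits, torch.randint(0, VOCAB, (32,)))
 loss.backward()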
Pronunciation scoring
- 8k model delivered
- MLP-based scoring completed
QA
FST-based matching
- Character FST under investigation (a matching sketch follows this list)
- FST-based QA patent done
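A plain-Python stand-in for the character-FST matching under investigation: the pattern set is stored as a trie (a deterministic acceptor) and scanned against the query. The question templates are hypothetical, and the real system presumably builds the acceptor with an FST toolkit rather than a dict-based trie.

 patterns = {"天气怎么样": "weather", "几点了": "time"}   # hypothetical templates
 
 def build_trie(patterns):
     root = {}
     for text, intent in patterns.items():
         node = root
         for ch in text:
             node = node.setdefault(ch, {})
         node["<final>"] = intent
     return root
 
 def match(trie, query):
     # Return the intent of a pattern found as a substring of the query, or None.
     best = None
     for start in range(len(query)):
         node = trie
         for ch in query[start:]:
             if ch not in node:
                 break
             node = node[ch]
             if "<final>" in node:
                 best = node["<final>"]
     return best
 
 trie = build_trie(patterns)
 print(match(trie, "北京今天天气怎么样"))   # -> weather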
Speech QA
- Class LM QA
  - Done, with excellent results
  - Investigated various stepping-in weights; found that a negative weight (-1) effectively encourages entity recognition
  - Investigated a performance reduction caused by a preference for short words; introduced a factor on L.fst to discourage short words (see the path-cost sketch below)
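A tropical-semiring path-cost illustration of the two tweaks above: a negative stepping-in weight makes the class/entity path cheaper than a competing sequence of ordinary words, and a per-word penalty on L.fst arcs discourages covering a span with many short words. All arc costs below are made-up numbers for illustration only.

 def path_cost(arc_costs):
     return sum(arc_costs)         # tropical semiring: costs add, the best path is the minimum
 
 # Entity path vs. generic path over the same span of speech.
 generic = path_cost([4.2, 3.8])                  # two ordinary word arcs
 entity = path_cost([5.0]) + (-1.0)               # class arc plus negative stepping-in weight
 print(entity < generic)                          # True: the entity path now wins
 
 # Per-word penalty on L.fst arcs: three short words vs. one long word.
 penalty = 0.7
 short_words = path_cost([2.0, 2.0, 2.0]) + 3 * penalty
 long_word = path_cost([5.5]) + 1 * penalty
 print(long_word < short_words)                   # True: the long word is now preferred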