2014-04-25
From cslt Wiki
Latest revision as of 02:12, 25 April 2014 (Friday)
Resource Building
- Maxi onboard
- Release management should be started: Zhiyong (+)
- Blaster 0.1 & vivian 0.0 system release
Leftover questions
- Asymmetric window: great improvement on the training set (WER 34% to 24%), but the gain is lost on the test set. Overfitting?
- Multi-GPU training: error encountered
- Multilanguage training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
AM development
Sparse DNN
- GA-based block sparsity (+)
- Found a paper from 2000 with similar ideas.
- Trying to get a student working on high-performance computing to do the optimization
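The GA-based block-sparsity search could be sketched roughly as below: a genetic algorithm evolves binary block masks over a weight matrix, trading output error against the number of zeroed blocks. This is a minimal illustration only; the matrix size, 2x2 block grid, fitness function, and GA settings are all assumptions, not the actual CSLT setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: prune an 8x8 weight matrix in 2x2 blocks so that
# the masked layer still approximates its outputs on sample activations.
W = rng.standard_normal((8, 8))
X = rng.standard_normal((8, 64))           # sample input activations
Y = W @ X                                  # reference outputs
BLOCKS = (4, 4)                            # 4x4 grid of 2x2 blocks

def expand(mask):
    """Blow a 4x4 block mask up to the full 8x8 element mask."""
    return np.kron(mask, np.ones((2, 2)))

def fitness(mask, sparsity_weight=0.5):
    """Lower is better: output error minus a bonus for zeroed blocks."""
    err = np.linalg.norm((W * expand(mask)) @ X - Y)
    return err - sparsity_weight * (mask.size - mask.sum())

# Plain generational GA over binary block masks.
pop = rng.integers(0, 2, size=(20,) + BLOCKS)
for gen in range(50):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[:10]]          # keep the best half
    kids = parents.copy()
    flips = rng.random(kids.shape) < 0.05           # mutation: flip 5% of blocks
    kids = np.where(flips, 1 - kids, kids)
    pop = np.concatenate([parents, kids])

best = pop[np.argmin([fitness(m) for m in pop])]
```

In real use the fitness would be frame accuracy or WER on held-out data rather than reconstruction error.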
Noise training
- More experiments with no-noise (+)
- More experiments with additional noise types (+)
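Training data for the additional noise types is presumably produced by mixing noise into clean speech at a target SNR; a minimal sketch (the function name and details are assumptions):

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    """Mix a noise signal into clean speech at a target SNR (in dB)."""
    noise = np.resize(noise, speech.shape)     # loop/trim noise to speech length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so speech power / scaled-noise power hits the target SNR.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```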
AMR compression re-training
- Stream model delivered to the WeChat server (Mengyuan + Liuchao)
GFbank
- GFBank Sinovoice test on 1700 MPE (10.34-10.14)
- GFBank Sinovoice 1700 MPE stream
Multilingual ASR
- All-phone strategy baseline done
- Some strange behavior observed when fixing early layers
Denoising & Farfield ASR
- Baseline: close-talk model decode far-field speech: 92.65
- MPE1: 92.78
- MPE2: 91.15
- MPE3: 91.21
- MPE4: 91.51
- Will test the result on the dev set
VAD
- A bug was found in the smoothing approach of the VAD.
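For reference, frame-level VAD smoothing typically drops too-short speech bursts and extends kept segments by a hangover; a toy sketch (the thresholds are made up, and this is not the buggy code in question):

```python
import numpy as np

def smooth_vad(frames, min_speech=3, hangover=2):
    """Smooth raw per-frame VAD decisions (0/1).

    Drops speech bursts shorter than `min_speech` frames and extends each
    kept segment by `hangover` frames so word endings are not clipped.
    """
    frames = np.asarray(frames, dtype=int)
    out = np.zeros_like(frames)
    i = 0
    while i < len(frames):
        if frames[i] == 1:
            j = i
            while j < len(frames) and frames[j] == 1:
                j += 1
            if j - i >= min_speech:                # keep only long-enough bursts
                out[i:min(j + hangover, len(frames))] = 1
            i = j
        else:
            i += 1
    return out
```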
Scoring
- A speaker identification system based on ivector was delivered
- Male/female identification based on UBM was delivered
- Phone-sequence based graph decoding was delivered
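The i-vector speaker-identification backend most likely scores a test i-vector against each enrolled speaker by cosine similarity and picks the best; a minimal sketch (function names and the bare cosine scoring rule are assumptions):

```python
import numpy as np

def cosine_score(enroll, test):
    """Cosine similarity of two i-vectors; higher means more likely same speaker."""
    return float(enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test)))

def identify(speakers, test_ivec):
    """Return the enrolled speaker whose i-vector scores highest."""
    return max(speakers, key=lambda name: cosine_score(speakers[name], test_ivec))
```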
Word to Vector
- Varying the dimension of the low-dimensional space from 10 to 100: done. Expanding to 200 dimensions. Some strange behavior was found with w2v. Will try on the daily people data.
- Testing multi-class classification with 2-9 classes: w2v done. Working on LDA.
- Testing various w2v window sizes n = 3-15. Strange behavior at n = 9.
LM development
NN LM
- Character-based NNLM (6,700 chars, 7-gram), training on 500M data done.
- Overflow found; code change done. Training has run 6 iterations.
- Investigate MS RNN LM training
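An overflow in NNLM training commonly sits in the output softmax, where exponentiating large logits blows up; the standard fix is the max-shift trick. This is a generic sketch, not the actual code change made here:

```python
import numpy as np

def stable_softmax(logits):
    """Overflow-safe softmax: subtracting the max logit leaves the result
    mathematically unchanged but keeps exp() from overflowing."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```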
QA
FST-based matching
- Word-based FST matching takes 1-2 seconds with 1,600 patterns; Huilan's implementation takes <1 second.
- THRAX toolkit for compiling grammars to FSTs
- Investigating determinization of G embedding
- Refer to the new Kaldi code
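Word-based pattern matching of this kind can be pictured as walking an acceptor built from the patterns. A toy trie version is below (illustrative only; the real matchers are compiled FSTs via OpenFst/THRAX, and the pattern set here is invented):

```python
def build_trie(patterns):
    """Compile word-sequence patterns into a trie (a tiny acceptor)."""
    root = {}
    for pid, pat in enumerate(patterns):
        node = root
        for word in pat.split():
            node = node.setdefault(word, {})
        node["<final>"] = pid               # mark an accepting state
    return root

def match(trie, sentence):
    """Return ids of patterns matching at any start position in the sentence."""
    words, hits = sentence.split(), []
    for start in range(len(words)):
        node = trie
        for word in words[start:]:
            if word not in node:
                break
            node = node[word]
            if "<final>" in node:
                hits.append(node["<final>"])
    return hits
```

Matching is linear in sentence length per start position, which is why a word-level acceptor over ~1,600 patterns can answer in well under a second.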