2014-05-16
Resource Building
- Maxi onboard
- Release management should be started: Zhiyong (+)
- Blaster 0.1 & vivian 0.0 system release
Leftover questions
- Asymmetric window: great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Overfitting? (See the window sketch after this list.)
- Multi-GPU training: error encountered
- Multilingual training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
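For reference, a minimal numpy sketch of what an asymmetric analysis window can look like; the split point and the Hann halves are illustrative assumptions, not the window actually used in the experiments.

<pre>
import numpy as np

def asymmetric_window(n_left, n_right):
    """Concatenate a slow-rising left half and a fast-falling right
    half at the peak, so the window weights recent samples more."""
    left = np.hanning(2 * n_left)[:n_left]     # rising half
    right = np.hanning(2 * n_right)[n_right:]  # falling half
    return np.concatenate([left, right])

# Example: a 400-sample frame (25 ms at 16 kHz) with the peak
# placed 3/4 of the way through the frame.
win = asymmetric_window(300, 100)
</pre>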
AM development
Sparse DNN
- GA-based block sparsity (+++); a sketch of the masking step follows this list
- Found a paper from 2000 with similar ideas
- Try to get a student working on high-performance computing to do the optimization
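A minimal sketch of the block-sparsity idea, assuming the GA chromosome is a binary keep/drop mask over fixed-size weight tiles (block size and keep ratio below are illustrative):

<pre>
import numpy as np

def apply_block_mask(W, mask, block):
    """Zero out whole (block x block) tiles of weight matrix W
    according to a binary mask with one bit per tile; this is the
    decoding step before a masked network is scored for fitness."""
    out = W.copy()
    for i in range(W.shape[0] // block):
        for j in range(W.shape[1] // block):
            if mask[i, j] == 0:
                out[i*block:(i+1)*block, j*block:(j+1)*block] = 0.0
    return out

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))              # one hidden layer
mask = (rng.random((64, 64)) < 0.3).astype(int)  # keep ~30% of 8x8 tiles
W_sparse = apply_block_mask(W, mask, block=8)
</pre>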
Noise training
- More with-clean training completed; 2 conditions left
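A minimal sketch of how a noisy training condition can be simulated by mixing noise into clean speech at a target SNR (the actual noise types and SNR levels of the remaining conditions are not recorded here):

<pre>
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals
    `snr_db` decibels, then add it to the clean waveform."""
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[:len(clean)]    # match lengths
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    target = p_clean / (10 ** (snr_db / 10.0))   # desired noise power
    return clean + noise * np.sqrt(target / p_noise)
</pre>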
GFbank
- 8k training
  - GFBank sinovoice 1400 MPE stream
- 16k training
  - GFBank sinovoice 6000 MPE1 stream: worse than 1700h (10.18-11.11); see the filterbank sketch below
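For context, GFbank replaces the Mel filterbank with a gammatone filterbank. A minimal sketch of one 4th-order gammatone channel on the ERB scale; the constants are the standard Glasberg & Moore values, not necessarily the exact settings used above.

<pre>
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth (Hz) at center frequency f,
    after Glasberg & Moore (1990)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4, b=1.019):
    """Impulse response of one gammatone channel centered at fc (Hz),
    sampled at fs (Hz)."""
    t = np.arange(int(duration * fs)) / fs
    return (t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
            * np.cos(2 * np.pi * fc * t))

# One channel of the bank applied to a waveform:
# y = np.convolve(wav, gammatone_ir(1000.0, 8000.0), mode="same")
</pre>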
Multilingual ASR
- Test sharing scheme:
  - Decision-tree sharing: improvement obtained with xent training, none with MPE (Chinese slightly worse, English slightly better)
English model
<pre>
              mic         tel
pure eng      voxforge    fisher
chinese eng   shujutang   convert-from-shujutang
</pre>
Denoising & Farfield ASR
- Baseline: close-talk model decoding far-field speech: 92.65
- Will investigate a DAE model (sketch below)
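A minimal numpy sketch of the DAE idea for this task: learn a frame-level mapping from far-field features back to parallel close-talk features. The layer sizes, feature dimension, and learning rate are illustrative assumptions.

<pre>
import numpy as np

rng = np.random.default_rng(0)
dim, hidden = 40, 256                  # 40-d filterbank features (assumed)
W1 = rng.standard_normal((dim, hidden)) * 0.01
b1 = np.zeros(hidden)
W2 = rng.standard_normal((hidden, dim)) * 0.01
b2 = np.zeros(dim)

def dae_step(x_far, x_close, lr=1e-3):
    """One SGD step minimizing the MSE between the network's output
    on far-field frames and the parallel close-talk frames."""
    global W1, b1, W2, b2
    h = np.tanh(x_far @ W1 + b1)           # encoder
    y = h @ W2 + b2                        # decoder
    err = y - x_close                      # (batch, dim)
    dh = (err @ W2.T) * (1.0 - h ** 2)     # backprop through tanh
    # In-place SGD updates, averaged over the batch.
    W2 -= lr * (h.T @ err) / len(x_far)
    b2 -= lr * err.mean(0)
    W1 -= lr * (x_far.T @ dh) / len(x_far)
    b1 -= lr * dh.mean(0)
    return float(np.mean(err ** 2))
</pre>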
Kaiser Window
<pre>
Window function test on the 8k WSJ database, 23 Mel channels:
window function   %WER                ins  del  sub
kaiser            278 / 5643 = 4.93    39   15  224
povey             265 / 5643 = 4.70    34   14  217

Window function test on the 8k WSJ database, 30 Mel channels:
window function   %WER                ins  del  sub
kaiser            270 / 5643 = 4.78    38   17  215
povey             283 / 5643 = 5.02    36   24  223
</pre>
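For reference, the two windows under comparison, as a minimal numpy sketch. The Povey window is Kaldi's default, a Hann window raised to the power 0.85; the Kaiser shape parameter beta used in the test is not recorded, so the value below is an assumption.

<pre>
import numpy as np

N = 200  # 25 ms frame at 8 kHz
n = np.arange(N)

# Kaiser window; beta = 8.0 is a placeholder, not the tested value.
kaiser = np.kaiser(N, 8.0)

# Povey window (Kaldi default): strictly positive at the frame edges.
povey = (0.5 - 0.5 * np.cos(2 * np.pi * n / (N - 1))) ** 0.85

# Each frame is multiplied elementwise by the window before the FFT:
# spectrum = np.fft.rfft(frame * povey)
</pre>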
VAD
- DNN-based VAD (24.77) showed better performance than energy-based VAD (45.73); a sketch of the energy baseline follows
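A minimal sketch of the energy-based baseline (per-frame log energy against a threshold relative to the loudest frame); frame size and threshold are illustrative. The DNN VAD replaces the threshold with a trained per-frame speech/non-speech classifier.

<pre>
import numpy as np

def energy_vad(wav, fs=8000, frame_ms=25, shift_ms=10, thresh_db=-40.0):
    """Label each frame 1 (speech) or 0 (non-speech) by comparing its
    log energy, relative to the loudest frame, to a threshold."""
    flen = int(fs * frame_ms / 1000)
    fshift = int(fs * shift_ms / 1000)
    starts = range(0, len(wav) - flen + 1, fshift)
    energy = np.array([np.sum(wav[s:s + flen] ** 2) for s in starts])
    log_e = 10.0 * np.log10(energy + 1e-10)
    return (log_e > log_e.max() + thresh_db).astype(int)
</pre>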
Scoring
- Online scoring done?
- Checked into gitlab?
Word to Vector
- Paper submitted
LM development
Domain-specific LM
- Prepare the English lexicon (loader sketch below)
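A minimal sketch of loading a CMUdict-style English lexicon (one "WORD PH1 PH2 ..." entry per line); the file path is a placeholder.

<pre>
def load_lexicon(path):
    """Parse a CMUdict-style file into word -> list of pronunciations
    (each pronunciation a list of phones). ';;;' lines are comments;
    alternative pronunciations are marked WORD(2), WORD(3), ..."""
    lexicon = {}
    with open(path, encoding="latin-1") as f:
        for line in f:
            if line.startswith(";;;") or not line.strip():
                continue
            word, *phones = line.split()
            word = word.split("(")[0]
            lexicon.setdefault(word, []).append(phones)
    return lexicon

# lex = load_lexicon("cmudict.0.7a")   # placeholder path
</pre>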
NN LM
- Character-based NNLM (6700 chars, 7-gram), training on 500M of data done; see the example-extraction sketch after this list
  - Inconsistent WER patterns were found on the Tencent test sets
  - Probably need another test set for further investigation
- Investigate MS RNN LM training
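A minimal sketch of cutting 7-gram training examples for the character NNLM: six characters of history predict the seventh; the padding symbol is an assumption.

<pre>
def char_ngram_examples(sentence, order=7, pad="<s>"):
    """Slide an `order`-character window over the sentence: the first
    order-1 characters are the context, the last is the target.
    Sentence starts are padded so every character has full context."""
    chars = [pad] * (order - 1) + list(sentence)
    examples = []
    for i in range(order - 1, len(chars)):
        examples.append((chars[i - order + 1:i], chars[i]))
    return examples

# Each pair feeds the NNLM one (6-char context, next char) example:
# char_ngram_examples("今天天气不错") -> 6 training pairs
</pre>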
QA
FST-based matching
- Word-based FST matching takes 1-2 seconds with 1600 patterns; Huilan's implementation takes <1 second (see the matcher sketch below)
- Thrax toolkit for grammar-to-FST compilation
- Investigate determinization of G embedding
  - Refer to the new Kaldi code
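For reference, matching a fixed pattern set with a word-based FST is equivalent to walking a word-level trie (a deterministic acceptor); a minimal sketch, with toy patterns standing in for the real 1600:

<pre>
def build_trie(patterns):
    """Compile word-sequence patterns into a trie: each arc is a word,
    and a '<final>' key marks an accepting state."""
    root = {}
    for pat in patterns:
        node = root
        for word in pat.split():
            node = node.setdefault(word, {})
        node["<final>"] = pat
    return root

def match(trie, words):
    """Report every pattern matching a contiguous span of `words`."""
    hits = []
    for start in range(len(words)):
        node = trie
        for word in words[start:]:
            if word not in node:
                break
            node = node[word]
            if "<final>" in node:
                hits.append(node["<final>"])
    return hits

trie = build_trie(["turn on the light", "turn off"])
print(match(trie, "please turn on the light now".split()))
# -> ['turn on the light']
</pre>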