2014-05-16

From cslt Wiki

Latest revision as of 01:42, 16 May 2014 (Friday)

Resource Building

  • Maxi onboard
  • Release management should be started: Zhiyong (+)
  • Blaster 0.1 & vivian 0.0 system release

Leftover questions

  • Asymmetric window: Great improvement on the training set (WER 34% to 24%); however, the improvement is lost on the test set. Overfitting?
  • Multi GPU training: Error encountered
  • Multilanguage training
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training

AM development

Sparse DNN

  • GA-based block sparsity (+++)
  • Found a paper in 2000 with similar ideas.
  • Try to get a student working on high performance computing to do the optimization
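The GA-based block-sparsity idea above can be sketched as evolving binary masks over blocks of a weight matrix. This is a toy illustration only: the fitness function (retained weight energy minus a sparsity penalty), block size, and GA parameters are all assumptions standing in for the real objective (held-out accuracy).

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))            # toy weight matrix, pruned in 2x2 blocks
blocks = 4                             # 4x4 grid of 2x2 blocks

def fitness(mask):
    # Assumed surrogate fitness: squared weight norm kept by the mask,
    # minus a penalty on the number of retained blocks.
    kept = sum(np.sum(W[2*i:2*i+2, 2*j:2*j+2] ** 2)
               for i in range(blocks) for j in range(blocks)
               if mask[i, j])
    return kept - 2.0 * mask.sum()

# Simple GA: truncation selection plus bit-flip mutation.
pop = [rng.integers(0, 2, size=(blocks, blocks)) for _ in range(20)]
for _ in range(30):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]
    children = [(p ^ (rng.random((blocks, blocks)) < 0.1)).astype(int)
                for p in parents]
    pop = parents + children
best = max(pop, key=fitness)           # best block mask found
```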

Noise training

  • More with-clean training completed. 2 conditions left

GFbank

  • 8k train
  • GFBank sinovoice 1400 MPE stream
  • 16k train
  • GFBank sinovoice 6000 MPE1 stream: worse than 1700h (10.18-11.11)


Multilingual ASR

  • Test sharing scheme:
  • decision tree share, xent improvement obtained, MPE no improvement (Chinese worse a bit, English a bit better).


English model

                             mic          tel
pure eng                    voxforge    fisher
chinese eng                 shujutang   convert-from-shujutang 

Denoising & Farfield ASR

  • Baseline: close-talk model decoding far-field speech: 92.65
  • Will investigate DAE model.
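A minimal numpy sketch of the DAE direction mentioned above: train a single-hidden-layer network to map simulated far-field features back to close-talk features. The feature dimension, noise model, learning rate, and iteration count are all illustrative assumptions, not the actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden, n = 40, 64, 512                         # feature dim, hidden units, frames

clean = rng.normal(size=(n, dim))                    # stand-in for close-talk features
farfield = clean + 0.3 * rng.normal(size=(n, dim))   # crude stand-in for far-field distortion

W1 = 0.1 * rng.normal(size=(dim, hidden)); b1 = np.zeros(hidden)
W2 = 0.1 * rng.normal(size=(hidden, dim)); b2 = np.zeros(dim)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2, h

lr = 0.01
initial_mse = np.mean((forward(farfield)[0] - clean) ** 2)
for _ in range(200):                                 # plain full-batch gradient descent
    out, h = forward(farfield)
    err = out - clean
    gW2 = h.T @ err / n; gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)                 # backprop through tanh
    gW1 = farfield.T @ dh / n; gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
final_mse = np.mean((forward(farfield)[0] - clean) ** 2)
```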

Kaiser Window

window function test based on 23 Mel channels, 8k WSJ database

window function    %WER               ins   del   sub
kaiser             278 / 5643 = 4.93   39    15   224
povey              265 / 5643 = 4.70   34    14   217

window function test based on 30 Mel channels, 8k WSJ database

window function    %WER               ins   del   sub
kaiser             270 / 5643 = 4.78   38    17   215
povey              283 / 5643 = 5.02   36    24   223
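For reference, the two window functions compared above can be generated as follows. The Kaldi-style "povey" window is a raised cosine taken to the power 0.85; the Kaiser beta value below is an assumed setting, since the experiments do not record which one was used.

```python
import numpy as np

def povey_window(n):
    # Kaldi-style "povey" window: Hamming-like raised cosine,
    # taken to the power 0.85.
    i = np.arange(n)
    return (0.5 - 0.5 * np.cos(2 * np.pi * i / (n - 1))) ** 0.85

def kaiser_window(n, beta=8.0):
    # NumPy's Kaiser window; beta trades main-lobe width against
    # sidelobe attenuation. beta=8.0 is an illustrative assumption.
    return np.kaiser(n, beta)

# 25 ms frame at 8 kHz -> 200 samples, matching the 8k WSJ setup above.
frame_len = 200
pw = povey_window(frame_len)
kw = kaiser_window(frame_len)
```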

VAD

  • DNN-based VAD (24.77) showed better performance than energy-based VAD (45.73)
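The energy-based baseline above can be sketched as a simple frame-energy threshold: a frame is called speech when its log energy is within some margin of the loudest frame. Frame length and threshold here are illustrative values, not the settings used in the experiment.

```python
import numpy as np

def energy_vad(signal, frame_len=200, threshold_db=-30.0):
    # Energy-based VAD: mark a frame as speech when its log energy
    # exceeds a threshold relative to the loudest frame.
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = 10 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    return energy > energy.max() + threshold_db

# Toy example: silence, then a loud burst, then silence.
sig = np.concatenate([np.zeros(400), 0.5 * np.ones(400), np.zeros(400)])
decisions = energy_vad(sig)   # one boolean per 200-sample frame
```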


Scoring

  • Online scoring done?
  • checked into gitlab?

Word to Vector

  • Paper submitted

LM development

Domain specific LM

  • Prepare English lexicon

NN LM

  • Character-based NNLM (6700 chars, 7-gram), training on 500M data done.
  • Inconsistent patterns in WER were found on the Tencent test sets
  • Probably need to use another test set for investigation.
  • Investigate MS RNN LM training
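The character-based n-gram NNLM above can be sketched as a Bengio-style forward pass: embed the previous n-1 characters, concatenate, and score the next character with a softmax. Vocabulary size and dimensions below are toy values; the experiment used ~6700 characters and a 7-gram context.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, emb_dim, context = 50, 16, 6        # toy sizes; 7-gram = 6 chars of context

E = 0.1 * rng.normal(size=(vocab, emb_dim))            # character embeddings
W = 0.1 * rng.normal(size=(context * emb_dim, vocab))  # output projection
b = np.zeros(vocab)

def next_char_probs(context_ids):
    # Concatenate the 6 context embeddings, then softmax over the vocab.
    x = E[context_ids].reshape(-1)
    logits = x @ W + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = next_char_probs([1, 2, 3, 4, 5, 6])  # distribution over the next character
```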

QA

FST-based matching

  • Word-based FST takes 1-2 seconds with 1600 patterns; Huilan's implementation takes <1 second.
  • Thrax toolkit for grammar-to-FST compilation
  • Investigate determinization of G embedding
  • Refer to Kaldi new code
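The word-based matching above can be approximated with a trie walk over word sequences — a simplified, unweighted stand-in for FST matching (no weights, no epsilon arcs). Patterns and queries below are toy data.

```python
# Multi-pattern word matching via a trie: each pattern is a word
# sequence; "<final>" marks an accepting node storing the pattern.

def build_trie(patterns):
    root = {}
    for pat in patterns:
        node = root
        for word in pat.split():
            node = node.setdefault(word, {})
        node["<final>"] = pat          # accepting state
    return root

def match(trie, sentence):
    # Try to start a match at every word position, walking the trie
    # and collecting every pattern that ends inside the sentence.
    words = sentence.split()
    hits = []
    for start in range(len(words)):
        node = trie
        for word in words[start:]:
            if word not in node:
                break
            node = node[word]
            if "<final>" in node:
                hits.append(node["<final>"])
    return hits

trie = build_trie(["turn on the light", "what time is it"])
hits = match(trie, "please turn on the light now")
```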