2014-04-25
From cslt Wiki
Latest revision as of 02:12, 25 April 2014 (Friday)
Resource Building
- Maxi onboard
- Release management should be started: Zhiyong (+)
- Blaster 0.1 & vivian 0.0 system release
Leftover questions
- Asymmetric window: great improvement on the training set (WER 34% to 24%), but the gain is lost on the test set. Overfitting?
- Multi-GPU training: error encountered
- Multilanguage training
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
AM development
Sparse DNN
- GA-based block sparsity (+)
- Found a paper from 2000 with similar ideas.
- Trying to get a student working on high-performance computing to do the optimization
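The GA-based block-sparsity search could be sketched roughly as below: a genetic algorithm evolves binary block masks over a weight matrix, trading output error against the number of zeroed blocks. This is a minimal illustration only; the matrix size, 2x2 block grid, fitness function, and GA settings are all assumptions, not the actual CSLT setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: prune an 8x8 weight matrix in 2x2 blocks so that
# the masked layer still approximates its outputs on sample activations.
W = rng.standard_normal((8, 8))
X = rng.standard_normal((8, 64))           # sample input activations
Y = W @ X                                  # reference outputs
BLOCKS = (4, 4)                            # 4x4 grid of 2x2 blocks

def expand(mask):
    """Blow a 4x4 block mask up to the full 8x8 element mask."""
    return np.kron(mask, np.ones((2, 2)))

def fitness(mask, sparsity_weight=0.5):
    """Lower is better: output error minus a bonus for zeroed blocks."""
    err = np.linalg.norm((W * expand(mask)) @ X - Y)
    return err - sparsity_weight * (mask.size - mask.sum())

# Plain generational GA over binary block masks.
pop = rng.integers(0, 2, size=(20,) + BLOCKS)
for gen in range(50):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[:10]]          # keep the best half
    kids = parents.copy()
    flips = rng.random(kids.shape) < 0.05           # mutation: flip 5% of blocks
    kids = np.where(flips, 1 - kids, kids)
    pop = np.concatenate([parents, kids])

best = pop[np.argmin([fitness(m) for m in pop])]
```

In real use the fitness would be frame accuracy or WER on held-out data rather than reconstruction error.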
Noise training
- More experiments with no-noise (+)
- More experiments with additional noise types (+)
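Training data for the additional noise types is presumably produced by mixing noise into clean speech at a target SNR; a minimal sketch (the function name and details are assumptions):

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    """Mix a noise signal into clean speech at a target SNR (in dB)."""
    noise = np.resize(noise, speech.shape)     # loop/trim noise to speech length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so speech power / scaled-noise power hits the target SNR.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```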
AMR compression re-training
- Stream model delivered to the WeChat server (Mengyuan + Liuchao)
GFbank
- GFBank Sinovoice test on 1700 MPE (10.34-10.14)
- GFBank Sinovoice 1700 MPE stream
Multilingual ASR
- All-phone strategy baseline done
- Some strange behavior observed when fixing early layers
Denoising & Farfield ASR
- Baseline: close-talk model decode far-field speech: 92.65
- MPE1: 92.78
- MPE2: 91.15
- MPE3: 91.21
- MPE4: 91.51
- Will test the result on the dev set
VAD
- A bug was found in the smoothing approach of the VAD.
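For reference, frame-level VAD smoothing typically drops too-short speech bursts and extends kept segments by a hangover; a toy sketch (the thresholds are made up, and this is not the buggy code in question):

```python
import numpy as np

def smooth_vad(frames, min_speech=3, hangover=2):
    """Smooth raw per-frame VAD decisions (0/1).

    Drops speech bursts shorter than `min_speech` frames and extends each
    kept segment by `hangover` frames so word endings are not clipped.
    """
    frames = np.asarray(frames, dtype=int)
    out = np.zeros_like(frames)
    i = 0
    while i < len(frames):
        if frames[i] == 1:
            j = i
            while j < len(frames) and frames[j] == 1:
                j += 1
            if j - i >= min_speech:                # keep only long-enough bursts
                out[i:min(j + hangover, len(frames))] = 1
            i = j
        else:
            i += 1
    return out
```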
Scoring
- A speaker identification system based on ivector was delivered
- Male/female identification based on UBM was delivered
- Phone-sequence based graph decoding was delivered
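The i-vector speaker-identification backend most likely scores a test i-vector against each enrolled speaker by cosine similarity and picks the best; a minimal sketch (function names and the bare cosine scoring rule are assumptions):

```python
import numpy as np

def cosine_score(enroll, test):
    """Cosine similarity of two i-vectors; higher means more likely same speaker."""
    return float(enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test)))

def identify(speakers, test_ivec):
    """Return the enrolled speaker whose i-vector scores highest."""
    return max(speakers, key=lambda name: cosine_score(speakers[name], test_ivec))
```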
Word to Vector
- Varying the dimension of the low-dimensional space from 10 to 100: done. Expanding to 200 dimensions. Some strange behavior was found with w2v. Will try on the daily people data.
- Testing multi-class classification with 2-9 classes: w2v done. Working on LDA.
- Testing various w2v window sizes n = 3-15. Strange behavior at n = 9.
LM development
NN LM
- Character-based NNLM (6,700 chars, 7-gram), training on 500M data done.
- Overflow found; code change done. Training has run 6 iterations.
- Investigate MS RNN LM training
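An overflow in NNLM training commonly sits in the output softmax, where exponentiating large logits blows up; the standard fix is the max-shift trick. This is a generic sketch, not the actual code change made here:

```python
import numpy as np

def stable_softmax(logits):
    """Overflow-safe softmax: subtracting the max logit leaves the result
    mathematically unchanged but keeps exp() from overflowing."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```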
QA
FST-based matching
- Word-based FST matching takes 1-2 seconds with 1,600 patterns; Huilan's implementation takes <1 second.
- THRAX toolkit for compiling grammars to FSTs
- Investigating determinization of G embedding
- Refer to the new Kaldi code
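Word-based pattern matching of this kind can be pictured as walking an acceptor built from the patterns. A toy trie version is below (illustrative only; the real matchers are compiled FSTs via OpenFst/THRAX, and the pattern set here is invented):

```python
def build_trie(patterns):
    """Compile word-sequence patterns into a trie (a tiny acceptor)."""
    root = {}
    for pid, pat in enumerate(patterns):
        node = root
        for word in pat.split():
            node = node.setdefault(word, {})
        node["<final>"] = pid               # mark an accepting state
    return root

def match(trie, sentence):
    """Return ids of patterns matching at any start position in the sentence."""
    words, hits = sentence.split(), []
    for start in range(len(words)):
        node = trie
        for word in words[start:]:
            if word not in node:
                break
            node = node[word]
            if "<final>" in node:
                hits.append(node["<final>"])
    return hits
```

Matching is linear in sentence length per start position, which is why a word-level acceptor over ~1,600 patterns can answer in well under a second.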