2014-06-27


Resource Building

Leftover questions

  • Asymmetric window: great improvement on the training set (WER 34% to 24%); however, the improvement is lost on the test set.
  • Multi-GPU training: error encountered.
  • Multilanguage training
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training

AM development

Sparse DNN

  • GA-based block sparsity (++++++++)


Noise training

  • Paper writing ongoing.

GFbank

  • Running the Sinovoice 8k 1400 + 100 mixture training.
  • FBank/GFbank, stream/non-stream MPE completed:
                           Huawei disanpi   BJ mobile   8k English data
 FBank non-stream (MPE4)       20.44%         22.28%        24.36%
 FBank stream (MPE4)           19.46%         22.00%        21.19%
 GFbank stream (MPE4)          20.69%         22.84%        24.45%
 GFbank non-stream (MPE)          -              -              -

Multilingual ASR

                               HW 27h (HW TR LM not involved)   HW 27h (HW TR LM involved)
 FBank non-stream (monolang)               21.64                          20.72
 FBank non-stream (MPE4)                   22.23                          21.38
 FBank stream (MPE4)                       21.99                            -

Denoising & Farfield ASR

  • Correlation-based alignment is done. This is necessary since the recording devices may introduce artificial delays (see the sketch below).
  • How about the output CMVN test?
  • Deliver the recordings to /nfs/disk/perm/data/corpora/reverberant
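
A minimal sketch of this kind of correlation-based alignment (the function names, the numpy-only implementation, and the 16 kHz toy signal are assumptions for illustration; only the idea of estimating the inter-channel delay from the cross-correlation peak comes from the note above):

 # Estimate the relative delay between a reference channel and a second
 # recording from the peak of their cross-correlation, then shift one signal.
 import numpy as np

 def estimate_delay(ref, rec):
     """Return the lag (in samples) of `rec` relative to `ref`."""
     corr = np.correlate(rec, ref, mode="full")   # full cross-correlation
     return int(np.argmax(corr)) - (len(ref) - 1) # peak index -> signed lag

 def align(ref, rec):
     """Shift `rec` so that it lines up with `ref`."""
     lag = estimate_delay(ref, rec)
     if lag > 0:                       # rec starts late: drop leading samples
         rec = rec[lag:]
     elif lag < 0:                     # rec starts early: pad the front
         rec = np.concatenate([np.zeros(-lag, dtype=rec.dtype), rec])
     n = min(len(ref), len(rec))
     return ref[:n], rec[:n]

 # toy usage: a copy delayed by 5 samples should be recovered
 x = np.random.randn(16000)
 y = np.concatenate([np.zeros(5), x])[:16000]
 print(estimate_delay(x, y))           # -> 5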

Original model:

xEnt model:
               middle-field    far-field
    dev93       74.79          96.68
    eval92      63.42          94.75

MPE model:


MPE adaptation: 

               middle-field    far-field
    dev93       63.71          94.84
    eval92      52.67          90.45

VAD

  • DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74).
  • 100 x n (n <= 3) hidden units with 2 output units seem sufficient for VAD (see the sketch below).
  • report form
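
A minimal sketch of the small VAD DNN described above, with up to three 100-unit hidden layers and a 2-unit (speech / non-speech) softmax output; the 40-dim FBank input and the random weights are assumptions for illustration only:

 import numpy as np

 rng = np.random.default_rng(0)

 def layer(d_in, d_out):
     # random init; in practice the weights come from training
     return rng.standard_normal((d_in, d_out)) * 0.1, np.zeros(d_out)

 def vad_forward(frame, params):
     h = frame
     for W, b in params[:-1]:
         h = np.maximum(0.0, h @ W + b)      # ReLU hidden layers
     W, b = params[-1]
     logits = h @ W + b
     p = np.exp(logits - logits.max())
     return p / p.sum()                       # softmax over {non-speech, speech}

 n_hidden = 2                                 # n <= 3 per the note above
 dims = [40] + [100] * n_hidden + [2]
 params = [layer(dims[i], dims[i + 1]) for i in range(len(dims) - 1)]

 frame = rng.standard_normal(40)              # one (fake) FBank frame
 print(vad_forward(frame, params))            # posterior over the 2 classes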

Scoring

  • Refine the model with the AMIDA database. Local minimum observed.
  • i-vector-based speaker detection seems fine, reaching 96% with 100 speakers (see the hypothetical sketch below).
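
The actual detection setup is not described here; the sketch below only illustrates one common closed-set scheme, scoring a test i-vector against per-speaker mean i-vectors with cosine similarity. The 400-dim i-vectors and the fake data are assumptions.

 import numpy as np

 def normalize(x):
     return x / np.linalg.norm(x, axis=-1, keepdims=True)

 def identify(test_ivec, speaker_means):
     """speaker_means: (n_speakers, dim); returns index of best-matching speaker."""
     scores = normalize(speaker_means) @ normalize(test_ivec)   # cosine scores
     return int(np.argmax(scores)), scores

 rng = np.random.default_rng(1)
 means = rng.standard_normal((100, 400))     # 100 speakers, 400-dim i-vectors (assumed dim)
 test = means[42] + 0.1 * rng.standard_normal(400)
 print(identify(test, means)[0])             # -> 42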


Embedded decoder


AM: 600x4+800 xent9 model: 



pruning threshold: 1e-5, Nobiglm
------------------------------------------------------------------------------------------
             |    150k   |   80k    |     40k     |     20k    |    10k     |      5k    |
------------------------------------------------------------------------------------------
      wer    |    26.60  |   27.16  |    28.11    |    29.14   |   31.02    |    33.37   |
------------------------------------------------------------------------------------------
       RT    |    0.68   |   0.66   |    0.61     |    0.61    |    0.58    |    0.56    |
------------------------------------------------------------------------------------------
 graph size  |     21M   |    14M   |    9.1M     |    6.9M    |    5.5M    |    4.1M    |
------------------------------------------------------------------------------------------

YINSHI:2014-Jun-24,Wednesday,10:7:0 


pruning threshold: 1e-6, Nobiglm
------------------------------------------------------------------------------------------
             |    150k   |   80k    |     40k     |     20k    |    10k     |      5k    |
------------------------------------------------------------------------------------------
      wer    |    22.49  |   23.05  |    24.15    |    25.51   |   27.71    |    30.71   |
------------------------------------------------------------------------------------------
       RT    |    0.89   |   0.84   |    0.76     |    0.70    |    0.68    |    0.64    |
------------------------------------------------------------------------------------------
 graph size  |     98M   |    86M   |     67M     |    49M     |    34M     |     24M    |
------------------------------------------------------------------------------------------

YINSHI:2014-Jun-27,Saturday,0:52:35 


pruning threshold: 1e-6.5, biglm
------------------------------------------------------------------------------------------
             |    150k   |   80k    |     40k     |     20k    |    10k     |      5k    |
------------------------------------------------------------------------------------------
      wer    |    21.12  |   21.75  |    22.92    |    24.39   |   26.89    |    30.01   |
------------------------------------------------------------------------------------------
       RT    |    1.45   |   1.25   |    1.16     |    1.11    |    1.02    |    0.94    |
------------------------------------------------------------------------------------------
 graph size  |     38M   |    35M   |     30M     |    25M     |    20M     |     15M    |
------------------------------------------------------------------------------------------

YINSHI:2014-Jun-27,Saturday,0:58:27 


pruning threshold: 1e-5.5, Nobiglm
------------------------------------------------------------------------------------------
             |    150k   |   80k    |     40k     |     20k    |    10k     |      5k    |
------------------------------------------------------------------------------------------
      wer    |    24.46  |   25.05  |    26.05    |    27.11   |   29.36    |    32.01   |
------------------------------------------------------------------------------------------
       RT    |    0.71   |   0.69   |    0.66     |    0.63    |    0.60    |    0.58    |
------------------------------------------------------------------------------------------
 graph size  |     39M   |    32M   |     25M     |    19M     |    14M     |    9.2M    |
------------------------------------------------------------------------------------------


LM development

Domain specific LM

  • Baidu Zhidao + Weibo extraction done with various thresholds.
  • Looks like the extracted text can improve performance to some extent, but the major change seems to come from pre-processing.
  • Check the proportion of tags in the HW 30h data.


Word2Vector

W2V based doc classification

  • Full-Gaussian-based doc vector
  • Represent each doc with a Gaussian distribution over the word vectors it contains.
  • Use k-NN to conduct classification (a minimal sketch follows the tables below).
                mean Euclidean distance   KL distance   diagonal KL   baseline (NB with mean)
 Acc (50 dim)           81.84                79.65           -                69.7
  • SVM-based classification


                        mean Euclidean distance   KL distance   diagonal KL    LDA
 2-class Acc (50 dim)           95.57                  -              -       95.80
 8-class Acc (50 dim)           88.79                  -              -         -
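
A minimal sketch of the Gaussian document representation with k-NN classification described above; the toy word vectors, the k = 1 setting, and the symmetrised diagonal KL are assumptions, only the overall scheme (per-document Gaussian, Euclidean-between-means or KL distance) comes from the notes:

 import numpy as np

 def doc_gaussian(word_vecs):
     """word_vecs: (n_words, dim) -> (mean, diagonal variance)."""
     mu = word_vecs.mean(axis=0)
     var = word_vecs.var(axis=0) + 1e-6          # floor to keep the KL finite
     return mu, var

 def kl_diag(p, q):
     """KL(p || q) between diagonal Gaussians p = (mu, var), q = (mu, var)."""
     mu_p, var_p = p
     mu_q, var_q = q
     return 0.5 * np.sum(np.log(var_q / var_p)
                         + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

 def classify(test_doc, train_docs, train_labels, metric="kl"):
     """1-NN over document Gaussians; metric is 'kl' or 'euclid'."""
     t = doc_gaussian(test_doc)
     dists = []
     for d in train_docs:
         g = doc_gaussian(d)
         if metric == "kl":
             dists.append(kl_diag(t, g) + kl_diag(g, t))   # symmetrised KL
         else:
             dists.append(np.linalg.norm(t[0] - g[0]))     # distance between means
     return train_labels[int(np.argmin(dists))]

 # toy usage with random 50-dim "word vectors" (50 dim as in the tables above)
 rng = np.random.default_rng(0)
 train = [rng.standard_normal((30, 50)) + c for c in (0.0, 2.0)]
 labels = ["class_A", "class_B"]
 test = rng.standard_normal((25, 50)) + 2.0
 print(classify(test, train, labels))             # -> class_B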

Semantic word tree

  • Version v2.0 released (filtered with the query log).
  • Please deliver to /nfs/disk/perm/data/corpora/semanticTree (Xingchao).
  • Version v3.0 in progress: further refinement with the Baidu Baike hierarchy.


NN LM

  • Character-based NNLM (6700 chars, 7-gram), training on 500M data done (see the sketch below).
  • Inconsistent WER patterns were found on the Tencent test sets.
  • Probably need another test set for further investigation.
  • Investigate MS RNN LM training.
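
A minimal sketch of a feedforward character NNLM of the kind described above: the 6 previous characters (7-gram) are embedded, concatenated, passed through one hidden layer, and a softmax over the 6700-character vocabulary predicts the next character. The embedding/hidden sizes and the random weights are assumptions for illustration only.

 import numpy as np

 V, CTX, EMB, HID = 6700, 6, 64, 256            # vocab, context, embed dim, hidden dim
 rng = np.random.default_rng(0)

 E  = rng.standard_normal((V, EMB)) * 0.01      # character embedding table
 W1 = rng.standard_normal((CTX * EMB, HID)) * 0.01
 b1 = np.zeros(HID)
 W2 = rng.standard_normal((HID, V)) * 0.01
 b2 = np.zeros(V)

 def next_char_probs(context_ids):
     """context_ids: the 6 previous character ids -> distribution over V chars."""
     x = E[np.asarray(context_ids)].reshape(-1)  # concatenate the 6 embeddings
     h = np.tanh(x @ W1 + b1)
     logits = h @ W2 + b2
     p = np.exp(logits - logits.max())
     return p / p.sum()

 p = next_char_probs([17, 3, 520, 42, 9, 1024])  # arbitrary character ids
 print(p.shape, p.sum())                         # (6700,) 1.0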