2014-04-11

Resource Building

  • Current text resources have been re-arranged and listed

Leftover questions

  • Asymmetric window: great improvement on the training set (WER 34% to 24%), but the gain is lost on the test set. Overfitting?
  • Multi-GPU training: error encountered
  • Multilingual training
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training

AM development

Sparse DNN

  • GA-based block sparsity
  • Found a paper from 2000 with similar ideas.
  • Trying to get a student working on high-performance computing to do the optimization
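
For reference, a minimal Python sketch of block sparsity applied to a DNN weight matrix. Block size, keep ratio, and magnitude-based block selection are illustrative stand-ins; the GA-based method above would instead search over which blocks to keep.

    import numpy as np

    def block_sparsify(W, block=(16, 16), keep_ratio=0.25):
        # Partition W into blocks and rank them by Frobenius norm;
        # the GA-based method would search over which blocks to keep.
        rows, cols = W.shape
        br, bc = block
        assert rows % br == 0 and cols % bc == 0
        blocks = W.reshape(rows // br, br, cols // bc, bc)
        norms = np.linalg.norm(blocks, axis=(1, 3))
        # Threshold that keeps the strongest keep_ratio of blocks.
        k = max(1, int(keep_ratio * norms.size))
        thresh = np.sort(norms, axis=None)[-k]
        mask = (norms >= thresh).repeat(br, axis=0).repeat(bc, axis=1)
        return W * mask

    W = np.random.randn(512, 512)
    W_sparse = block_sparsify(W)
    print((W_sparse == 0).mean())  # roughly 1 - keep_ratio of weights zeroed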


Noise training

  • More experiments with no added noise
  • More experiments with additional noise types
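
Noise training mixes noise into the clean training audio at chosen SNRs. A minimal sketch of the mixing step (the synthetic signals and the 10 dB target are illustrative; real use would read wav files):

    import numpy as np

    def mix_at_snr(clean, noise, snr_db):
        # Tile or trim the noise to the clean signal's length.
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)[:len(clean)]
        p_clean = np.mean(clean ** 2)
        p_noise = np.mean(noise ** 2)
        # Scale noise so that 10*log10(p_clean / p_noise_scaled) == snr_db.
        scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
        return clean + scale * noise

    clean = np.random.randn(16000)
    noise = 0.1 * np.random.randn(8000)
    noisy = mix_at_snr(clean, noise, snr_db=10.0)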


AMR compression re-training

  • 1700h MPE adaptation done
  • 1700h stream-mode adaptation is now running MPE iteration 1

GFBank

  • Significant improvement found with GFBank
  • Significant improvement found with FBank + GFBank

Denoising & Farfield ASR

  • Recording done
  • Preparing to construct the baseline


VAD

  • Code ready; still need to work out speech/non-speech smoothing (a sketch of one common scheme follows)
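
One common smoothing scheme (a sketch only, not necessarily what the code above will adopt) is to flip the speech/non-speech state only after a minimum run of disagreeing frames, which removes short spurious segments:

    def smooth_vad(frames, min_speech=5, min_silence=10):
        # Smooth per-frame boolean VAD decisions: the state only flips
        # after min_speech consecutive speech frames (onset) or
        # min_silence consecutive non-speech frames (hangover).
        smoothed = []
        state = False  # current state: speech (True) or non-speech
        run = 0        # consecutive frames disagreeing with state
        for f in frames:
            if f != state:
                run += 1
                need = min_speech if f else min_silence
                if run >= need:
                    state = f
                    run = 0
            else:
                run = 0
            smoothed.append(state)
        return smoothed

    raw = [False]*3 + [True]*2 + [False]*1 + [True]*8 + [False]*12
    print(smooth_vad(raw))  # the 2-frame blip is suppressed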

Farfield recognition

Scoring

  • g-score based on MLP is done
  • t-score based on linear regression improves the performance


Word to Vector

  • LDA baseline (Sogou 1700*9 training set) done
  • Word-vector classification is much better than the LDA system (see the results below)
word vector:
    general: dict = 15w (150k entries); train_data = ren_ming_ri_bao (People's Daily, 5 GB); window = 5
    1. size = 50, training time = 30 min, 12 threads
    2. size = 10, training time = 10 min, 12 threads

data: class_num = 9, document_num = 9*2000
      train_num = 9*1600
      test_num  = 9*200
      dev_num   = 9*200

train_set:
                      C000008   C000010   C000013   C000014   C000016   C000020   C000022   C000023   C000024   total
                      Finance   IT        Health    Sports    Travel    Education Recruit.  Culture   Military
lda_inf               0.845     0.2756    0.698     0.9502    0.63499   0.32      0.8080    0.3505    0.864     0.6385
lda_inf_10            0.8149    0.0887    0.628     0.9641    0.5739    0.105     0.707363  0.2334    0.8628    0.553167
w2v_filter_filter     0.7463    0.713     0.657     0.9106    0.68659   0.54      0.74638   0.692     0.84518   0.72638
w2v_filter_filter_10  0.7608    0.4323    0.57394   0.865     0.549     0.335     0.577     0.6129    0.78099   0.609769

test_set:
                      C000008   C000010   C000013   C000014   C000016   C000020   C000022   C000023   C000024   total
                      Finance   IT        Health    Sports    Travel    Education Recruit.  Culture   Military
w2v_filter_filter     0.6865    0.7263    0.6716    0.84577   0.7462    0.46268   0.6567    0.7114    0.8905    0.71088
w2v_filter_filter_10  0.791     0.4079    0.56218   0.74129   0.62189   0.22885   0.562     0.6766    0.84079   0.603648
lda_inf               0.8706    0.26368   0.6965    0.8009    0.582     0.2537    0.72139   0.3184    0.82587   0.59259
lda_inf_10            0.776     0.1044    0.6467    0.9054    0.62189   0.1144    0.56218   0.24378   0.796     0.530127

note: w2v_filter = stop words removed when training the word vectors
note: w2v_filter_filter = stop words removed when training the word vectors and also from the classified documents
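
A minimal sketch of the word-vector classification pipeline (the gensim/scikit-learn calls and the averaged-vector logistic-regression classifier are assumptions; the actual system may differ):

    import numpy as np
    from gensim.models import Word2Vec
    from sklearn.linear_model import LogisticRegression

    # Toy corpus: each document is a list of tokens (stop words already
    # removed, as in the w2v_filter_filter setting above).
    docs = [["股市", "上涨", "银行"], ["足球", "比赛", "冠军"]]
    labels = ["Finance", "Sports"]

    # Train word vectors; size=50, window=5, 12 threads as in the setup above.
    w2v = Word2Vec(docs, vector_size=50, window=5, min_count=1, workers=12)

    def doc_vector(tokens, model):
        # Represent a document as the mean of its word vectors.
        vecs = [model.wv[t] for t in tokens if t in model.wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

    X = np.array([doc_vector(d, w2v) for d in docs])
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    print(clf.predict(X))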

LM development

NN LM

  • Character-based NNLM (6700 chars, 7-gram), training on 500M data done.
  • The non-boundary char LM is better than the boundary char LM.
  • Investigating MS RNN LM training
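
For reference, a 7-gram character NNLM predicts the next character from the previous six. A minimal sketch (PyTorch is used only as convenient notation; all sizes except the 6700-character vocabulary are illustrative):

    import torch
    import torch.nn as nn

    class CharNGramLM(nn.Module):
        # Feed-forward n-gram NNLM: embed the context characters,
        # concatenate, and predict the next character.
        def __init__(self, vocab=6700, context=6, emb=64, hidden=256):
            super().__init__()
            self.embed = nn.Embedding(vocab, emb)
            self.hidden = nn.Linear(context * emb, hidden)
            self.out = nn.Linear(hidden, vocab)

        def forward(self, ctx):            # ctx: (batch, context) char ids
            e = self.embed(ctx)            # (batch, context, emb)
            h = torch.tanh(self.hidden(e.flatten(1)))
            return self.out(h)             # logits over the vocabulary

    model = CharNGramLM()
    ctx = torch.randint(0, 6700, (32, 6))  # a batch of 6-char contexts
    target = torch.randint(0, 6700, (32,))
    loss = nn.functional.cross_entropy(model(ctx), target)
    loss.backward()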


QA

FST-based matching

  • Word-based FST matching takes 1-2 seconds with 1600 patterns, while Huilan's implementation takes <1 second; the gap is unexplained and needs investigation.
  • Char-based FST implementation is done.
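
As a rough reference for what the matcher does, a trie over token sequences (a simplified stand-in for the pattern FST, with no weights, wildcards, or epsilon arcs; the patterns here are made up):

    def build_trie(patterns):
        # Build a trie over token sequences; None marks end-of-pattern.
        root = {}
        for pat in patterns:
            node = root
            for tok in pat:
                node = node.setdefault(tok, {})
            node[None] = pat
        return root

    def match(trie, tokens):
        # Return every pattern that matches a contiguous span of tokens.
        hits = []
        for start in range(len(tokens)):
            node = trie
            for tok in tokens[start:]:
                if tok not in node:
                    break
                node = node[tok]
                if None in node:
                    hits.append(node[None])
        return hits

    trie = build_trie([["what", "time"], ["play", "music"]])
    print(match(trie, ["please", "play", "music", "now"]))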


Speech QA

  • Investigate determinization of G embedding