2014-11-25

From cslt Wiki
Latest revision as of 02:00, 8 December 2014

Speech Processing

AM development

Environment

  • Already bought three 760 GPUs
  • The 760 GPU on grid-9 crashed again
  • Replace the 760 GPU cards on grid-12 and grid-14

Sparse DNN

  • Performance improvement found when pruned slightly
  • Retraining needed for the unpruned one; check the training loss
  • The result on AURORA 4 will be available soon.
  • details at http://liuc.cslt.org/pages/sparse.html

RNN AM

  • The initial nnet does not seem very good; it needs pre-training or a lower learning rate.
  • On AURORA 4 training takes about 1 h/epoch; model training is done.
  • Use AURORA 4 short sentences with a smaller number of targets. (+)
  • Adjust the learning rate. (+)
  • Try Microsoft's toolkit. (+)
  • details at http://liuc.cslt.org/pages/rnn.html

A new nnet training scheduler

Dropout & Rectification & Convolutional Network

  • Dropout (a minimal sketch follows this list)
  • AURORA4 dataset
  • Use different proportions of noise data to investigate the effects of xEnt, MPE, and dropout
    • Problem 1) the effect of dropout under different noise proportions;
          2) the effect of MPE under different noise proportions;
          3) the effect of MPE + dropout under different noise proportions.
    • Details: http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=261
    • Find and test test data with unknown noise. (++)
    • Applied dropout to a normally trained xEnt nnet, e.g., WSJ (learning rate 1e-4/1e-5). A small learning rate seems to give the best balance between accuracy and training time.
    • Debug the low CV frame accuracy
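For reference, a minimal sketch of the inverted-dropout forward pass; the retention probability and function name are illustrative, not taken from our training scripts.

 import numpy as np

 def dropout_forward(h, retention=0.5, train=True, rng=np.random):
     # Inverted dropout: keep each hidden unit with probability `retention`
     # and rescale by 1/retention, so decoding needs no extra scaling.
     if not train or retention >= 1.0:
         return h
     mask = (rng.random_sample(h.shape) < retention) / retention
     return h * mask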
  • MaxOut (group activations are sketched after the P-norm table below)
  • 6 min/epoch
1) AURORA4 (15 h)
   NOTE: gs == group size
  • Pretraining-based maxout
    • Select units within the group-size interval, but a low learning rate is needed
    • Force-accept the first iteration to jump out of the local minimum
  • P-norm
   ---------------------------------------------------------------------------------------------------------
        model/testcase(WER)    | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
       nnet_std-baseline       |  6.04           |  29.91           |  27.76          |  16.37
   ---------------------------------------------------------------------------------------------------------
       lr0.008-1e-7_gs6_p2     |  6.17           |  27.51           |  24.98          |  15.40 
   ---------------------------------------------------------------------------------------------------------
       lr0.008-1e-7_gs10_p2    |  6.40           |  28.18           |  26.60          |  15.82 
   ---------------------------------------------------------------------------------------------------------
       lr0.008-1e-7_gs10_p3    |  6.45           |  28.73           |  30.01          |  20.24 
   ---------------------------------------------------------------------------------------------------------
       lr0.04-4e-3_gs6_p2      |  6.47           |  27.42           |  27.48          |  17.35 
   ---------------------------------------------------------------------------------------------------------
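As a reminder of what gs (group size) and p mean in the rows above, a hedged numpy sketch of the maxout and p-norm group activations (shapes are illustrative):

 import numpy as np

 def maxout(z, group_size):
     # Maxout: the max over each group of `group_size` linear units.
     z = z.reshape(z.shape[0], -1, group_size)   # (batch, groups, gs)
     return z.max(axis=2)

 def pnorm(z, group_size, p=2):
     # P-norm: the Lp norm over each group; p=2 in lr0.008-1e-7_gs6_p2.
     z = z.reshape(z.shape[0], -1, group_size)
     return (np.abs(z) ** p).sum(axis=2) ** (1.0 / p)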
  • Convolutional network (+) (the patch/pooling geometry is sketched after the tables below)
  • AURORA 4
  1)
 -----------------------------------------------------------------------------------------------------------------------
                  |  wer | hid-layers | hid-dim | delta-order | splice | lda-dim | learn-rate | pooling | cnn_init_opts
 -----------------------------------------------------------------------------------------------------------------------
  cnn_std_baseline| 6.70 |     4      |  1200   |      0      |    4   |   198   |   0.008    |    3    | patch-dim1 7
 -----------------------------------------------------------------------------------------------------------------------
  cnn_std_1000_3  | 6.61 |     4      |  1000   |      0      |    4   |   198   |   0.008    |    3    | patch-dim1 7
 -----------------------------------------------------------------------------------------------------------------------
  cnn_std_1400_3  | 6.61 |     4      |  1400   |      0      |    4   |   198   |   0.008    |    3    | patch-dim1 7
 -----------------------------------------------------------------------------------------------------------------------
  cnn_std_1200_4  | 6.91 |     4      |  1200   |      0      |    4   |   198   |   0.008    |    4    | patch-dim1 6
 -----------------------------------------------------------------------------------------------------------------------
  cnn_std_1200_2  | -    |     4      |  1200   |      0      |    4   |   198   |   0.008    |    2    | patch-dim1 8
 -----------------------------------------------------------------------------------------------------------------------
  cnn_std_1200_3  | 6.66 |     5      |  1200   |      0      |    4   |   198   |   0.008    |    3    | patch-dim1 7
 -----------------------------------------------------------------------------------------------------------------------
  2)
 --------------------------------------------------------------------------------------------------------------
                          | %WER | DNN-hidden-layers | hid-dim | pooling | CNN_unit | cnn_init_opts
 --------------------------------------------------------------------------------------------------------------
  cnn_nonlda_std          | 5.73 |         4         |  1200   |    3    |          | "--patch-dim1 8" (input_dim ~ patch-dim1)
 --------------------------------------------------------------------------------------------------------------
  cnn_nonlda_cnnunit_384  | 5.85 |         4         |  1200   |    3    |   384    | "--patch-dim1 8 --num-filters2 384"
 --------------------------------------------------------------------------------------------------------------
  cnn_nonlda_cnnunit_220  |  -   |         4         |  1200   |    3    |   220    | "--patch-dim1 8 --num-filters2 220"
 --------------------------------------------------------------------------------------------------------------
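To make the patch-dim1 and pooling columns concrete, a small sketch of one CNN layer as a 1-D convolution along the frequency axis followed by max pooling; the feature dimension and filter counts below are assumptions for illustration, not our exact setup.

 import numpy as np

 def conv1d_freq(fbank, filters):
     # Convolve each filter over the frequency axis of one frame.
     # fbank: (num_freq,) vector; filters: (num_filters, patch_dim).
     num_filters, patch_dim = filters.shape
     num_patches = fbank.shape[0] - patch_dim + 1
     out = np.empty((num_filters, num_patches))
     for i in range(num_patches):
         out[:, i] = filters @ fbank[i:i + patch_dim]
     return out

 def max_pool(x, pool_size):
     # Non-overlapping max pooling along the patch axis.
     n = (x.shape[1] // pool_size) * pool_size
     return x[:, :n].reshape(x.shape[0], -1, pool_size).max(axis=2)

 # e.g. 40-dim fbank, patch-dim1 8, 384 filters, pooling 3
 feats = np.random.randn(40)
 maps = conv1d_freq(feats, np.random.randn(384, 8))   # (384, 33)
 pooled = max_pool(maps, 3)                           # (384, 11)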

MSE

(1) AURORA4 (train_clean)
        model/testcase(WER)      | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
          std-baseline_xent     |  6.04           |  29.91           |  27.76          |  16.37
   ---------------------------------------------------------------------------------------------------------
          std-baseline_mse      |  6.05           |  31.30           |  30.03          |  15.77
   ---------------------------------------------------------------------------------------------------------
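For clarity on the two rows above, a toy comparison of the xEnt and MSE objectives, both computed on softmax posteriors against 1-of-K state targets (a sketch, not the training code):

 import numpy as np

 def xent_loss(post, targets):
     # Cross entropy with 1-of-K targets: -mean log p(correct state).
     return -np.mean(np.log(post[np.arange(len(targets)), targets] + 1e-10))

 def mse_loss(post, targets):
     # Mean squared error between posteriors and 1-of-K targets.
     onehot = np.eye(post.shape[1])[targets]
     return np.mean(np.sum((post - onehot) ** 2, axis=1))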

DAE (Deep Auto-Encoder)

 (1) train_clean
       model/testcase(WER)      | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
  ---------------------------------------------------------------------------------------------------------
      std-xEnt-sigmoid-baseline| 6.04            |    29.91         |   27.76         | 16.37
  ---------------------------------------------------------------------------------------------------------
      std+dae_cmvn_noFT_2-1200 | 7.10            |    15.33         |   16.58         | 9.23
  ---------------------------------------------------------------------------------------------------------
   std+dae_cmvn_splice5_2-100  | 8.19            |    15.21         |   15.25         | 9.31
  ---------------------------------------------------------------------------------------------------------
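A minimal sketch of training one denoising layer with MSE (noisy features in, clean features out), assuming sigmoid hidden units and a linear output; the 1200-unit width mirrors std+dae_cmvn_noFT_2-1200, everything else is illustrative.

 import numpy as np

 def train_dae_layer(noisy, clean, hid=1200, lr=0.008, epochs=10, rng=np.random):
     # One denoising layer: sigmoid hidden units, linear reconstruction,
     # MSE against the clean features.  noisy/clean: (N, dim) arrays.
     dim = noisy.shape[1]
     W1 = rng.randn(dim, hid) * 0.01; b1 = np.zeros(hid)
     W2 = rng.randn(hid, dim) * 0.01; b2 = np.zeros(dim)
     for _ in range(epochs):
         h = 1.0 / (1.0 + np.exp(-(noisy @ W1 + b1)))   # sigmoid hidden
         y = h @ W2 + b2                                # reconstruction
         g = 2.0 * (y - clean) / len(noisy)             # dMSE/dy
         gh = (g @ W2.T) * h * (1.0 - h)                # backprop to hidden
         W2 -= lr * (h.T @ g);      b2 -= lr * g.sum(axis=0)
         W1 -= lr * (noisy.T @ gh); b1 -= lr * gh.sum(axis=0)
     return W1, b1, W2, b2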

Denoising & Farfield ASR

  • ICASSP paper submitted.
  • HOLD

VAD

  • Frame energy feature extraction, done (a minimal sketch follows this list)
  • Harmonics and Teager energy features under investigation (+)
  • Previous results to be organized into a paper
  • MPE model VAD test
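For the frame-energy feature above, a minimal extraction sketch (16 kHz sampling, 25 ms window, 10 ms shift are assumptions):

 import numpy as np

 def frame_energy(signal, frame_len=400, frame_shift=160):
     # Log energy per frame: 400/160 samples = 25 ms / 10 ms at 16 kHz.
     n = 1 + max(0, (len(signal) - frame_len) // frame_shift)
     idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(n)[:, None]
     frames = signal[idx]
     return np.log((frames ** 2).sum(axis=1) + 1e-10)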

Speech rate training

  • Data ready on the Tencent set; some errors in the speech-rate-dependent model
  • Retrain the new model (+)

Scoring

  • Timbre comparison done.
  • Harmonics-based timbre comparison: frequency-based features are better
  • GMM-based timbre comparison is done; the approach is similar to speaker recognition
  • TODO: code check-in and technical report

Confidence

  • Reproduce the experiments on the Fisher dataset.
  • Use the Fisher DNN model to decode the all-WSJ dataset
  • Preparing scoring for the puqiang data

Speaker ID

  • Preparing the GMM-based server.
  • EER ~ 4% (GMM-based system), text-independent
  • EER ~ 6% (1 s) / 0.5% (5 s) (GMM-based system), text-dependent
  • Test different numbers of components; fast i-vector computation (an EER sketch follows this list)
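Since EER is the headline metric above, a small sketch of how it can be computed from trial scores (labels: 1 = target, 0 = impostor); our evaluation scripts may differ.

 import numpy as np

 def eer(scores, labels):
     # Sort trials by score, high first; sweep the threshold down and
     # find where the false-alarm rate crosses the miss rate.
     order = np.argsort(scores)[::-1]
     labels = np.asarray(labels)[order]
     n_tar = labels.sum(); n_imp = len(labels) - n_tar
     miss = 1.0 - np.cumsum(labels) / n_tar        # targets rejected
     fa = np.cumsum(1 - labels) / n_imp            # impostors accepted
     i = np.argmin(np.abs(miss - fa))
     return (miss[i] + fa[i]) / 2.0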

Language ID

  • GMM-based language ID is ready.
  • Delivered to Jietong
  • Prepare the test cases

Voice Conversion

  • Yiye is reading materials


Text Processing

LM development

Domain specific LM

  • Domain LM (need to discuss with xiaoxi)
  • Embedded language model (this week)
  • Train some more LMs with Zhenlong (dianzishu, sogou, bbs chosen) (need result).
  • Keep on training the sogou2T LM (14/16 on the 3rd iteration). (this week)
  • New dict.
  • Hand this work over to hanzhenglong and give a simple document (this week)

tag LM

  • different weight (2014-Nov-23, Monday: http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=lr&step=view_request&cvssid=304)

   method       | tag-jsgf                            | corpus | weight | wer   | ser   | add_wer
  -------------------------------------------------------------------------------------------------
   experiment 3 | 500 (490 less frequent + 10 unseen) | 500    | 0.1    | 16.72 | 77.92 | -
                |                                     |        | 0.3    | 15.42 | 71.25 | -
                |                                     |        | 0.5    | 15.40 | 69.58 | -
                |                                     |        | 0.7    | 15.28 | 68.75 | -
                |                                     |        | 0.8    | 15.38 | 68.33 | -
                |                                     |        | 1      | 15.98 | 69.17 | -
                |                                     |        | 2      | 19.08 | 70.83 | -
  -------------------------------------------------------------------------------------------------
   experiment 4 | 100 (90 less frequent + 10 unseen)  | 100    | 0.008  | 15.28 | 69.58 | -
                |                                     |        | 0.02   | 14.84 | 69.58 | -
                |                                     |        | 0.05   | 15.11 | 69.58 | -
                |                                     |        | 0.1    | 15.30 | 69.75 | -
                |                                     |        | 0.3    | 16.01 | 70.42 | -
  -------------------------------------------------------------------------------------------------
   experiment 5 | 500                                 | 100    | 0.01   | 17.57 | 78.75 | -
                |                                     |        | 0.05   | 16.84 | 77.08 | -
                |                                     |        | 0.08   | 16.59 | 76.25 | -
                |                                     |        | 0.15   | 16.76 | 75.42 | -
  -------------------------------------------------------------------------------------------------
   experiment 6 | 1280                                | 500    | 0.1    | 17.42 | 77.92 | -
                |                                     |        | 0.5    | 15.20 | 69.17 | -
                |                                     |        | 0.8    | 15.30 | 68.33 | -
                |                                     |        | 1      | 15.69 | 69.58 | -
  -------------------------------------------------------------------------------------------------
  • conclusion:
 1. Comparing experiment 3 with experiment 5: with the same JSGF file but a different
    number of tags in the corpus, we find that the more tags are added to the corpus,
    the larger the optimal weight.
 2. Comparing experiment 3 with experiment 6: with the same number of tags in the corpus
    but different JSGF sizes, the optimal weight stays the same.
 (A hedged sketch of the weighted tag scoring follows this section.)
  • need to do
  • Tag probability: test adding the weight, then hand the work over to hanzhenglong (this week)
  • Write a summary of the tag LM and a journal paper (wxx and yuanb) (two weeks).
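One plausible reading of the weighted tag scoring swept in the table above, sketched under heavy assumptions (uniform within-tag distribution, log10 n-gram scores; all names are illustrative, not the actual implementation):

 import math

 def tag_lm_logprob(word, history, ngram_lp, tag_members, weight):
     # If `word` belongs to a tag class, score it through the class token:
     # log P(tag | history) + log(weight) + log P(word | tag), with a
     # uniform within-tag distribution.  Otherwise use the plain n-gram.
     for tag, members in tag_members.items():
         if word in members:
             return (ngram_lp(tag, history)
                     + math.log10(weight)
                     - math.log10(len(members)))
     return ngram_lp(word, history)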

RNN LM

  • rnn
  • Test the WER of the RNNLM on Chinese data from jietong-data (this week)
  • Check how the rnnlm code initializes and updates the learning rate (see the schedule sketch below).
  • Generate an n-gram model from the RNNLM and test the PPL with different text sizes. (this week)
  • lstm+rnn
  • Check how the lstm-rnnlm code initializes and updates the learning rate.
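On the learning-rate question: Mikolov's rnnlm tool uses a newbob-style schedule, roughly as sketched below (the constant is the tool's default as we recall it; verify against the code):

 def next_lr(lr, prev_entropy, entropy, halving, min_improvement=1.003):
     # Keep lr while validation entropy improves by at least the
     # min_improvement ratio; once it stops, halve lr every epoch.
     if entropy * min_improvement > prev_entropy:
         halving = True
     if halving:
         lr /= 2.0
     return lr, halving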

Word2Vector

W2V based doc classification

  • Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM. (hold)
  • Non-linear inter-language transform: English-Spanish-Czech: word-vector model training done; the transform model is under investigation (the linear baseline is sketched below)
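For context on the transform model: the linear baseline that the non-linear transform extends can be fit by least squares over a seed dictionary of aligned word vectors (a sketch; the non-linear variants replace this map):

 import numpy as np

 def fit_linear_map(src_vecs, tgt_vecs):
     # Solve W minimizing ||src_vecs @ W - tgt_vecs||^2 over aligned pairs.
     W, _, _, _ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
     return W

 def nearest_target(vec, W, tgt_vocab):
     # Map a source vector into the target space and return the index of
     # the most similar (cosine) target word vector.
     m = vec @ W
     sims = (tgt_vocab @ m) / (np.linalg.norm(tgt_vocab, axis=1)
                               * np.linalg.norm(m) + 1e-10)
     return int(np.argmax(sims))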

Knowledge vector

  • Knowledge vector started
  • Begin coding

Character to word

  • Character-to-word conversion (hold)
  • Prepare the task: word similarity
  • Prepare the dict.

Translation

  • v5.0 demo released
  • Cut down the dict and use the new segmentation tool

QA

detail: [1]

Spell mistake

  • Retrain the n-gram model (caoli)

improve fuzzy match

  • Add synonym similarity using the MERT-4 method (hold)

improve lucene search

  • Use the MERT-4 method to tune the weights of multiple features, e.g., IDF, NER, baidu_weight, keyword (liurong, this month)

Multi-Scene Recognition

  • Hand over to duxk (this week)

XiaoI framework

  • Give a report on the XiaoI framework
  • A new intern will install SEMPRE

patent

  • Use the GA method to improve the QA (this week)