2014-11-25

From cslt Wiki
Revision as of 05:17, 24 November 2014 (Mon) by Lr


Speech Processing

AM development

Environment

  • Three 760 GPUs have been purchased.
  • The 760 GPU on grid-9 crashed again (random freeze after s ); investigating the cause.
  • GPU problems on grid-17?
  • disk (/work2) problem on grid-15

Sparse DNN

  • Slight pruning improves performance.
  • Retraining is needed for the unpruned model (training loss).
  • The result of AURORA 4 will be available soon.
  • details at http://liuc.cslt.org/pages/sparse.html
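The notes do not say which pruning criterion is used. One common choice is magnitude pruning, where the smallest-magnitude weights are set to zero. The sketch below assumes that criterion and a plain list-of-lists weight matrix; `magnitude_prune` and `sparsity_of` are hypothetical helpers, not the team's code.

```python
from typing import List

Matrix = List[List[float]]


def magnitude_prune(weights: Matrix, sparsity: float) -> Matrix:
    """Return a copy of `weights` with the smallest-|w| fraction `sparsity` set to 0.

    Assumption: magnitude-based pruning. Ties at the threshold are pruned too,
    so the achieved sparsity can be slightly higher than requested.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    magnitudes = sorted(abs(w) for row in weights for w in row)
    k = int(len(magnitudes) * sparsity)  # number of weights to remove
    if k == 0:
        return [row[:] for row in weights]
    threshold = magnitudes[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def sparsity_of(weights: Matrix) -> float:
    """Fraction of exactly-zero entries in the matrix."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total


if __name__ == "__main__":
    w = [[0.1, -0.5, 0.02], [0.05, 2.0, -0.3]]
    pruned = magnitude_prune(w, 0.5)
    print(pruned)               # [[0.0, -0.5, 0.0], [0.0, 2.0, -0.3]]
    print(sparsity_of(pruned))  # 0.5
```

In practice a pruned network is usually fine-tuned afterwards with the zero pattern held fixed. That step is not shown here.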

RNN AM

  • The initial nnet does not perform well. It needs pre-training, or a lower learning rate should be tested.
  • AURORA 4: 1 h/epoch; model training done.
  • Using AURORA 4 short sentences with a smaller number of targets. (+)
  • Adjusting the learning rate. (+)
  • Trying Microsoft's toolkit. (+)
  • details at http://liuc.cslt.org/pages/rnn.html

A new nnet training scheduler

Drop out & Rectification & convolutive network

  • Drop out
  • dataset: wsj, test set: eval92, metric: WER (%)
       std |  dropout0.4 | dropout0.5 | dropout0.6 | dropout0.7 | dropout0.7_iter7(maxTr-Acc) | dropout0.8 | dropout0.8_iter7(maxTr-Acc)
    ------------------------------------------------------------------------------------------------------------------------------------ 
       4.5 |     5.39    |    4.80    |   4.75     |  4.36      |  4.39                       |    4.55    |    4.71           
    • Frame accuracy seems inconsistent with WER. Use the training data as the CV set to verify the learning ability of the model.
   Within one nnet model, the iteration with the highest training frame accuracy does not necessarily give the best WER.
    • Decode test_clean_wv1 dataset.
  • AURORA4 dataset
  (1) Train: train_nosiy
   drop-retention/testcase(WER) | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
          std-baseline          |  9.60           |  11.41           |  11.63          |  8.64
   ---------------------------------------------------------------------------------------------------------
             dp-0.3             |  12.91          |  16.55           |  15.37          |  12.60
   ---------------------------------------------------------------------------------------------------------
             dp-0.4             |  11.48          |  14.43           |  13.23          |  11.04
   ---------------------------------------------------------------------------------------------------------
             dp-0.5             |  10.53          |  13.00           |  12.89          |  10.24
   ---------------------------------------------------------------------------------------------------------
             dp-0.6             |  10.02          |  12.32           |  11.81          |  9.29
   ---------------------------------------------------------------------------------------------------------
             dp-0.7             |  9.65           |  12.01           |  12.09          |  8.89
   ---------------------------------------------------------------------------------------------------------
             dp-0.8             |  9.79           |  12.01           |  11.77          |  8.91
   ---------------------------------------------------------------------------------------------------------
             dp-1.0             |  9.94           |  11.33           |  12.05          |  8.32
   ---------------------------------------------------------------------------------------------------------
     baseline_dp0.4_lr0.008     |  9.52           |  12.01           |  11.75          |  9.44
  ---------------------------------------------------------------------------------------------------------
     baseline_dp0.4_lr0.0001    |  9.92           |  14.22           |  13.59          |  10.24
  ---------------------------------------------------------------------------------------------------------
     baseline_dp0.4_lr0.00001   |  9.06           |  13.27           |  13.14          |  9.33
  ---------------------------------------------------------------------------------------------------------
     baseline_dp0.8_lr0.008     |  9.16           |  11.23           |  11.42          |  8.49
  ---------------------------------------------------------------------------------------------------------
     baseline_dp0.8_lr0.0001    |  9.22           |  11.52           |  11.77          |  8.82
  ---------------------------------------------------------------------------------------------------------
     baseline_dp0.8_lr0.00001   |  9.12           |  11.27           |  11.65          |  8.68
  ---------------------------------------------------------------------------------------------------------
       dp-0.4_follow-std-lr     |  11.33          |  14.60           |  13.50          |  10.95
  ---------------------------------------------------------------------------------------------------------
       dp-0.8_follow-std-lr     |  9.77           |  12.01           |  11.79          |  8.93
  ---------------------------------------------------------------------------------------------------------
         dp-0.4_4-2048          |  11.69          |  16.13           |  14.24          |  11.98
  ---------------------------------------------------------------------------------------------------------
         dp-0.8_4-2048          |  9.46           |  11.60           |  11.98          |  8.78
  ---------------------------------------------------------------------------------------------------------
    • Test with AURORA4 of 7000 (clean + noisy).
    • Follow the standard DNN training learning-rate schedule, so that the learning rate changes at the same point across the different DNN trainings. Similar performance is obtained.
    • Find and test unknown-noise test data. (+)
    • Applied dropout to a normally trained cross-entropy (XEnt) nnet, e.g. WSJ (learning rate 1e-4/1e-5). A small learning rate seems to balance accuracy and training time.
    • Draft the dropout-DNN weight distribution. (++)
  • Rectification
  • Combine dropout and rectifier. (+)
  • Change the learning rate in the middle of training; modify the train_nnet.sh script (Liu Chao).
  • MaxOut
  • 6 min/epoch
1) AURORA4 - 15h
   NOTE: gs = group size
 (1) Train: train_clean
        model/testcase(WER)    | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
          std-baseline         |  6.04           |  29.91           |  27.76          |  16.37
   ---------------------------------------------------------------------------------------------------------
          lr0.008_gs6          |                             - 
   ---------------------------------------------------------------------------------------------------------
         lr0.008_gs10          |                             - 
   ---------------------------------------------------------------------------------------------------------
         lr0.008_gs20          |                             - 
   ---------------------------------------------------------------------------------------------------------
      lr0.008_l1-0.01          |                             - 
   ---------------------------------------------------------------------------------------------------------
       lr0.008_l1-0.001        |                             - 
   ---------------------------------------------------------------------------------------------------------
      lr0.008_l1-0.0001        |                             - 
   ---------------------------------------------------------------------------------------------------------
    lr0.008_l1-0.000001        |                             - 
   ---------------------------------------------------------------------------------------------------------
        lr0.008_l2-0.01        |                             - 
   ---------------------------------------------------------------------------------------------------------
           lr0.006_gs10        |                             - 
   ---------------------------------------------------------------------------------------------------------
           lr0.004_gs10        |                             - 
   ---------------------------------------------------------------------------------------------------------
          lr0.002_gs10         |  6.21           |  28.48           |  27.30          |  16.37
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs1          |                             -
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs2          |                             -
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs4          |                             -
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs6          |  6.04           |  25.17           |  24.31          |  14.19
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs8          |  5.85           |  25.72           |  24.35          |  14.28
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs10         |  6.23           |  27.04           |  25.51          |  14.22
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs15         |  5.94           |  30.10           |  27.53          |  19.00
   ---------------------------------------------------------------------------------------------------------
          lr0.001_gs20         |  6.32           |  28.10           |  26.47          |  16.98
   ---------------------------------------------------------------------------------------------------------
  • Pretraining-based maxout
  • P-norm


  • Convolutive network (+)
  • AURORA 4
                 | WER  | hid-layers | hid-dim | delta-order | splice | lda-dim | learn-rate | pooling | patch-dim1
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_baseline| 6.70 |     4      |  1200   |      0      |    4   |   198   |   0.008    |    3    |     7
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1000_3  | 6.61 |     4      |  1000   |      0      |    4   |   198   |   0.008    |    3    |     7
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1400_3  | 6.61 |     4      |  1400   |      0      |    4   |   198   |   0.008    |    3    |     7
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1200_4  | 6.91 |     4      |  1200   |      0      |    4   |   198   |   0.008    |    4    |     6
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1200_2  | -    |     4      |  1200   |      0      |    4   |   198   |   0.008    |    2    |     8
-----------------------------------------------------------------------------------------------------------------------
 cnn_std_1200_3  | 6.66 |     5      |  1200   |      0      |    4   |   198   |   0.008    |    3    |     7
-----------------------------------------------------------------------------------------------------------------------
  • READ paper
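The dp-x rows in the AURORA4 dropout table are labelled by drop-retention, i.e. the probability x of keeping a unit (dp-1.0 keeps every unit), and the maxout runs vary the group size gs. The sketch below shows both operations on one activation vector. It assumes inverted dropout (kept units are scaled by 1/x during training so the layer is an identity at test time) and maxout over consecutive groups of gs units. The notes do not say whether the team's toolkit rescales at training or at test time, so treat that detail as an assumption; the function names are hypothetical.

```python
import random
from typing import List, Optional


def dropout(x: List[float], retention: float, train: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: keep each unit with probability `retention`.

    Kept units are scaled by 1/retention during training (assumed convention),
    so nothing changes at test time. retention == 1.0 disables dropout.
    """
    if not 0.0 < retention <= 1.0:
        raise ValueError("retention must be in (0, 1]")
    if not train or retention == 1.0:
        return list(x)
    rng = rng or random.Random()
    return [v / retention if rng.random() < retention else 0.0 for v in x]


def maxout(z: List[float], group_size: int) -> List[float]:
    """Maxout: output the maximum of each group of `group_size` consecutive units."""
    if group_size <= 0 or len(z) % group_size != 0:
        raise ValueError("len(z) must be a positive multiple of group_size")
    return [max(z[i:i + group_size]) for i in range(0, len(z), group_size)]


if __name__ == "__main__":
    h = [1.0, 5.0, 2.0, 3.0, -1.0, 0.0]
    print(maxout(h, 2))  # [5.0, 3.0, 0.0]
    print(maxout(h, 3))  # [5.0, 3.0]
    # Each unit below is either 0.0 or v / 0.8, depending on the random mask.
    print(dropout(h, 0.8, rng=random.Random(0)))
```

A larger group size shrinks the layer output by that factor, which is one reason the number of hidden units has to be chosen together with gs.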

Denoising & Farfield ASR

  • ICASSP paper submitted.
  • HOLD

VAD

  • Frame energy feature extraction done.
  • Harmonic and Teager energy features being investigated.
  • Previous results to be organized for a paper
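For reference, the two energy features mentioned above can be computed per frame as the log frame energy, log(sum of x(n)^2), and the Teager energy operator, Psi[x(n)] = x(n)^2 - x(n-1) * x(n+1), averaged over the frame. The framing parameters (25 ms frames with a 10 ms hop at 16 kHz, i.e. 400 and 160 samples) and the frame averaging of Psi are assumptions, not the team's settings.

```python
import math
from typing import List


def frame_signal(x: List[float], frame_len: int = 400, hop: int = 160) -> List[List[float]]:
    """Split a signal into overlapping frames (default 25 ms / 10 ms at 16 kHz, assumed)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def log_energy(frame: List[float], eps: float = 1e-10) -> float:
    """Log frame energy; eps avoids log(0) on digital silence."""
    return math.log(sum(s * s for s in frame) + eps)


def teager_energy(frame: List[float]) -> float:
    """Mean Teager energy over the frame: x(n)^2 - x(n-1) * x(n+1)."""
    if len(frame) < 3:
        raise ValueError("need at least 3 samples")
    psi = [frame[n] ** 2 - frame[n - 1] * frame[n + 1] for n in range(1, len(frame) - 1)]
    return sum(psi) / len(psi)


if __name__ == "__main__":
    # For A*cos(w*n + phi) the Teager energy is exactly A^2 * sin(w)^2,
    # so it tracks both amplitude and frequency, unlike plain frame energy.
    tone = [0.5 * math.cos(0.3 * n) for n in range(1600)]
    for frame in frame_signal(tone)[:2]:
        print(round(log_energy(frame), 3), round(teager_energy(frame), 6))
```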

Speech rate training

  • Data ready on the Tencent set; some errors in the speech-rate-dependent model.
  • Retrain a new model.

Scoring

  • Timbre comparison done.
  • Harmonics-based timbre comparison: frequency-based features perform better.
  • GMM-based timbre comparison done, similar to speaker recognition.
  • TODO: code check-in and technical report.

Confidence

  • Reproduce the experiments on the Fisher dataset.
  • Use the Fisher DNN model to decode the all-wsj dataset.
  • Preparing scoring for the puqiang data.

Speaker ID

  • Preparing GMM-based server.
  • EER ~ 11.2% (GMM-based system)
  • Test different numbers of components; fast i-vector computation.
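The EER quoted above is the operating point where the miss rate (target trials rejected) equals the false-alarm rate (non-target trials accepted). A minimal sketch, assuming higher scores mean "same speaker" and sweeping the threshold over the observed scores; `equal_error_rate` is a hypothetical helper, and its quadratic sweep only suits small trial lists.

```python
from typing import Sequence


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate EER by trying every observed score as the decision threshold."""
    if not target or not nontarget:
        raise ValueError("need both target and non-target scores")
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target) | set(nontarget)):
        miss = sum(s < t for s in target) / len(target)                # targets rejected
        false_alarm = sum(s >= t for s in nontarget) / len(nontarget)  # impostors accepted
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


if __name__ == "__main__":
    tgt = [0.9, 0.8, 0.7, 0.3]
    non = [0.1, 0.2, 0.4, 0.6]
    print(equal_error_rate(tgt, non))  # 0.25
```

When miss and false-alarm rates never cross exactly, the sketch reports their midpoint at the closest threshold, which is one common convention.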

Language ID

  • GMM-based language ID is ready.
  • Delivered to Jietong

Emotion detection

  • Sinovoice is implementing the server


Text Processing

LM development

Domain specific LM

  • Domain LM (needs discussion with xiaoxi)
  • Embedded language model (this week)
  • Train more LMs with Zhenlong (dianzishu, sogou, bbs chosen) ("need result").
  • Continue training the sogou2T LM (14/16 on the 3rd iteration) (this week).
  • New dict.
  • Hand this work over to hanzhenglong with a simple document (this week).

tag LM

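Different weights
 method       | tag-jsgf | corpus | weight | WER   | SER   | add_WER
 ---------------------------------------------------------------------
 experiment 3 | 500      | 500    | 0.1    | 16.72 | 77.92 | -
              |          |        | 0.3    | 15.42 | 71.25 | -
              |          |        | 0.5    | 15.40 | 69.58 | -
              |          |        | 0.7    | 15.28 | 68.75 | -
              |          |        | 0.8    | 15.38 | 68.33 | -
              |          |        | 1      | 15.98 | 69.17 | -
              |          |        | 2      | 19.08 | 70.83 | -
 ---------------------------------------------------------------------
 experiment 4 | 100      | 100    | 0.008  | 15.28 | 69.58 | -
              |          |        | 0.02   | 14.84 | 69.58 | -
              |          |        | 0.05   | 15.11 | 69.58 | -
              |          |        | 0.1    | 15.30 | 69.75 | -
              |          |        | 0.3    | 16.01 | 70.42 | -
 ---------------------------------------------------------------------
 experiment 5 | 500      | 100    | 0.01   | 17.57 | 78.75 | -
              |          |        | 0.05   | 16.84 | 77.08 | -
              |          |        | 0.08   | 16.59 | 76.25 | -
              |          |        | 0.15   | 16.76 | 75.42 | -
 ---------------------------------------------------------------------
 experiment 6 | 1280     | 500    | 0.1    | 17.42 | 77.92 | -
              |          |        | 0.5    | 15.20 | 69.17 | -
              |          |        | 0.8    | 15.30 | 68.33 | -
              |          |        | 1      | 15.69 | 69.58 | -
 ---------------------------------------------------------------------
 tag-jsgf 500 in experiment 3: 490 less frequent and 10 unseen; tag-jsgf 100 in experiment 4: 90 less frequent and 10 unseen.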
  • Conclusions:
 1. Experiment 3 vs. experiment 5: same jsgf file, but a different number of tags in the corpus.
    When more tags are added to the corpus, the optimal weight becomes larger.
 2. Experiment 3 vs. experiment 6: same number of tags in the corpus, but different jsgf sizes.
    Both jsgf sizes give the same optimal weight.
  • To do:
  • Test adding the weight to the tag probability (hanzhenglong) and hand this work over to hanzhenglong (this week).
  • Write a summary of the tag LM and a journal paper (wxx and yuanb) (two weeks).
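The per-experiment optimal weights behind the conclusions above can be read off mechanically. This sketch hard-codes the WER/SER numbers from the weight table and picks, for each experiment, the weight with the lowest value (ties go to the smallest weight). The dictionary layout and `best_weight` are only for illustration.

```python
from typing import Dict, Tuple

# weight -> (WER, SER), copied from the tag-LM weight table above
RESULTS: Dict[str, Dict[float, Tuple[float, float]]] = {
    "experiment 3": {0.1: (16.72, 77.92), 0.3: (15.42, 71.25), 0.5: (15.40, 69.58),
                     0.7: (15.28, 68.75), 0.8: (15.38, 68.33), 1.0: (15.98, 69.17),
                     2.0: (19.08, 70.83)},
    "experiment 4": {0.008: (15.28, 69.58), 0.02: (14.84, 69.58), 0.05: (15.11, 69.58),
                     0.1: (15.30, 69.75), 0.3: (16.01, 70.42)},
    "experiment 5": {0.01: (17.57, 78.75), 0.05: (16.84, 77.08), 0.08: (16.59, 76.25),
                     0.15: (16.76, 75.42)},
    "experiment 6": {0.1: (17.42, 77.92), 0.5: (15.20, 69.17), 0.8: (15.30, 68.33),
                     1.0: (15.69, 69.58)},
}


def best_weight(runs: Dict[float, Tuple[float, float]], metric: int) -> float:
    """Weight with the lowest error for metric 0 (WER) or 1 (SER); ties -> smallest weight."""
    return min(sorted(runs), key=lambda w: runs[w][metric])


if __name__ == "__main__":
    for name, runs in RESULTS.items():
        print(name, "best WER weight:", best_weight(runs, 0),
              "best SER weight:", best_weight(runs, 1))
```

On the reported numbers this gives 0.7 (WER) and 0.8 (SER) for experiment 3, 0.02 and 0.008 for experiment 4, 0.08 and 0.15 for experiment 5, and 0.5 and 0.8 for experiment 6.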

RNN LM

  • rnn
  • Test the WER of the RNNLM on Chinese data from jietong-data (this week).
  • Check how the rnnlm code initializes and updates the learning rate.
  • Generate an n-gram model from the RNNLM and test the perplexity on texts of different sizes (this week).
  • lstm+rnn
  • Check how the lstm-rnnlm code initializes and updates the learning rate.
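Comparing the RNNLM with the n-gram model generated from it comes down to perplexity, the exponentiated average negative log-probability of the test tokens: PPL = exp(-(1/N) * sum_i log p(w_i | history)). A minimal sketch that takes the per-token probabilities a model assigns; how those probabilities are produced, and whether sentence-end tokens are counted in N, depends on the toolkit and is assumed here.

```python
import math
from typing import Iterable


def perplexity(word_probs: Iterable[float]) -> float:
    """PPL = exp(-(1/N) * sum(log p_i)) over the N scored tokens."""
    probs = list(word_probs)
    if not probs or any(not 0.0 < p <= 1.0 for p in probs):
        raise ValueError("need at least one probability in (0, 1]")
    avg_neg_log = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(avg_neg_log)


if __name__ == "__main__":
    # A model that is uniformly unsure among 4 choices has perplexity 4.
    print(round(perplexity([0.25] * 10), 6))  # 4.0
```

Both models must score exactly the same token sequence (same vocabulary, same OOV handling) for their perplexities to be comparable.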

Word2Vector

W2V based doc classification

  • Initial results with a variational Bayesian GMM obtained; performance is not as good as the conventional GMM. (hold)
  • Non-linear inter-language transform (English-Spanish-Czech): word-vector model training done; transform model under investigation.

Knowledge vector

  • Knowledge vector started
  • Started coding

Character to word

  • Character to word conversion(hold)
  • prepare the task: word similarity
  • prepare the dict.

Translation

  • v5.0 demo released
  • Trim the dict and use the new segmentation tool.

QA

Details: [1]

Spell mistake

  • retrain the ngram model(caoli)
  • prepare the test and development set(caoli)
  • Needs discussion with duxk.

improve fuzzy match

  • Add synonym similarity using the MERT-4 method. (hold)

improve lucene search

  • Use the MERT-4 method to find good weights for multiple features, e.g. IDF, NER, baidu_weight and keyword (liurong, this month).
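MERT-style tuning searches for the feature weights of a linear scoring function that maximize a task metric on development data. As a hedged illustration only, the sketch below scores each search candidate by a weighted sum of its features (IDF and keyword stand in for the features listed above) and tunes the weights with a greedy per-feature grid search to maximize top-1 accuracy. This is a simplified substitute, not MERT itself (MERT performs an exact line search along each direction) and not the project's MERT-4 setup; the function names, metric, and data are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, float]
# One query: (candidate feature vectors, index of the correct candidate)
Query = Tuple[List[Features], int]


def score(feats: Features, weights: Features) -> float:
    """Linear combination of feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def top1_accuracy(queries: Sequence[Query], weights: Features) -> float:
    """Fraction of queries whose highest-scoring candidate is the correct one."""
    hits = 0
    for candidates, gold in queries:
        best = max(range(len(candidates)), key=lambda i: score(candidates[i], weights))
        hits += best == gold
    return hits / len(queries)


def coordinate_search(queries: Sequence[Query], names: Sequence[str],
                      grid: Sequence[float], rounds: int = 3) -> Tuple[Features, float]:
    """Greedy per-feature grid search; a simplified stand-in, not MERT's exact line search."""
    weights = {name: 1.0 for name in names}
    best_acc = top1_accuracy(queries, weights)
    for _ in range(rounds):
        for name in names:
            for value in grid:
                trial = dict(weights, **{name: value})
                acc = top1_accuracy(queries, trial)
                if acc > best_acc:  # keep only strict improvements
                    best_acc, weights = acc, trial
    return weights, best_acc


if __name__ == "__main__":
    # Toy development data with hypothetical feature values.
    dev = [
        ([{"IDF": 2.0, "keyword": 0.0}, {"IDF": 0.5, "keyword": 1.0}], 1),
        ([{"IDF": 0.2, "keyword": 1.0}, {"IDF": 0.9, "keyword": 0.0}], 0),
    ]
    print(coordinate_search(dev, ["IDF", "keyword"], [0.0, 0.5, 1.0, 2.0]))
    # ({'IDF': 0.0, 'keyword': 1.0}, 1.0)
```

Top-1 accuracy is piecewise constant in the weights, which is why grid or line searches are used here instead of gradient methods.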

Multi-Scene Recognition

  • Add triple search to the QA engine.
  • Demo (liurong, two weeks)

.

  • A new intern will install SEMPRE.