2014-10-27

From cslt Wiki
Latest revision as of 02:03, 28 October 2014

Speech Processing

AM development

Contour

  • NaN problem
    • NaN recurrence test:
   ------------------------------------------------------------
    grid machine |  reproducible  |  notes
   ------------------------------------------------------------
    grid-10      |      yes       |
   ------------------------------------------------------------
    grid-12      |      no        | "nan" appears in different positions
   ------------------------------------------------------------
    grid-14      |      yes       |
   ------------------------------------------------------------
    • Buy a 760 GPU.

Sparse DNN

  • Performance improvement found when pruned slightly.
  • Experiments show that ...
  • Suggest using TIMIT / AURORA 4 for training.
  • HOLD

RNN AM

  • The initial nnet does not seem to perform well; it needs pre-training or a lower learning rate.
  • AURORA4 training runs at 1h/epoch; 100 epochs done.
  • Using AURORA 4 short sentences with a smaller number of targets.

Noise training

  • First draft of the noisy training journal paper.
  • Second version released.
  • Paper correction (Yinshi, Liuchao, Lin Yiye) is ongoing.

Dropout & Rectification & Convolutive network

  • Dropout (a minimal sketch of the retention convention appears at the end of this section)
    • Dataset: WSJ; test set: eval92. WER (%):
       std |  dropout0.4 | dropout0.5 | dropout0.6 | dropout0.7 | dropout0.8
      -------------------------------------------------------------------------
       4.5 |     5.39    |    4.80    |   4.75     |  4.36      |    4.55
    • Frame accuracy seems inconsistent with WER.
    • Use the training data as the CV set to verify the learning ability of the model.
  • AURORA4 dataset
  (1) Train: train_clean      
   drop-retention/testcase(WER) | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
          std-baseline          |  6.04           |  29.91           |  27.76          |  16.37
   ---------------------------------------------------------------------------------------------------------
             dp-0.4             |  6.61           |  29.59           |  30.12          |  19.40
   ---------------------------------------------------------------------------------------------------------
             dp-0.5             |  6.40           |  28.07           |  27.88          |  19.88
   ---------------------------------------------------------------------------------------------------------
             dp-0.6             |  6.36           |  26.68           |  24.85          |  18.32
   ---------------------------------------------------------------------------------------------------------
             dp-0.7             |  6.13           |  25.53           |  23.90          |  15.69
   ---------------------------------------------------------------------------------------------------------
             dp-0.8             |  5.94           |  24.94           |  23.67          |  15.77
   ---------------------------------------------------------------------------------------------------------
             dp-0.9             |  5.96           |  27.30           |  25.63          |  15.46
   ---------------------------------------------------------------------------------------------------------
 
 (2) Train: train_noisy
   drop-retention/testcase(WER) | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
          std-baseline          |  9.60           |  11.41           |  11.63          |  8.64
   ---------------------------------------------------------------------------------------------------------
             dp-0.3             |  12.91          |  16.55           |  15.37          |  12.60
   ---------------------------------------------------------------------------------------------------------
             dp-0.4             |  11.48          |  14.43           |  13.23          |  11.04
   ---------------------------------------------------------------------------------------------------------
             dp-0.5             |  10.53          |  13.00           |  12.89          |  10.24
   ---------------------------------------------------------------------------------------------------------
             dp-0.6             |  10.02          |  12.32           |  11.81          |  9.29
   ---------------------------------------------------------------------------------------------------------
             dp-0.7             |  9.65           |  12.01           |  12.09          |  8.89
   ---------------------------------------------------------------------------------------------------------
             dp-0.8             |  9.79           |  12.01           |  11.77          |  8.91
   ---------------------------------------------------------------------------------------------------------
             dp-1.0             |  9.94           |  11.33           |  12.05          |  8.32
   ---------------------------------------------------------------------------------------------------------
    • Dropout may be losing important features; enlarge the hidden-layer dimension to 2048.
    • Follow the standard DNN training learning-rate schedule, so the learning-rate change point is consistent across the different DNN trainings.
    • Test on noise conditions beyond the known noise test data.
    • Continue dropout on a normally trained XEnt nnet, e.g. WSJ (learning rate: 1e-4/1e-5). (++)
    • Plot the dropout-DNN weight distribution. (++)
  • Rectification
    • Still hitting NaN errors; need to debug.
 AURORA4 (15h):
 (1) Train: train_clean
     learn-rate/testcase(WER)  | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
          std-baseline         |  6.04           |  29.91           |  27.76          |  16.37
   ---------------------------------------------------------------------------------------------------------
          lr0.001              |  6.28           |  30.01           |  30.26          |  20.81
   ---------------------------------------------------------------------------------------------------------
          lr0.003              |  6.44           |  32.01           |  32.24          |  17.82
   ---------------------------------------------------------------------------------------------------------
          lr0.005              |  6.47           |  33.49           |  34.75          |  18.15
   ---------------------------------------------------------------------------------------------------------
          lr0.007              |  6.72           |  35.85           |  39.72          |  18.03
   ---------------------------------------------------------------------------------------------------------
        lr-0.001_l1-0.001      |  83.19          |  98.57           |  98.84          |  97.77
   ---------------------------------------------------------------------------------------------------------
        lr-0.001_l1-0.0001     |  7.58           |  32.94           |  34.29          |  23.42
   ---------------------------------------------------------------------------------------------------------
       lr-0.001_l1-0.00001     |  6.21           |  29.15           |  28.24          |  19.50
   ---------------------------------------------------------------------------------------------------------
       lr-0.001_l1-0.000001    |  6.30           |  31.91           |  29.23          |  21.52
   ---------------------------------------------------------------------------------------------------------
    • Change the learning rate in the middle of training; modify the train_nnet.sh script (Liu Chao).
    • Use the maximum learning rate.
  • MaxOut (++)
  • Convolutive network (+)
    • Test more configurations.
    • Reading the CNN tutorial.
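
The dp-X rows above give the dropout retention probability: dp-0.8 keeps each hidden unit with probability 0.8. Below is a minimal NumPy sketch of this convention, assuming inverted dropout (activations are rescaled at training time so that test time needs no change); the function name and shapes are illustrative, not from our training scripts.

   import numpy as np

   def dropout_forward(h, p_retain, train=True, rng=np.random):
       """h: hidden activations; keep each unit with probability p_retain."""
       if not train or p_retain >= 1.0:
           return h                        # test time: use all units unchanged
       mask = rng.binomial(1, p_retain, size=h.shape)
       return h * mask / p_retain          # inverted dropout: rescale while training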

Denoising & Farfield ASR

  • ICASSP paper submitted.
  • HOLD

VAD

  • Spike detection and removal.
  • Add more silence tags "#" to the pure-silence utterance transcripts (training).
    • The xEntropy model is training.
    • Need to test the baseline.
  • Sum all silence pdfs as the silence posterior probability (a minimal sketch follows this section).
    • Program done; the threshold remains to be tuned.
  • Rearrange the ending point of the detected speech.
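
A minimal sketch of the sum-all-sil-pdf idea above, assuming the DNN posteriors are given as a frames-by-pdfs matrix; sil_pdf_ids and the threshold are the quantities still to be tuned.

   import numpy as np

   def silence_decisions(posteriors, sil_pdf_ids, threshold=0.5):
       """posteriors: (num_frames, num_pdfs); returns True for frames judged silence."""
       p_sil = posteriors[:, sil_pdf_ids].sum(axis=1)  # summed silence posterior per frame
       return p_sil > threshold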

Speech rate training

  • The ROS (rate-of-speech) model seems superior to the normal one on faster speech.
  • Suggest extracting speech data at different ROS values to construct a new test set. (+)
  • Tencent training data done.

low resource language AM training

  • Use the Chinese NN as the initial NN and change the last layer.
  • Vary the number of reused Chinese-trained DNN layers (a minimal sketch of the initialization appears at the end of this section).
    • feature_transform = 6000h_transform + 6000_N*hidden-layers
 nnet.init = random (4-N)*hidden-layers + output-layer
 | N / learn_rate | 0.008         | 0.001 | 0.0001 |
 |   baseline     | 17.00(14*2h)  |       |        |
 |       4        | 17.75(9*0.6h) | 18.64 |        |
 |       3        | 16.85         |       |        |
 |       2        | 16.69         |       |        |
 |       1        | 16.87         |       |        |
 |       0        | 16.88         |       |        |  
    • feature_transform = uyghur_transform + 6000_N*hidden-layers
 nnet.init = random (4-N)*hidden-layers + output-layer
 Note: this reproduces Yinshi's experiment
 | N / learn_rate | 0.008 | 0.001 | 0.0001 |
 |   baseline     | 17.00 |       |        |
 |       4        | 28.23 | 30.72 | 37.32  |
 |       3        | 22.40 |       |        |
 |       2        | 19.76 |       |        |
 |       1        | 17.41 |       |        |
 |       0        |       |       |        |
    • feature_transform = 6000_transform + 6000_N*hidden-layers
 nnet.init = uyghur (4-N)*hidden-layers + output-layer
 | N / learn_rate | 0.008 | 0.001 | 0.0001 |
 |   baseline     | 17.00 |       |        |
 |       4        | 17.80 | 18.55 | 21.06  |
 |       3        | 16.89 | 17.64 |        |
 |       2        |       |       |        |
 |       1        |       |       |        |
 |       0        |       |       |        |


  • The sub-word-unit language model is ready and is being tested.
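
A minimal sketch of the layer-transfer recipe in the tables above: reuse the feature transform and the first N hidden layers of the 6000h Chinese DNN, then randomly initialize the remaining (4-N) hidden layers plus a fresh output layer. The layer containers are illustrative, and for simplicity every layer is assumed to have the same width.

   import numpy as np

   def init_transfer_nnet(source_layers, N, hid_dim, out_dim, rng=np.random):
       """source_layers: list of 4 (W, b) tuples from the Chinese-trained DNN."""
       layers = list(source_layers[:N])                   # borrowed, then fine-tuned
       for _ in range(4 - N):                             # fresh random hidden layers
           layers.append((0.01 * rng.randn(hid_dim, hid_dim), np.zeros(hid_dim)))
       layers.append((0.01 * rng.randn(out_dim, hid_dim), np.zeros(out_dim)))  # new output layer
       return layers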

Scoring

  • Harmonics program done; experiments still to be run.
  • Initial experiments show that more timbre data are required.

Confidence

  • Reproduce the experiments on the Fisher dataset.
  • Use the Fisher DNN model to decode the all-wsj dataset.
  • Preparing scoring for the puqiang data.

Speaker ID

  • Preparing the GMM-based server.
  • EER ~ 11.2% (GMM-based system); a minimal EER sketch follows this section.
  • Test different numbers of Gaussian components; speed up i-vector computation.
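
For reference, an EER such as the 11.2% above is the operating point where the false-accept rate equals the false-reject rate. A minimal sketch, assuming two score arrays from genuine and impostor trials.

   import numpy as np

   def equal_error_rate(genuine, impostor):
       """genuine/impostor: 1-D arrays of trial scores; returns the EER."""
       best_eer, best_gap = 1.0, np.inf
       for t in np.sort(np.concatenate([genuine, impostor])):
           far = np.mean(impostor >= t)       # false accept rate at threshold t
           frr = np.mean(genuine < t)         # false reject rate at threshold t
           if abs(far - frr) < best_gap:
               best_gap, best_eer = abs(far - frr), (far + frr) / 2.0
       return best_eer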

Emotion detection

  • Sinovoice is implementing the server.


Text Processing

LM development

Domain specific LM

  • Domain LM
    • AM: 1400h (2.0.b). Results: xiaomi 29.43%, baiduzhidao 43.46%, baiduHi 30.02%; test set: 8k sentences (16k => 8k).
    • Need to check the xiaomin-lm method and results.
  • New dictionary
    • Weibo data: Tencent segmentation and counting; obtained 16k words to segment again.
    • New toolkit: find a method to update the new dictionary; new word lists can be obtained from sougou, and word information from baidu.

tag LM

  • Set up a new test (a minimal sketch of the address-tag idea follows this section):
    • 1k addresses from dianxin; preparing to test.
    • Insert new, unseen addresses into the test set.
    • Record a test set of 15 sentences per person from the dianxin text.
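
A minimal sketch of the address-tag idea behind these tests: replace every recognized address with one class token before LM training, so a single tag generalizes to addresses never seen in training. The address set here stands in for real NER output.

   def tag_addresses(tokens, address_set, tag="<ADDR>"):
       """Replace every token found in address_set with the class tag."""
       return [tag if tok in address_set else tok for tok in tokens]

   # e.g. tag_addresses(["navigate", "to", "Zhongguancun"], {"Zhongguancun"})
   # -> ["navigate", "to", "<ADDR>"]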

RNN LM

  • RNN
    • RNNLM => ALPA.
    • Train an RNNLM on Chinese data from jietong-data.
  • LSTM+RNN
    • WER: 6.2% (4 epochs); need to check the problem. (An n-best rescoring sketch follows this section.)
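
A minimal sketch of n-best rescoring with an RNN/LSTM LM, the usual setup behind WER numbers like the one above: interpolate the RNN LM log-probability with the original n-gram LM score and re-rank the list. rnnlm_logp and the interpolation weight are assumptions.

   def rescore_nbest(nbest, rnnlm_logp, rnn_weight=0.5):
       """nbest: list of (words, am_score, ngram_lm_score); returns the best hypothesis."""
       def combined(hyp):
           words, am_score, ngram_score = hyp
           lm = (1.0 - rnn_weight) * ngram_score + rnn_weight * rnnlm_logp(words)
           return am_score + lm
       return max(nbest, key=combined)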

Word2Vector

W2V based doc classification

  • Initial results with the variational Bayesian GMM obtained; performance is not as good as the conventional GMM.
  • Non-linear inter-language transform (English-Spanish-Czech): word-vector model training done; the transform model is under investigation.
    • SSA-based local linear mapping is still running.
    • k-means classes changed to 2.
  • Knowledge vector started
    • Format the data.
    • Yuanbin will continue this work with help from Xingchao.
  • Character-to-word conversion
    • Prepare the task: word similarity (a minimal cosine-similarity sketch follows this section).
    • Prepare the dictionary.
  • Google word-vector training
    • Some ideas will be discussed in the weekly report.
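
A minimal sketch of the word-similarity task mentioned above, assuming vectors maps words to their trained word2vec embeddings.

   import numpy as np

   def word_similarity(w1, w2, vectors):
       """Cosine similarity between the embeddings of two words."""
       v1, v2 = vectors[w1], vectors[w2]
       return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))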

Translation

  • v3.0 demo released.
    • Still slow.
    • Re-segment the words using the new dictionary; will use the Tencent dictionary (about 110k entries).
    • Check the new data.

QA

  • Search method (an IDF-weighted ranking sketch follows this section):
    • Test the Lucene method.
    • Analyze the test results.
    • Add IDF to the test.
  • Spell check
    • Got the n-gram tool and made a simple demo.
    • Got the domain word list and pinyin tool from huilan.
  • The new intern will install SEMPRE.
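
A minimal sketch of IDF-weighted ranking for the search method above, written in BM25 form; the idf table, average document length, and the k1/b parameters are assumptions, not values from the Lucene setup.

   from collections import Counter

   def bm25_score(query_terms, doc_terms, idf, avgdl, k1=1.5, b=0.75):
       """Score one candidate document against the query terms."""
       tf, dl, score = Counter(doc_terms), len(doc_terms), 0.0
       for term in query_terms:
           f = tf[term]                  # term frequency in this document
           if f:
               score += idf.get(term, 0.0) * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
       return score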