“2014-10-27”: Difference between revisions

Latest revision as of 02:03, 28 October 2014 (Tue)

Speech Processing

AM development

Contour

NaN problem

NaN reproduction

  ------------------------------------------------------------
   grid/atr.  |   Reproducible  |    Notes
  ------------------------------------------------------------
   grid-10    |     yes         |   
  ------------------------------------------------------------
   grid-12    |     no          | NaN appears at a different position
  ------------------------------------------------------------
   grid-14    |     yes         |  
  ------------------------------------------------------------
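The reproducibility check in the table above amounts to comparing where NaN first appears in repeated runs. A minimal sketch, assuming the per-minibatch objective values have already been parsed out of each grid machine's training log (the parsing is not shown, and both function names are hypothetical):

```python
import math


def first_nan_index(values):
    """Return the index of the first NaN/Inf value, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


def nan_reproducible(run_a, run_b):
    """True if both runs hit NaN at the same position, or neither hits NaN."""
    return first_nan_index(run_a) == first_nan_index(run_b)


# Example: two runs of the same job that diverge at different minibatches.
run_a = [2.1, 1.9, float("nan"), 1.7]
run_b = [2.1, float("nan"), 1.8, 1.7]
print(nan_reproducible(run_a, run_b))  # False, like grid-12
```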

Buy a 760 GPU.

Sparse DNN

Performance improved when the network was pruned slightly.
Experiments show that
Suggest using TIMIT / AURORA 4 for training.
HOLD
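The notes do not say which pruning criterion was used. A common choice is magnitude pruning, sketched below under that assumption: the given fraction of weights with the smallest absolute value is set to zero. The function name is hypothetical, and a weight matrix is represented as a list of rows.

```python
def prune_by_magnitude(weights, fraction):
    """Zero the `fraction` of entries with the smallest absolute value.

    `weights` is a list of rows (list of lists of floats); a new matrix is returned.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    # (magnitude, row, col) tuples; sorting them gives a deterministic order on ties.
    positions = [(abs(w), r, c) for r, row in enumerate(weights) for c, w in enumerate(row)]
    k = int(len(positions) * fraction)  # number of weights to remove
    to_zero = {(r, c) for _, r, c in sorted(positions)[:k]}
    return [
        [0.0 if (r, c) in to_zero else w for c, w in enumerate(row)]
        for r, row in enumerate(weights)
    ]


w = [[0.5, -0.1], [0.05, -2.0]]
print(prune_by_magnitude(w, 0.5))  # [[0.5, 0.0], [0.0, -2.0]]
```

"Pruned slightly" would correspond to a small `fraction`; the pruned network is then usually retrained with the zeroed weights kept fixed.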

RNN AM

The initial nnet does not seem to work well; it needs pre-training, or a lower learning rate should be tested.
On AURORA4: 1h per epoch, 100 epochs done.
Using AURORA 4 short sentences with a smaller number of targets.

Noise training

First draft of the noisy training journal paper.
Second version released.
Paper correction (Yinshi, Liuchao, Lin Yiye) in progress.

Dropout & Rectification & convolutive network

Dropout

Dataset: WSJ, test set: eval92

    testcase(WER) | std  | dropout0.4 | dropout0.5 | dropout0.6 | dropout0.7 | dropout0.8
    ----------------------------------------------------------------------------------------
    eval92        | 4.5  |    5.39    |    4.80    |    4.75    |    4.36    |    4.55

- Frame accuracy does not seem consistent with WER.
- Use the training data as the CV set to verify the learning ability of the model.
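The AURORA4 tables below label the dropout value "drop-retention", which suggests it is the probability of keeping a unit (so dp-1.0 means no dropout). A minimal sketch of inverted dropout under that reading; whether the training tool scales activations at training time, as here, or scales weights at test time is not stated in the notes (the two are equivalent in expectation).

```python
import random


def dropout(activations, retention, rng=random, training=True):
    """Inverted dropout: keep each unit with probability `retention`.

    Kept units are scaled by 1/retention so the expected activation is unchanged,
    which lets the test-time network use the activations as-is.
    """
    if not 0.0 < retention <= 1.0:
        raise ValueError("retention must be in (0, 1]")
    if not training or retention == 1.0:
        return list(activations)
    return [a / retention if rng.random() < retention else 0.0 for a in activations]


rng = random.Random(0)
print(dropout([1.0, 2.0, 3.0, 4.0], retention=0.8, rng=rng))
```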

AURORA4 dataset

  (1) Train: train_clean      
   drop-retention/testcase(WER) | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
          std-baseline          |  6.04           |  29.91           |  27.76          |  16.37
   ---------------------------------------------------------------------------------------------------------
             dp-0.4             |  6.61           |  29.59           |  30.12          |  19.40
   ---------------------------------------------------------------------------------------------------------
             dp-0.5             |  6.40           |  28.07           |  27.88          |  19.88
   ---------------------------------------------------------------------------------------------------------
             dp-0.6             |  6.36           |  26.68           |  24.85          |  18.32
   ---------------------------------------------------------------------------------------------------------
             dp-0.7             |  6.13           |  25.53           |  23.90          |  15.69
   ---------------------------------------------------------------------------------------------------------
             dp-0.8             |  5.94           |  24.94           |  23.67          |  15.77
   ---------------------------------------------------------------------------------------------------------
             dp-0.9             |  5.96           |  27.30           |  25.63          |  15.46
   ---------------------------------------------------------------------------------------------------------
 
  (2) Train: train_noisy
   drop-retention/testcase(WER) | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
          std-baseline          |  9.60           |  11.41           |  11.63          |  8.64
   ---------------------------------------------------------------------------------------------------------
             dp-0.3             |  12.91          |  16.55           |  15.37          |  12.60
   ---------------------------------------------------------------------------------------------------------
             dp-0.4             |  11.48          |  14.43           |  13.23          |  11.04
   ---------------------------------------------------------------------------------------------------------
             dp-0.5             |  10.53          |  13.00           |  12.89          |  10.24
   ---------------------------------------------------------------------------------------------------------
             dp-0.6             |  10.02          |  12.32           |  11.81          |  9.29
   ---------------------------------------------------------------------------------------------------------
             dp-0.7             |  9.65           |  12.01           |  12.09          |  8.89
   ---------------------------------------------------------------------------------------------------------
             dp-0.8             |  9.79           |  12.01           |  11.77          |  8.91
   ---------------------------------------------------------------------------------------------------------
             dp-1.0             |  9.94           |  11.33           |  12.05          |  8.32
   ---------------------------------------------------------------------------------------------------------

- Important features may be lost; enlarge the hidden-layer dimension to 2048.
- Follow the standard DNN training learning-rate schedule, so that the learning rate does not change at different times in different DNN trainings.
- Test on data with noise types not seen in training.
- Continue dropout training on a normally trained cross-entropy (XEnt) nnet, e.g. WSJ (learning rate: 1e-4/1e-5). (++)
- Draw the weight distribution of the dropout DNN. (++)

Rectification

Still NaN errors; need to debug.

 1) AURORA4 (15h)
 (1) Train: train_clean
     learn-rate/testcase(WER)  | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1 
   ---------------------------------------------------------------------------------------------------------
          std-baseline         |  6.04           |  29.91           |  27.76          |  16.37
   ---------------------------------------------------------------------------------------------------------
          lr0.001              |  6.28           |  30.01           |  30.26          |  20.81
   ---------------------------------------------------------------------------------------------------------
          lr0.003              |  6.44           |  32.01           |  32.24          |  17.82
   ---------------------------------------------------------------------------------------------------------
          lr0.005              |  6.47           |  33.49           |  34.75          |  18.15
   ---------------------------------------------------------------------------------------------------------
          lr0.007              |  6.72           |  35.85           |  39.72          |  18.03
   ---------------------------------------------------------------------------------------------------------
        lr-0.001_l1-0.001      |  83.19          |  98.57           |  98.84          |  97.77
   ---------------------------------------------------------------------------------------------------------
        lr-0.001_l1-0.0001     |  7.58           |  32.94           |  34.29          |  23.42
   ---------------------------------------------------------------------------------------------------------
       lr-0.001_l1-0.00001     |  6.21           |  29.15           |  28.24          |  19.50
   ---------------------------------------------------------------------------------------------------------
       lr-0.001_l1-0.000001    |  6.30           |  31.91           |  29.23          |  21.52
   ---------------------------------------------------------------------------------------------------------
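The lr/l1 rows in the table above combine a learning rate with an L1 penalty on the weights of a rectified (ReLU) network. A minimal sketch of a ReLU and of one SGD step with a subgradient L1 term; this is a generic formulation, and the exact form used by the training tool is not stated in the notes.

```python
def relu(xs):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, x) for x in xs]


def sgd_step_l1(weights, grads, lr, l1):
    """One SGD step on a flat weight list with an L1 penalty (subgradient form):

        w <- w - lr * (grad + l1 * sign(w))
    """
    def sign(w):
        return (w > 0) - (w < 0)

    return [w - lr * (g + l1 * sign(w)) for w, g in zip(weights, grads)]


print(relu([-1.0, 0.0, 2.5]))  # [0.0, 0.0, 2.5]
```

The L1 term pulls every non-zero weight toward zero by the same amount `lr * l1` at each step, independent of the weight's size; one plausible reading of the lr-0.001_l1-0.001 row is that this shrinkage overwhelmed the gradient signal.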

Change the learning rate in the middle of training by modifying the train_nnet.sh script (Liu Chao).
Use a maximum learning rate.
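Changing the learning rate mid-training while capping it at a maximum can be sketched as a newbob-style step (the kind of halving schedule Kaldi's nnet1 scripts use). This is an illustration, not the modified train_nnet.sh: the function name and the threshold defaults below are assumptions, and CV losses are assumed to be positive.

```python
def newbob_step(lr, prev_cv_loss, cur_cv_loss, halving, max_lr,
                start_halving_impr=0.01, end_halving_impr=0.001,
                halving_factor=0.5):
    """Decide the learning rate for the next epoch from the CV loss change.

    Returns (next_lr, halving, stop). Thresholds are illustrative assumptions.
    """
    impr = (prev_cv_loss - cur_cv_loss) / prev_cv_loss  # relative CV improvement
    if halving and impr < end_halving_impr:
        return lr, halving, True  # converged: stop training
    if impr < start_halving_impr:
        halving = True  # improvement too small: start (or keep) halving
    if halving:
        lr *= halving_factor
    return min(lr, max_lr), halving, False  # never exceed the maximum learning rate


print(newbob_step(0.008, 1.8, 1.79, False, max_lr=0.008))  # (0.004, True, False)
```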

MaxOut (++)

Convolutive network (+)

Test more configurations

Denoising & Farfield ASR

ICASSP paper submitted.
HOLD

VAD

Spike detection and removal.
Add more silence tags "#" to the transcripts of pure-silence utterances (training set).

Cross-entropy model is being trained.
The baseline still needs to be tested.

Sum the posteriors of all silence pdfs as the silence posterior probability.

Program done; threshold to be tuned.

Adjust the end point of the detected speech.
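Summing silence-pdf posteriors and thresholding them can be sketched as follows. It assumes per-frame pdf posteriors from the DNN are available as lists and that the indices of the silence pdfs are known; the threshold value and the `end_pad` end-point extension are hypothetical parameters, not the program's actual settings.

```python
def silence_posterior(frame_posteriors, sil_pdfs):
    """Per-frame silence posterior: sum of the posteriors of all silence pdfs."""
    return [sum(frame[p] for p in sil_pdfs) for frame in frame_posteriors]


def speech_segments(sil_post, threshold, end_pad=0):
    """Return [start, end) frame ranges where the silence posterior is below `threshold`.

    `end_pad` extends each segment's end point by a few frames, clipped to the
    utterance length and to the start of the next segment (hypothetical).
    """
    segments = []
    start = None
    for t, p in enumerate(sil_post):
        is_speech = p < threshold
        if is_speech and start is None:
            start = t
        elif not is_speech and start is not None:
            segments.append([start, t])
            start = None
    if start is not None:
        segments.append([start, len(sil_post)])
    for i, seg in enumerate(segments):
        limit = segments[i + 1][0] if i + 1 < len(segments) else len(sil_post)
        seg[1] = min(seg[1] + end_pad, limit)
    return [tuple(seg) for seg in segments]


posts = [[0.6, 0.3, 0.1], [0.1, 0.1, 0.8], [0.05, 0.05, 0.9],
         [0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]
sp = silence_posterior(posts, sil_pdfs=[0, 1])
print(speech_segments(sp, threshold=0.5))  # [(1, 3), (4, 5)]
```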

Speech rate training

The ROS (rate of speech) model seems superior to the normal model on faster speech.
Suggest extracting speech data with different ROS to construct a new test set (+).
Tencent training data done
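Building a test set by rate of speech needs an ROS value per utterance and a way to bucket utterances. A minimal sketch, assuming ROS is measured as phones per second (phone counts and durations taken, for example, from forced alignments); the bucket edges and function names are hypothetical.

```python
import bisect


def rate_of_speech(num_phones, duration_sec):
    """ROS as phones per second (an assumed definition)."""
    return num_phones / duration_sec


def bucket_by_ros(utts, edges):
    """Group utterance ids by ROS.

    `utts` holds (utt_id, num_phones, duration_sec) tuples; `edges` are sorted
    ROS boundaries, giving len(edges) + 1 buckets from slowest to fastest.
    """
    buckets = {i: [] for i in range(len(edges) + 1)}
    for utt_id, num_phones, duration in utts:
        ros = rate_of_speech(num_phones, duration)
        buckets[bisect.bisect_right(edges, ros)].append(utt_id)
    return buckets


utts = [("a", 20, 2.0), ("b", 30, 2.0), ("c", 50, 2.5)]  # 10, 15, 20 phones/s
print(bucket_by_ros(utts, edges=[12.0, 18.0]))  # {0: ['a'], 1: ['b'], 2: ['c']}
```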

low resource language AM training

Use the Chinese NN as the initial NN and replace the last layer.

Vary the number of Chinese-trained DNN layers used.
- feature_transform = 6000h_transform + 6000_N*hidden-layers

 nnet.init = random (4-N)*hidden-layers + output-layer
 | N / learn_rate | 0.008         | 0.001 | 0.0001 |
 |   baseline     | 17.00(14*2h)  |       |        |
 |       4        | 17.75(9*0.6h) | 18.64 |        |
 |       3        | 16.85         |       |        |
 |       2        | 16.69         |       |        |
 |       1        | 16.87         |       |        |
 |       0        | 16.88         |       |        |

- feature_transform = uyghur_transform + 6000_N*hidden-layers

 nnet.init = random (4-N)*hidden-layers + output-layer
 Note: this reproduces Yinshi's experiment.
 | N / learn_rate | 0.008 | 0.001 | 0.0001 |
 |   baseline     | 17.00 |       |        |
 |       4        | 28.23 | 30.72 | 37.32  |
 |       3        | 22.40 |       |        |
 |       2        | 19.76 |       |        |
 |       1        | 17.41 |       |        |
 |       0        |       |       |        |

- feature_transform = 6000_transform + 6000_N*hidden-layers

 nnet.init = uyghur (4-N)*hidden-layers + output-layer
 | N / learn_rate | 0.008 | 0.001 | 0.0001 |
 |   baseline     | 17.00 |       |        |
 |       4        | 17.80 | 18.55 | 21.06  |
 |       3        | 16.89 | 17.64 |        |
 |       2        |       |       |        |
 |       1        |       |       |        |
 |       0        |       |       |        |
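The split used in the three configurations above (the first N hidden layers go into feature_transform, then (4-N) hidden layers plus a new output layer form nnet.init) can be sketched as below. Layers are plain dicts of weight rows and biases; the uniform initializer and all names are hypothetical, and the 6000h/uyghur transform that precedes the reused layers is not modelled. For the third configuration, the randomly initialized layers would instead be copied from the Uyghur net.

```python
import random


def random_layer(in_dim, out_dim, rng, scale=0.1):
    """Hypothetical initializer: weights uniform in [-scale, scale], zero bias."""
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return {"weights": weights, "bias": [0.0] * out_dim}


def transfer_init(source_hidden, n, feat_dim, num_targets, total_hidden=4, rng=None):
    """Reuse the first `n` source hidden layers; randomly init the rest plus a new output layer.

    Returns (feature_transform_layers, nnet_init_layers).
    """
    if not 0 <= n <= min(total_hidden, len(source_hidden)):
        raise ValueError("n out of range")
    rng = rng or random.Random(0)  # fixed seed for reproducible initialization
    hidden_dim = len(source_hidden[0]["bias"])
    feature_transform = source_hidden[:n]  # reused (e.g. Chinese-trained) layers
    in_dim = hidden_dim if n > 0 else feat_dim
    nnet_init = []
    for _ in range(total_hidden - n):
        nnet_init.append(random_layer(in_dim, hidden_dim, rng))
        in_dim = hidden_dim
    nnet_init.append(random_layer(in_dim, num_targets, rng))  # new output layer
    return feature_transform, nnet_init
```

Usage: with a 4-hidden-layer source net, `transfer_init(src, 2, feat_dim, num_targets)` returns two reused layers and three new ones (two hidden plus output), matching the N = 2 row.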

Sub-word unit language model is ready; testing in progress.

Scoring

Harmonics program done, experiment to be done.
Initial experiments show that more timbre data are required.

Confidence

Reproduce the experiments on the Fisher dataset.
Use the Fisher DNN model to decode the all-WSJ dataset.
Preparing scoring for the puqiang data.

Speaker ID

Preparing GMM-based server.
EER ~ 11.2% (GMM-based system)
Test different numbers of components; fast i-vector computation.
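The EER figure above is the operating point where the miss rate equals the false-alarm rate. A minimal sketch of computing it from trial scores, assuming higher scores mean "same speaker"; the function name is hypothetical, and the simple threshold sweep is quadratic, which is fine for small trial lists.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping the threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)  # misses
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)  # false alarms
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2)
    return best[1]


targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.1, 0.2, 0.4, 0.75]
print(equal_error_rate(targets, nontargets))  # 0.25
```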

Emotion detection

Sinovoice is implementing the server

Text Processing

LM development

Domain specific LM

domain lm

AM: 1400h (2.0.b). Results: xiaomi 29.43%, baiduzhidao 43.46%, baiduHi 30.02%; test set: 8k sentences (16k => 8k).
Need to check the xiaomin-LM method and results.

New dict.

Weibo data: segmented with the Tencent segmenter and counted; got 16k words to segment again.
New toolkit: find a method to update the new dict. New word lists can be obtained from Sougou and word information from Baidu.

tag LM

Set up a new test set

1k addresses from Dianxin; preparing to test.
Insert the new unknown addresses into the test set.
Record the test set: 15 sentences per person, from the Dianxin text.

RNN LM

rnn

RNNLM=>ALPA
Train RNNLM on Chinese data from jietong-data.

lstm+rnn

WER: 6.2% (4 epochs); need to check the problem.

Word2Vector

W2V based doc classification

Initial results with the variational Bayesian GMM obtained; performance is not as good as the conventional GMM.
Non-linear inter-language transform (English-Spanish-Czech): word-vector model training done; transform model under investigation.

SSA-based local linear mapping still running.
Number of k-means classes changed to 2.

Knowledge vector started

Format the data.
Yuanbin will continue this work with the help of Xingchao.

Character to word conversion

Prepare the task: word similarity.
Prepare the dictionary.
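A word-similarity task is commonly scored as the Spearman rank correlation between the cosine similarity of word vectors and human similarity ratings. A minimal sketch under the assumption that the task follows this convention; the data layout and function names are hypothetical, and pairs containing out-of-vocabulary words are skipped.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(xs):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(xs, ys):
    """Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate_word_similarity(vectors, pairs):
    """`pairs` holds (word1, word2, human_score). Returns (rho, number of pairs used)."""
    model, human = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(score)
    return spearman(model, human), len(model)
```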

Google word vector train

Some ideas will be discussed at the weekly report.

Translation

v3.0 demo released

Still slow.
Re-segment the words using the new dictionary; will use the Tencent dictionary (about 110k words).
Check the new data.

QA

Search method:

Test the Lucene method.
Analyze the test results.
Add IDF to the test.
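Adding IDF means weighting each query term by how rare it is across the document collection. A minimal sketch of plain TF-IDF ranking over already-segmented documents (lists of tokens); this is a textbook variant for comparison, Lucene applies its own more elaborate scoring, and the function names are hypothetical.

```python
import math
from collections import Counter


def idf_table(docs):
    """idf(term) = log(N / df(term)); a term found in every document gets 0."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {term: math.log(len(docs) / count) for term, count in df.items()}


def tfidf_score(query, doc, idf):
    tf = Counter(doc)
    return sum(tf[t] * idf.get(t, 0.0) for t in query)


def rank_documents(query, docs):
    """Document indices sorted by descending TF-IDF score (ties keep original order)."""
    idf = idf_table(docs)
    scores = [(tfidf_score(query, doc, idf), i) for i, doc in enumerate(docs)]
    return [i for _, i in sorted(scores, key=lambda x: (-x[0], x[1]))]


docs = [["weather", "beijing", "today"], ["beijing", "food"], ["weather", "shanghai"]]
print(rank_documents(["beijing", "food"], docs))  # [1, 0, 2]
```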

spell check

Get an n-gram tool and make a simple demo.
Get the domain word list and pinyin tool from Huilan.

The new intern will install SEMPRE.

@@ Line 14: / Line 14: @@
      grid-14    |     yes         |
     ------------------------------------------------------------
-:* buy 760
+:* buy 760-GPU
 ==== Sparse DNN ====
@@ Line 39: / Line 39: @@
       -------------------------------------------------------------------------
        4.5 |     5.39    |    4.80    |   4.75     |  4.36      |    4.55
-:* Frame-accuarcy seems not consistent with WER.
+:** Frame-accuarcy seems not consistent with WER.
-:* Using the train-data as cv, verify the learning ability of the model.
+:** Using the train-data as cv, verify the learning ability of the model.
 :* AURORA4 dataset
     (1) Train: train_clean
      drop-retention/testcase(WER) | test_clean_wv1  | test_airport_wv1 | test_babble_wv1 | test_car_wv1
      ---------------------------------------------------------------------------------------------------------
             std-baseline          |  6.04           |  29.91           |  27.76          |  16.37
      ---------------------------------------------------------------------------------------------------------
-	      dp-0.4             |  6.61           |  29.59           |  30.12          |  19.40
+              dp-0.4             |  6.61           |  29.59           |  30.12          |  19.40
      ---------------------------------------------------------------------------------------------------------
-	      dp-0.5             |  6.40           |  28.07           |  27.88          |  19.88
+              dp-0.5             |  6.40           |  28.07           |  27.88          |  19.88
      ---------------------------------------------------------------------------------------------------------
                dp-0.6             |  6.36           |  26.68           |  24.85          |  18.32
      ---------------------------------------------------------------------------------------------------------
-	      dp-0.7             |  6.13           |  25.53           |  23.90          |  15.69
+              dp-0.7             |  6.13           |  25.53           |  23.90          |  15.69
      ---------------------------------------------------------------------------------------------------------
                dp-0.8             |  5.94           |  24.94           |  23.67          |  15.77
@@ Line 67: / Line 66: @@
             std-baseline          |  9.60           |  11.41           |  11.63          |  8.64
      ---------------------------------------------------------------------------------------------------------
-	      dp-0.3             |  12.91          |  16.55           |  15.37          |  12.60
+              dp-0.3             |  12.91          |  16.55           |  15.37          |  12.60
      ---------------------------------------------------------------------------------------------------------
-	      dp-0.4             |  11.48          |  14.43           |  13.23          |  11.04
+              dp-0.4             |  11.48          |  14.43           |  13.23          |  11.04
      ---------------------------------------------------------------------------------------------------------
-	      dp-0.5             |  10.53          |  13.00           |  12.89          |  10.24
+              dp-0.5             |  10.53          |  13.00           |  12.89          |  10.24
      ---------------------------------------------------------------------------------------------------------
                dp-0.6             |  10.02          |  12.32           |  11.81          |  9.29
      ---------------------------------------------------------------------------------------------------------
-	      dp-0.7             |  9.65           |  12.01           |  12.09          |  8.89
+              dp-0.7             |  9.65           |  12.01           |  12.09          |  8.89
      ---------------------------------------------------------------------------------------------------------
-	      dp-0.8             |  9.79           |  12.01           |  11.77          |  8.91
+              dp-0.8             |  9.79           |  12.01           |  11.77          |  8.91
      ---------------------------------------------------------------------------------------------------------
                dp-1.0             |  9.94           |  11.33           |  12.05          |  8.32
      ---------------------------------------------------------------------------------------------------------
-:* Losing important features, enlarge the hidden-layer dim to 2048.
+:** Losing important features, enlarge the hidden-layer dim to 2048.
-:* Follow the standard dnn training learn-rate to avoid the different learn-rate changing time of various DNN training.
+:** Follow the standard dnn training learn-rate to avoid the different learn-rate changing time of various DNN training.
-:* Test out of known noise test-data.
+:** Test out of known noise test-data.
-:* Continue the droptout on normal trained XEnt NNET , eg wsj(learn-rate:1e-4/1e-5). (++)
+:** Continue the droptout on normal trained XEnt NNET , eg wsj(learn-rate:1e-4/1e-5). (++)
-:* Draft the dropout-DNN weight distribution. (++)
+:** Draft the dropout-DNN weight distribution. (++)
 * Rectification
@@ Line 95: / Line 94: @@
             std-baseline         |  6.04           |  29.91           |  27.76          |  16.37
      ---------------------------------------------------------------------------------------------------------
-	   lr0.001              |  6.28           |  30.01           |  30.26          |  20.81
+           lr0.001              |  6.28           |  30.01           |  30.26          |  20.81
      ---------------------------------------------------------------------------------------------------------
             lr0.003              |  6.44           |  32.01           |  32.24          |  17.82
      ---------------------------------------------------------------------------------------------------------
-	   lr0.005              |  6.47           |  33.49           |  34.75          |  18.15
+           lr0.005              |  6.47           |  33.49           |  34.75          |  18.15
      ---------------------------------------------------------------------------------------------------------
-	   lr0.007              |  6.72           |  35.85           |  39.72          |  18.03
+           lr0.007              |  6.72           |  35.85           |  39.72          |  18.03
      ---------------------------------------------------------------------------------------------------------
           lr-0.001_l1-0.001      |  83.19          |  98.57           |  98.84          |  97.77
      ---------------------------------------------------------------------------------------------------------
-	 lr-0.001_l1-0.0001     |  7.58           |  32.94           |  34.29          |  23.42
+         lr-0.001_l1-0.0001     |  7.58           |  32.94           |  34.29          |  23.42
      ---------------------------------------------------------------------------------------------------------
-	lr-0.001_l1-0.00001     |  6.21           |  29.15           |  28.24          |  19.50
+        lr-0.001_l1-0.00001     |  6.21           |  29.15           |  28.24          |  19.50
      ---------------------------------------------------------------------------------------------------------
-	lr-0.001_l1-0.000001    |  6.30           |  31.91           |  29.23          |  21.52
+        lr-0.001_l1-0.000001    |  6.30           |  31.91           |  29.23          |  21.52
      ---------------------------------------------------------------------------------------------------------
 :* Change the learn-rate in the middle of the training, Modify the train_nnet.sh script(Liu Chao).
@@ Line 130: / Line 129: @@
 :* xEntropy model be training
 :* need to test baseline.
 * Sum all sil-pdf as the silence posterior probability.
 :* Program done, to tune the threshold
+* rearrange the ending point of the detected speech
 ====Speech rate training====
 * Seems ROS model is superior to the normal one with faster speech
 * Suggest to extract speech data of different ROS, construct a new test set(+)
-* Suggest to use Tencent training data(+)
+* Tencent training data done
 ==== low resource language AM training ====
@@ Line 172: / Line 171: @@
    |       0        |       |       |        |
+* sub word unit language model is ready. on testing.
 ====Scoring====
-* global scoring done.
-* Pitch & rhythm done, need testing
 * Harmonics program done, experiment to be done.
+* Initial experiment shows more timber data are required
 ====Confidence====
 * Reproduce the experiments on fisher dataset.
 * Use the fisher DNN model to decode all-wsj dataset
+* preparing scoring for puqiang data
 ===Speaker ID===
 * Preparing GMM-based server.
+* EER ~ 11.2% (GMM-based system)
+* test different number of components; fast i-vector computing
 ===Emotion detection===