Difference between revisions of “2014-10-13”

Latest revision as of 06:31, 20 October 2014 (Monday)

Speech Processing

AM development

Sparse DNN

Performance improvement found when pruned slightly
Experiments show that
Suggestion: use TIMIT / AURORA 4 for training

RNN AM

Initial test on WSJ ran out of memory.
Now using AURORA 4 short sentences with a smaller number of targets.

Noise training

First draft of the noisy training journal paper
Paper correction (Yinshi, Liuchao, Lin Yiye) is ongoing.

Drop out & Rectification & convolutive network

Drop out

Dataset: WSJ, test set: eval92

       std |  dropout0.2 | dropout0.4 | dropout0.6 | dropout0.8
    ------------------------------------------------------------- 
      4.5  |     4.54    |    4.5     |   4.25     |    4.5

Test on noisy AURORA 4 dataset
Continue dropout training on a normally trained XEnt NNET, e.g. WSJ.
Draft the dropout-DNN weight distribution.
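
For reference on the dropout runs above, the following is a minimal sketch of inverted dropout applied to one vector of hidden activations. It assumes the values 0.2 to 0.8 in the table are drop probabilities (the notes do not say whether they are drop or retain rates), and the function name `dropout` is illustrative rather than taken from the training toolkit.

```python
import random

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability drop_prob and
    rescale the survivors so the expected activation is unchanged."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        # At test time the layer is left untouched.
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10000            # toy activations, all equal to 1
dropped = dropout(hidden, 0.2, rng)
kept = sum(1 for x in dropped if x != 0.0)
print(kept / len(hidden))         # fraction of surviving units, close to 0.8
```

Because the surviving units are divided by the keep probability during training, no rescaling is needed at decoding time.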

Rectification

Still getting NaN errors; needs debugging.

MaxOut

Convolutive network

Test more configurations
Yiye will work on CNN

Denoising & Farfield ASR

ICASSP paper submitted.

VAD

Add more silence tags "#" to the pure-silence utterance text (train).

xEntropy model is being trained.

Sum the posteriors of all silence pdfs to obtain the silence posterior probability.
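
The summation in the last VAD item can be sketched as follows. This is an illustration only: it assumes the network output is available as one posterior vector per frame, that the indices of the silence pdfs are known, and that a fixed threshold (0.5 here, a hypothetical value) turns the summed posterior into a speech/silence decision.

```python
def silence_posterior(frame_posteriors, sil_pdf_ids):
    """Return, for every frame, the summed posterior of all silence pdfs."""
    sil = set(sil_pdf_ids)
    return [sum(p for i, p in enumerate(frame) if i in sil)
            for frame in frame_posteriors]

def is_silence(frame_posteriors, sil_pdf_ids, threshold=0.5):
    """Frame-level decision: True where the silence posterior reaches the threshold."""
    return [p >= threshold
            for p in silence_posterior(frame_posteriors, sil_pdf_ids)]

# Toy example: 3 frames, 4 pdfs; pdfs 0 and 1 are assumed to model silence.
posteriors = [
    [0.6, 0.3, 0.05, 0.05],
    [0.1, 0.1, 0.5, 0.3],
    [0.25, 0.25, 0.25, 0.25],
]
sil_probs = silence_posterior(posteriors, [0, 1])
decisions = is_silence(posteriors, [0, 1])
print(sil_probs, decisions)
```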

Speech rate training

The ROS model seems to be superior to the normal one on faster speech
Need to check the distribution of ROS on WSJ
Suggestion: extract speech data with different ROS and construct a new test set
Suggestion: use Tencent training data
Suggestion: remove silence when computing ROS
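
To illustrate the last suggestion, here is a minimal sketch of an ROS computation that ignores silence. It assumes ROS is measured in phones per second and that a phone-level alignment with start and end times is available; the silence labels `sil` and `sp` and the function name are placeholders, not names from the actual system.

```python
SILENCE_LABELS = {"sil", "sp"}  # assumed silence labels; adjust to the phone set in use

def rate_of_speech(alignment, silence_labels=SILENCE_LABELS):
    """Phones per second, counting only non-silence segments.

    alignment: list of (label, start_sec, end_sec) tuples.
    Returns 0.0 if the utterance contains no speech.
    """
    speech = [(start, end) for label, start, end in alignment
              if label not in silence_labels]
    duration = sum(end - start for start, end in speech)
    if duration <= 0.0:
        return 0.0
    return len(speech) / duration

# Toy alignment: 4 phones covering about 0.5 s of speech inside a 2.0 s utterance.
ali = [("sil", 0.0, 0.5), ("HH", 0.5, 0.6), ("AH", 0.6, 0.75),
       ("sil", 0.75, 1.25), ("L", 1.25, 1.35), ("OW", 1.35, 1.5),
       ("sil", 1.5, 2.0)]
ros = rate_of_speech(ali)
print(ros)
```

In this toy case the silence-free ROS is about 8 phones per second, whereas dividing the same 4 phones by the full 2.0 s would give 2, so long pauses would otherwise make fast speech look slow.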

low resource language AM training

Use Chinese NN as initial NN, change the last layer

Vary the number of layers reused from the Chinese-trained DNN.
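
The initialization scheme above can be sketched as below. This is only an illustration under stated assumptions: the network is represented as a plain list of weight matrices (biases omitted), the layers that are not reused are re-initialized randomly, and `random_layer` and `init_from_source` are hypothetical helper names.

```python
import copy
import random

def random_layer(n_in, n_out, rng, scale=0.1):
    """Weight matrix (n_out rows, n_in columns) with small uniform random values."""
    return [[rng.uniform(-scale, scale) for _ in range(n_in)]
            for _ in range(n_out)]

def init_from_source(source_layers, n_reused, n_targets, rng):
    """Build the initial network for the low-resource language.

    source_layers: weight matrices of the source (Chinese) DNN, output layer last.
    n_reused: number of source hidden layers to copy, counted from the input.
    n_targets: number of output targets of the new language.
    """
    if len(source_layers) < 2:
        raise ValueError("need at least one hidden layer and an output layer")
    hidden = source_layers[:-1]           # the source output layer is always dropped
    if not 0 <= n_reused <= len(hidden):
        raise ValueError("n_reused out of range")
    layers = [copy.deepcopy(w) for w in hidden[:n_reused]]
    for w in hidden[n_reused:]:           # remaining hidden layers start from scratch
        layers.append(random_layer(len(w[0]), len(w), rng))
    last_hidden_dim = len(hidden[-1])     # output size of the last hidden layer
    layers.append(random_layer(last_hidden_dim, n_targets, rng))
    return layers

rng = random.Random(0)
# Toy source net: 4 inputs, two hidden layers of 5 units, 3 source targets.
source = [random_layer(4, 5, rng), random_layer(5, 5, rng), random_layer(5, 3, rng)]
target = init_from_source(source, n_reused=1, n_targets=7, rng=rng)
print([(len(w), len(w[0])) for w in target])
```

Sweeping `n_reused` from 0 up to the number of hidden layers corresponds to varying how much of the Chinese network is carried over.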

Scoring

global scoring done.
Pitch & rhythm done, need testing
Harmonics hold

Confidence

Reproduce the experiments on the Fisher dataset.
Use the Fisher DNN model to decode the all-wsj dataset

Speaker ID

Preparing GMM-based server.

Emotion detection

Sinovoice is implementing the server

Text Processing

LM development

Domain specific LM

ngram generation is ongoing
look the memory and baidu_hi done

NUM tag LM:

Maxi's work is released.
Yuanbin continues the tag LM work.
Add NER to the tag LM.
Boost specific words like wifi if TAG model does not work for a particular word.

Word2Vector

W2V based doc classification

Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.
Non-linear inter-language transform: English-Spanish-Czech: word vector model training done, transform model under investigation

SSA-based local linear mapping is still running.
k-means classes changed to 2.

Knowledge vector started

format the data

Character to word conversion

prepare the task: word similarity
prepare the dict.

Google word vector train

improve the sampling method

RNN LM

rnn
lstm+rnn

install the tool and prepare the data of wsj

prepare the baseline.

Translation

v3.0 demo released

still slow
re-segment the word using new dictionary.
check new data.

QA

search method:

Add VSM and BM25 to improve the search and the answer selection strategy.
Segment words at minimum granularity for the Lucene index and the bag-of-words method.

The new intern will install SEMPRE
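
As background for the BM25 item under the search method, the following is a minimal, self-contained sketch of Okapi BM25 scoring over pre-segmented documents. It is not the Lucene implementation; the defaults `k1=1.2` and `b=0.75` are common choices assumed here, and the toy documents are invented for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score every tokenized document against the tokenized query with Okapi BM25."""
    n_docs = len(docs_tokens)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents need more hits for the same score.
            norm = tf[term] + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores

# Toy collection, already segmented into words.
docs = [
    ["speech", "recognition", "with", "dnn"],
    ["language", "model", "training"],
    ["dnn", "dnn", "acoustic", "model"],
]
scores = bm25_scores(["dnn", "model"], docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(best, scores)
```

The candidate answers could then be ranked by these scores; how the scores are combined with the VSM similarity is left open in the notes.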

@@ Line 6: / Line 6: @@
 * Experiments show that
 * Suggest to use TIMIT / AURORA 4 for training
+==== RNN AM====
+* Initial test on WSJ , leads to out-memory.
+* Using AURORA 4 short-sentence with a smaller number of targets.
 ====Noise training====
-*
+* First draft of the noisy training journal paper
-:* First draft of the noisy training journal paper
+* Paper Correction (Yinshi, Liuchao, Lin Yiye), be going.
-:* Check abnormal behavior with large sigma (Yinshi, Liuchao)
 ====Drop out & Rectification & convolutive network====
 * Drop out
+:* dataset:wsj, testset:eval92
+        std |  dropout0.2 | dropout0.4 | dropout0.6 | dropout0.8
+     -------------------------------------------------------------
+      4.5  |     4.54    |    4.5     |   4.25     |    4.5
-:* No performance improvement found yet.
+:* Test on noisy AURORA 4 dataset
-:* [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&step=view_request&cvssid=261]
+:* Continue the droptout on normal trained XEnt NNET , eg wsj.
+:* Draft the dropout-DNN weight distribution.
 * Rectification
-:* Dropout NA problem was caused by large magnitude of weights
+:* Still NAN error, need to debug.
+* MaxOut
 * Convolutive network
-# Test more configurations
+:*Test more configurations
-* Zhiyong will work on CNN
+:* Yiye will work on CNN
-* Recurrent neural network
-:* investigate CURRENNT for AM
 ====Denoising & Farfield ASR====
-*
+* ICASSP paper submitted.
-* Lasso-based de-reverberation is done with the REVERBERATION toolkit
-:* Start to compose the experiment section for the SL paper.
 ====VAD====
-* problems found at the beginning part of speech (0-0.02s?)
+* Add more silence tag "#" in pure-silence utterance text(train).
-* Noise model training done. Under testing.
+:* xEntropy model be training
-* Need to investigate the performance reduction in babble noise. Call Jia.
+* Sum all sil-pdf as the silence posterior probability.
 ====Speech rate training====
 *
-* Some interesting results with the simple speech rate change algorithm was obtained on the WSJ db
-[http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&step=view_request&cvssid=268]
 * Seems ROS model is superior to the normal one with faster speech
 * Need to check distribution of ROS on WSJ
@@ Line 56: / Line 56: @@
 ==== low resource language AM training ====
-* Results on CVSS[http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=274]
 * Use Chinese NN as initial NN, change the last layer
+:* Various the used Chinese trained DNN layer numbers.
 ====Scoring====
 * global scoring done.
 * Pitch & rhythm done, need testing
@@ Line 67: / Line 66: @@
 ====Confidence====
+* Reproduce the experiments on fisher dataset.
+* Use the fisher DNN model to decode all-wsj dataset
-* experiments done, need more data
-*
-* Basic confidence by using lattice-based posterior + DNN posterior + ROS done
-* 23% detection error achieved by balanced model
 ===Speaker ID===
+* Preparing GMM-based server.
-* Add VAD to system
+===Emotion detection===
-* GMM-based test program delivered
-* GMM registration program done
-===Emotion detection===
-* Zhang Weiwei is learning the code
 * Sinovoice is implementing the server
