Difference between revisions of "2014-09-29"

Latest revision as of 05:49, 29 September 2014 (Monday)

Speech Processing

AM development

Sparse DNN

Performance improvement found when pruned slightly
Experiments show that
Suggested using TIMIT / AURORA 4 for training
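
The notes do not say which pruning criterion was used. As a minimal sketch, assuming simple magnitude-based pruning (an assumption, not confirmed by the notes), "pruning slightly" could mean zeroing a small fraction of the lowest-magnitude weights in each layer. The function name and the toy matrix are hypothetical.

```python
def prune_by_magnitude(weights, fraction):
    """Return a copy of `weights` (list of rows) with the given fraction
    of smallest-magnitude entries set to zero.

    Assumption: magnitude-based pruning; the notes do not name the criterion.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    n_prune = int(len(flat) * fraction)
    if n_prune == 0:
        return [list(row) for row in weights]
    threshold = flat[n_prune - 1]  # largest magnitude that still gets pruned
    remaining = n_prune            # cap the count so ties do not over-prune
    pruned = []
    for row in weights:
        new_row = []
        for w in row:
            if remaining > 0 and abs(w) <= threshold:
                new_row.append(0.0)
                remaining -= 1
            else:
                new_row.append(w)
        pruned.append(new_row)
    return pruned


def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total


# Toy 2x3 weight matrix (hypothetical values).
layer = [[0.5, -0.01, 0.3], [0.02, -0.8, 0.1]]
pruned_layer = prune_by_magnitude(layer, 0.5)
```

A small `fraction` corresponds to the slight pruning mentioned above; the original matrix is left unmodified so pruned and unpruned models can be compared.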

Noise training

First draft of the noisy training journal paper
Check abnormal behavior with large sigma (Yinshi, Liuchao)

Drop out & Rectification & convolutive network

Drop out

No performance improvement found yet.
[1]

Rectification

The dropout NA problem was caused by the large magnitude of the weights

Convolutive network

Test more configurations

Zhiyong will work on CNN

Recurrent neural network

Investigate CURRENNT for AM

Denoising & Farfield ASR

Lasso-based de-reverberation is done with the REVERBERATION toolkit

Start to compose the experiment section for the SL paper.

VAD

Problems found at the beginning of speech (0-0.02 s?)
Noise model training done. Under testing.
Need to investigate the performance reduction in babble noise. Call Jia.

Speech rate training

Some interesting results with the simple speech rate change algorithm were obtained on the WSJ database

[2]

The ROS model seems to be superior to the normal one on faster speech
Need to check the distribution of ROS on WSJ
Suggested extracting speech data of different ROS to construct a new test set
Suggested using Tencent training data
Suggested removing silence when computing ROS
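
The last suggestion can be illustrated with a small sketch. It assumes ROS is measured as phones per second over a forced alignment given as `(label, start, end)` tuples, and that silence segments carry the labels `sil` or `sp`. Both the ROS definition and the label names are assumptions; the notes do not specify them.

```python
def rate_of_speech(alignment, silence_labels=("sil", "sp")):
    """Phones per second, counting only non-silence segments.

    `alignment` is a list of (label, start_sec, end_sec) tuples.
    Assumption: ROS = number of phones / total non-silence duration.
    """
    n_phones = 0
    speech_duration = 0.0
    for label, start, end in alignment:
        if label in silence_labels:
            continue  # silence adds neither phones nor duration
        n_phones += 1
        speech_duration += end - start
    if speech_duration <= 0.0:
        return 0.0
    return n_phones / speech_duration


# Hypothetical alignment: 0.5 s of speech surrounded by 1.5 s of silence.
alignment = [
    ("sil", 0.0, 0.5),
    ("h", 0.5, 0.6),
    ("ax", 0.6, 0.75),
    ("l", 0.75, 0.85),
    ("ow", 0.85, 1.0),
    ("sil", 1.0, 2.0),
]
ros_without_silence = rate_of_speech(alignment)
ros_with_silence = rate_of_speech(alignment, silence_labels=())
```

In this toy case the estimate is much lower when silence is kept in the denominator, so long pauses would make a fast utterance look slow.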

Low-resource language AM training

Results on CVSS [3]
Use the Chinese NN as the initial NN and change the last layer
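
As a rough sketch of this initialization, the hidden layers of a source network can be copied and only the output layer replaced with a freshly initialized one sized for the new language's targets. The layer representation (dicts with `W` and `b`), the function name, and the initialization range are all hypothetical; the actual toolkit and network sizes are not given in the notes.

```python
import copy
import random


def init_from_source(source_layers, n_targets, seed=0):
    """Copy all hidden layers of a source network and attach a new output layer.

    Each layer is a dict {"W": rows of weights (one row per output unit),
    "b": list of biases}. This representation is an assumption for illustration.
    """
    rng = random.Random(seed)
    hidden = copy.deepcopy(source_layers[:-1])       # reuse source-language layers
    hidden_dim = len(source_layers[-1]["W"][0])      # input size of the old output layer
    new_output = {
        "W": [[rng.uniform(-0.01, 0.01) for _ in range(hidden_dim)]
              for _ in range(n_targets)],
        "b": [0.0] * n_targets,
    }
    return hidden + [new_output]


# Toy source network: 3 inputs -> 4 hidden units -> 5 source-language targets.
source = [
    {"W": [[0.1] * 3 for _ in range(4)], "b": [0.0] * 4},
    {"W": [[0.2] * 4 for _ in range(5)], "b": [0.0] * 5},
]
# New network with 2 targets for the low-resource language.
target = init_from_source(source, n_targets=2)
```

The copied layers are deep copies, so fine-tuning the new network would not alter the source model.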

Scoring

Global scoring done.
Pitch & rhythm done; testing needed
Harmonics hold

Confidence

Experiments done; more data needed
Basic confidence by using lattice-based posterior + DNN posterior + ROS done
23% detection error achieved by balanced model
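
The notes do not describe how the three features are combined. One simple possibility, shown here purely as an assumption, is a logistic combination of the lattice-based posterior, the DNN posterior, and ROS, with the detection error measured at a fixed threshold. The weights, bias, and sample values below are hypothetical and not taken from the experiments.

```python
import math


def confidence(lattice_post, dnn_post, ros, weights, bias):
    """Logistic combination of three confidence features.

    Assumption: a linear-logistic combiner; the notes do not name the model.
    """
    z = bias + weights[0] * lattice_post + weights[1] * dnn_post + weights[2] * ros
    return 1.0 / (1.0 + math.exp(-z))


def detection_error(scores, labels, threshold=0.5):
    """Fraction of words whose accept/reject decision disagrees with the label."""
    wrong = sum((s >= threshold) != bool(y) for s, y in zip(scores, labels))
    return wrong / len(labels)


# Hypothetical weights and two hypothetical words:
# (lattice posterior, DNN posterior, ROS in phones per second).
weights, bias = (2.0, 2.0, -0.1), -2.0
features = [(0.9, 0.8, 4.0), (0.2, 0.3, 8.0)]
scores = [confidence(l, d, r, weights, bias) for l, d, r in features]
error = detection_error(scores, labels=[1, 0])
```

In practice the weights would be trained on labeled recognition output rather than set by hand.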

Speaker ID

Add VAD to system
GMM-based test program delivered
GMM registration program done

Emotion detection

Zhang Weiwei is learning the code
Sinovoice is implementing the server

Text Processing

LM development

Domain specific LM

N-gram generation is ongoing
look the memory and baidu_hi done

NUM tag LM:

Maxi's work is released.
Yuanbin continues the tag LM work.
Add NER to the tag LM.
Boost specific words like wifi if TAG model does not work for a particular word.

Word2Vector

W2V based doc classification

Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.
Non-linear inter-language transform: English-Spanish-Czech: word vector model training done, transform model under investigation

SSA-based local linear mapping is still running.
k-means classes changed to 2.

Knowledge vector started

Documents obtained from wiki
Formula obtained

Character to word conversion

Read more papers.
Prepare to train.

Google word vector training

Improve the sampling method

RNN LM

Prepare WSJ database
Trained model 10000 x 4 + 320 + 10000
Better performance obtained (4.16-3.47)
Gigaword sampling for Chinese data

Translation

v3.0 demo released

Still slow
Cut the vocabulary that is not important.

QA

liangshan_v1 performance: 74.3%.
New framework and GA method are done
Add the SEMPRE tool to the framework

@@ Line 3: / Line 3: @@
 ==== Sparse DNN ====
-* Investigating layer-based DNN training
+* Performance improvement found when pruned slightly
+* Experiments show that
+* Suggest to use TIMIT / AURORA 4 for training
 ====Noise training====
+*
 :* First draft of the noisy training journal paper
 :* Check abnormal behavior with large sigma (Yinshi, Liuchao)
@@ Line 21: / Line 24: @@
 * Convolutive network
 # Test more configurations
+* Zhiyong will work on CNN
+* Recurrent neural network
+:* investigate CURRENNT for AM
@@ Line 26: / Line 33: @@
 ====Denoising & Farfield ASR====
+*
 * Lasso-based de-reverberation is done with the REVERBERATION toolkit
 :* Start to compose the experiment section for the SL paper.
@@ Line 31: / Line 39: @@
 ====VAD====
+* problems found at the beginning part of speech (0-0.02s?)
 * Noise model training done. Under testing.
 * Need to investigate the performance reduction in babble noise. Call Jia.
@@ Line 36: / Line 45: @@
 ====Speech rate training====
+*
 * Some interesting results with the simple speech rate change algorithm was obtained on the WSJ db
 [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&step=view_request&cvssid=268]
@@ Line 45: / Line 54: @@
 * Suggest to use Tencent training data
 * Suggest to remove silence when compute ROS
+==== low resource language AM training ====
+* Results on CVSS[http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=274]
+* Use Chinese NN as initial NN, change the last layer
 ====Scoring====
-* Pitch & rythmn done.
+* global scoring done.
+* Pitch & rhythm done, need testing
 * Harmonics hold
@@ Line 54: / Line 68: @@
 ====Confidence====
+* experiments done, need more data
+*
 * Basic confidence by using lattice-based posterior + DNN posterior + ROS done
 * 23% detection error achieved by balanced model
@@ Line 59: / Line 75: @@
 ===Speaker ID===
+* Add VAD to system
 * GMM-based test program delivered
-* Implementing GMM registration program
+* GMM registration program done
 ===Emotion detection===
+* Zhang Weiwei is learning the code
 * Sinovoice is implementing the server
