ASR:2015-05-18
From cslt Wiki
Revision as of 08:58, 20 May 2015 (Wed)
Speech Processing
AM development
Environment
- grid-15 often does not work
- grid-14 often does not work
RNN AM
- details at http://liuc.cslt.org/pages/rnnam.html
- Test monophone on RNN using dark knowledge --Chao Liu
- run using WSJ, MPE --Chao Liu
- run bi-directional --Chao Liu
- train RNN with dark knowledge transfer on AURORA4 --Zhiyuan
  - http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=383 --Zhiyuan
Mic-Array
- hold
- Change the prediction from fbank to spectrum features
- investigate the alpha parameter in the time domain and frequency domain
- alpha >= 0, using data generated by the REVERB toolkit
- consider theta
- compute EER with kaldi
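As a reference for the EER item above, here is a minimal NumPy sketch of equal-error-rate computation (independent of Kaldi's own scoring tools, which would normally be used): sweep the decision threshold over all trial scores and report the rate where false accepts and false rejects meet.

```python
import numpy as np

def eer(target_scores, impostor_scores):
    """Equal Error Rate: rate at the threshold where false accepts == false rejects."""
    scores = np.concatenate([target_scores, impostor_scores])
    labels = np.concatenate([np.ones(len(target_scores)), np.zeros(len(impostor_scores))])
    labels = labels[np.argsort(-scores)]                # sort trials by score, descending
    # After accepting the top-k trials, track false-accept and false-reject rates.
    fa = np.cumsum(1 - labels) / len(impostor_scores)   # impostors accepted so far
    fr = 1 - np.cumsum(labels) / len(target_scores)     # targets still rejected
    i = np.argmin(np.abs(fa - fr))                      # closest crossing point
    return (fa[i] + fr[i]) / 2

tgt = np.array([0.9, 0.8, 0.7, 0.4])   # scores for same-speaker trials (toy data)
imp = np.array([0.5, 0.3, 0.2, 0.1])   # scores for impostor trials (toy data)
print(eer(tgt, imp))                   # one swap near the threshold -> 0.25
```

The scores here are toy numbers; in the actual pipeline they would come from the trial scoring step.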
RNN-DAE (RNN-based Deep Auto-Encoder)
- deliver to Mengyuan
  - http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zhangzy&step=view_request&cvssid=261
Speaker ID
- DNN-based SID --Yiye Lin
  - http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zhangzy&step=view_request&cvssid=327
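For reference, the d-vector idea behind DNN-based speaker ID can be sketched as below: push an utterance's frames through the speaker-discriminant DNN, average the last hidden layer's activations into one vector, and compare vectors by cosine distance. The network weights here are random placeholders standing in for the trained DNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder DNN: two tanh hidden layers (40-dim fbank input, 64 units each).
W1, b1 = rng.standard_normal((40, 64)) * 0.1, np.zeros(64)
W2, b2 = rng.standard_normal((64, 64)) * 0.1, np.zeros(64)

def dvector(frames):
    """frames: (num_frames, 40) features -> one 64-dim utterance d-vector."""
    h = np.tanh(frames @ W1 + b1)
    h = np.tanh(h @ W2 + b2)     # last hidden layer activations
    return h.mean(axis=0)        # average over all frames of the utterance

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

enroll = dvector(rng.standard_normal((200, 40)))  # enrollment utterance (toy)
test = dvector(rng.standard_normal((150, 40)))    # test utterance (toy)
score = cosine(enroll, test)     # higher score = more likely the same speaker
```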
Ivector&Dvector based ASR
- hold --Tian Lan
- Cluster the speakers into speaker classes, then use the distance or the posterior probability as the metric
- Directly use the dark-knowledge strategy for i-vector training
  - http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=340
- The smaller the i-vector dimension, the better the performance
- Augmenting the hidden layer works better than augmenting the input layer
- train on WSJ (test on dev93+eval92)
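The speaker-class item above can be sketched roughly as follows: cluster the i-vectors into classes, then describe an utterance by its posterior over those classes. The k-means clustering and the softmax-over-distances posterior are illustrative assumptions, not the exact recipe under test.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy i-vectors for 20 speakers (dim 10); real ones come from an i-vector extractor.
ivecs = rng.standard_normal((20, 10))

def kmeans(X, k, iters=20):
    """Plain k-means: assign points to nearest center, then recompute centers."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)  # (points, centers)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    return centers

centers = kmeans(ivecs, k=4)   # 4 speaker classes

def class_posterior(ivec, centers, tau=1.0):
    """Soft assignment: softmax over negative distances to the class centers."""
    d = np.linalg.norm(centers - ivec, axis=1)
    e = np.exp(-d / tau)
    return e / e.sum()

post = class_posterior(ivecs[0], centers)   # soft target over speaker classes
```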
Dark knowledge
- Ensemble using a 100h dataset to construct different structures --Mengyuan
  - http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=264 --Zhiyong Zhang
- adaptation between English and Chinglish
  - Try to improve the Chinglish performance substantially
- unsupervised training with WSJ contributes to the AURORA4 model --Xiangyu Zeng
- test a large database with AMIDA
- test hidden-layer knowledge transfer --Xuewei
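The dark-knowledge items above share one core ingredient: train the student model on the teacher's temperature-softened posteriors instead of (or alongside) the hard labels. A minimal sketch of that soft-target loss (the temperature T=2 is an arbitrary choice here):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between the teacher's softened posteriors (the 'dark
    knowledge') and the student's, both computed at temperature T."""
    p_teacher = softmax(teacher_logits, T)
    log_q = np.log(softmax(student_logits, T) + 1e-12)
    return float(-(p_teacher * log_q).sum(axis=-1).mean())
```

Cross-entropy is minimized when the student matches the teacher exactly, so a student that copies the teacher's logits scores lower than any mismatched one.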
bilingual recognition
- hold
- http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=359 --Zhiyuan Tang and Mengyuan
language vector
- train DNN with language vector --Xuewei
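A minimal sketch of the language-vector input: append a per-language code to every acoustic frame so one DNN can be trained on both languages and told which one it is seeing. The one-hot encoding is an assumption for illustration; the actual vector could be learned.

```python
import numpy as np

LANGS = {"english": 0, "chinese": 1}   # assumed language inventory

def add_language_vector(frames, lang):
    """Append a one-hot language code to every feature frame.

    frames: (num_frames, feat_dim) acoustic features
    returns: (num_frames, feat_dim + num_languages)
    """
    onehot = np.zeros(len(LANGS))
    onehot[LANGS[lang]] = 1.0
    return np.hstack([frames, np.tile(onehot, (len(frames), 1))])

x = add_language_vector(np.random.randn(100, 40), "chinese")
# x.shape == (100, 42): 40 acoustic dims + 2 language dims
```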
Text Processing
RNN LM
- character-level RNN LM (hold)
- LSTM+RNN
- check the LSTM-RNNLM code for how it initializes and updates the learning rate (hold)
W2V based document classification
- write a technical report on document classification using CNN --Yiqiao
- adapt the CNN to address the low-resource problem
Translation
- similar-pair method on English words using the translation model
- result: WER improved from 70% to 50% on top-1
- change the AM model
Order representation
- modify the objective function
- sub-sampling method to handle low-frequency words
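One plausible reading of the sub-sampling item is word2vec-style subsampling, where very frequent words are randomly discarded during training so that more of the updates go to low-frequency words; a sketch under that assumption:

```python
import math
from collections import Counter

def keep_prob(word_count, total, t=1e-4):
    """word2vec-style subsampling: probability of keeping a word occurrence.

    Frequent words (f >> t) are mostly dropped; rare words are always kept.
    """
    f = word_count / total
    return min(1.0, math.sqrt(t / f) + t / f) if f > 0 else 0.0

# Toy corpus: one very frequent word, one mid-frequency word, one rare word.
corpus = ["the"] * 9000 + ["cat"] * 999 + ["prestidigitation"] * 1
counts = Counter(corpus)
probs = {w: keep_prob(c, len(corpus)) for w, c in counts.items()}
# "the" is mostly discarded; "prestidigitation" is always kept.
```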
binary vector
Stochastic ListNet
- using sampling method and test
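The sampling idea can be sketched as follows: the standard ListNet top-one loss compares the permutation-probability distributions induced by the true relevance and the model scores, and the "stochastic" variant here is assumed (for illustration) to evaluate that loss on a random subset of the list so long lists stay tractable.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def listnet_top1_loss(scores, relevance):
    """ListNet: cross-entropy between the top-one probability distributions
    induced by the true relevance labels and by the model scores."""
    p_true = softmax(relevance)
    p_model = softmax(scores)
    return float(-(p_true * np.log(p_model + 1e-12)).sum())

def stochastic_listnet_loss(scores, relevance, sample_size=8):
    """Assumed stochastic variant: score a random subset of the list."""
    idx = rng.choice(len(scores), size=min(sample_size, len(scores)), replace=False)
    return listnet_top1_loss(scores[idx], relevance[idx])
```

As with any cross-entropy loss, a model whose scores reproduce the relevance ordering exactly attains the minimum.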
relation classifier
- tested the bidirectional neural network (B-RNN) and got a small improvement
plan to do
- combine LDA with a neural network