Difference between revisions of "ASR:2015-04-20"

Latest revision as of 08:49, 22 April 2015 (Wed)

Speech Processing

AM development

Environment

grid-11 often shuts down automatically, and its computation speed is too slow.
New grid-13 added, using gpu970
To do: update the environment information on the wiki

RNN AM

details at http://liuc.cslt.org/pages/rnnam.html
Test monophone on RNN using dark-knowledge
Run using WSJ, MPE

Mic-Array

Change the prediction from fbank to spectrum features
Investigate the alpha parameter in the time domain and the frequency domain
ALPHA>=0, using data generated by reverber toolkit
consider theta

RNN-DAE(Deep based Auto-Encode-RNN)

HOLD --Zhiyong Zhang
http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zhangzy&step=view_request&cvssid=261

Speaker ID

DNN-based sid --Yiye Lin
http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zhangzy&step=view_request&cvssid=327

Ivector&Dvector based ASR

Cluster the speakers into speaker classes, then use the distance or the posterior probability as the metric
Directly use the dark-knowledge strategy for the i-vector training.
http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=340
A smaller i-vector dimension gives better performance
Augmenting the hidden layer works better than augmenting the input layer
Train on WSJ (test base: dev93 + eval92)
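The speaker-class idea above can be sketched as follows. This is only an illustration, not the group's implementation: it assumes the class centroids have already been obtained by some clustering step, uses cosine distance, and turns distances into a posterior with a softmax over negative scaled distances. The function names and the `scale` parameter are hypothetical.

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two vectors (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def class_posteriors(ivector, centroids, scale=1.0):
    """Return (distances, posteriors) of one i-vector against class centroids.

    The posterior is an assumed form: softmax over negative scaled distances.
    """
    dists = [cosine_distance(ivector, c) for c in centroids]
    logits = [-scale * d for d in dists]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return dists, [e / total for e in exps]

# Toy example: two speaker classes in a 2-dimensional space.
centroids = [[1.0, 0.0], [0.0, 1.0]]
dists, post = class_posteriors([0.9, 0.1], centroids, scale=5.0)
```

Either `dists` or `post` could then be appended to the acoustic features as the speaker-dependent input.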

Dark knowledge

Ensemble

http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=264 --Zhiyong Zhang

adaptation for chinglish under investigation --Mengyuan Zhao

Try to substantially improve the Chinglish performance

unsupervised training with wsj contributes to aurora4 model --Xiangyu Zeng

test large database with AMIDA
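For readers unfamiliar with the term, dark knowledge usually refers to training a student model on the temperature-softened outputs of a teacher model. The sketch below shows that soft-target cross-entropy in its common form; it is an assumption that the experiments listed here use this form, and the function names and the temperature value are illustrative only.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    soft_targets = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(soft_targets, student_probs))

# Toy example: a student close to the teacher versus one far from it.
teacher = [4.0, 1.0, 0.5]
loss_close = distillation_loss([3.5, 1.2, 0.4], teacher)
loss_far = distillation_loss([0.5, 1.0, 4.0], teacher)
```

A student whose logits resemble the teacher's gets the lower loss, which is the signal used during training.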

bilingual recognition

http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=359 --Zhiyuan Tang

Text Processing

tag LM

similar word extension in FST

Will check the formula using Bayes' rule and run experiments
add similarity weight

RNN LM

rnn

Test the perplexity (PPL) and implement the character LM

lstm+rnn

Check how the lstm-rnnlm code initializes and updates the learning rate. (hold)

W2V based document classification

result about norm model [1]
try CNN model

Translation

v5.0 demo released

Trim the dictionary and use the new segmentation tool

Sparse NN in NLP

Tested the dropout model; performance improves slightly. Results still needed:
Test the order feature. Results still needed:
Large-dimension result: http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=lr&step=view_request&cvssid=344

Sparse NN with 1000 dimensions (le-6, 0.705236) is better than with 200 dimensions (le-12, 0.694678).

online learning

Modified the ListNet SGD
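As background, one gradient step of ListNet with the top-one probability loss can be sketched as below. This shows the textbook form for a linear scoring function and does not reflect whatever modification was made here; the function name, learning rate, and toy data are hypothetical.

```python
import math

def softmax(scores):
    """Top-one probabilities of a list of scores."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def listnet_sgd_step(weights, features, relevance, lr=0.1):
    """One SGD step on a single query for a linear ListNet model.

    Loss: cross-entropy between top-one distributions of the relevance
    labels and the model scores. Its gradient with respect to the scores
    is (p_model - p_target).
    """
    scores = [sum(w * x for w, x in zip(weights, f)) for f in features]
    p_model = softmax(scores)
    p_target = softmax(relevance)
    loss = -sum(t * math.log(p) for t, p in zip(p_target, p_model))
    grad = [
        sum((p_model[i] - p_target[i]) * features[i][j]
            for i in range(len(features)))
        for j in range(len(weights))
    ]
    new_weights = [w - lr * g for w, g in zip(weights, grad)]
    return new_weights, loss

# Toy query with three documents and two features each.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
relevance = [2.0, 0.0, 1.0]
weights = [0.0, 0.0]
losses = []
for _ in range(50):
    weights, loss = listnet_sgd_step(weights, features, relevance)
    losses.append(loss)
```

The loss recorded at each step is computed before the update, so `losses` traces how the ranking loss falls over the 50 steps.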

relation classifier

Check the CNN code and contact the author of the paper

@@ Line 4: / Line 4: @@
 ==== Environment ====
 * grid-11 often shut down automatically, too slow computation speed.
-* add a server(760)
+* New grid-13 added, using gpu970
+* To update the wiki enviroment infomation
 ==== RNN AM====
 * details at http://liuc.cslt.org/pages/rnnam.html
-* tuning parameters on monophone NN
+* Test monophone on RNN using dark-knowledge
 * run using wsj,MPE
 ==== Mic-Array ====
+* Change the prediction from  fbank to spectrum features
 * investigate alpha parameter in time domian and frquency domain
 * ALPHA>=0, using data generated by reverber toolkit
 * consider theta
-====Convolutive network====
-* HOLD
-* CNN + DNN feature fusion
 ====RNN-DAE(Deep based Auto-Encode-RNN)====
-* HOLD -Zhiyong
+* HOLD --Zhiyong Zhang
 * http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zhangzy&step=view_request&cvssid=261
 ===Speaker ID===
-:* DNN-based sid --Yiye
+:* DNN-based sid --Yiye Lin
 :* http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zhangzy&step=view_request&cvssid=327
-===Ivector based ASR===
+===Ivector&Dvector based ASR===
-*hold
+:* Cluster the speakers to speaker-classes, then using the distance or the posterior-probability as the metric
+:* Direct using the dark-knowledge strategy to do the ivector training.
 :* http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=340
 :* Ivector dimention is smaller, performance is better
@@ Line 39: / Line 35: @@
 ===Dark knowledge===
-:*http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=264 --zhiyong
+:* Ensemble
-:* trial on logit matching faild --mengyuan
+::*http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=264 --Zhiyong Zhang
-:* adaptation for chinglish under investigation-mengyuan
+:* adaptation for chinglish under investigation  --Mengyuan Zhao
-:* unsupervised training with wsj contributes to aurora4 model--xiangyu
+::* Try to improve the chinglish performance extremly
-:* test large database with amida--xiangyu
+:* unsupervised training with wsj contributes to aurora4 model --Xiangyu Zeng
+::* test large database with AMIDA
 ===bilingual recognition===
-:* http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=359--zhiyuan
+:* http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=359 --Zhiyuan Tang
 ==Text Processing==
@@ Line 72: / Line 69: @@
 * test the order feature ,need some result:
 * large dimension result:http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=lr&step=view_request&cvssid=344
-:* sparse-nn on 1000 dimension(le-6,0.71) is better than 200 dimension(le-12,0.69).
+:* sparse-nn on 1000 dimension(le-6,0.705236) is better than 200 dimension(le-12,0.694678).
 ===online learning===
