01:43, Friday, 5 September 2014, Cslt

2014-09-05T01:43:19Z

Cslt: Created page with "==Resoruce Building== == Leftover questions== * Investigating LOUDS FST. * CLG embedded decoder plus online compiler. * DNN-GMM co-training * NN LM == AM develop..."

2014-09-05T01:34:13Z


==Resource Building==

== Leftover questions==

* Investigating LOUDS FST.
* CLG embedded decoder plus online compiler.
* DNN-GMM co-training
* NN LM

== AM development ==

=== Sparse DNN ===
* Investigating layer-based DNN training

===Noise training===
* Noisy training journal paper almost done.

===Drop out & Rectification & Convolutive network===

* Drop out

:* No performance improvement found yet.
:* [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&step=view_request&cvssid=261]

* Rectification
:* The dropout NA problem was caused by the large magnitude of the weights.

* Convolutive network
:* Test more configurations

===Denoising & Farfield ASR===

* Lasso-based dereverberation obtained reasonable results
:* Optimize the training parameters on the development set.
:* Found a similar alpha for both near-field and far-field recordings; this needs more investigation.

===VAD===

* Noise model training is stuck in a local minimum.
* Some discrepancy between CSLT results & Puqiang results
[http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&step=view_request&cvssid=207]
:* check if the label is really problematic
:* check if short-time spike noise is the major problem (can be solved by spike filtering)
:* check if low-energy babble noise caused mismatch (can be solved by global energy detection)
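A minimal sketch of the spike-filtering check listed above, assuming the VAD produces one 0/1 speech decision per frame. It uses a plain median filter as a stand-in; the function name and the window width are illustrative assumptions, not the filter actually used at CSLT or Puqiang.

```python
def median_smooth(decisions, width=5):
    """Median-filter a sequence of 0/1 frame decisions.

    `width` is an assumed odd window size in frames. Isolated spikes
    shorter than about half the window are removed, while longer runs
    of speech or non-speech are kept.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    n = len(decisions)
    smoothed = []
    for i in range(n):
        # Truncate the window at the sequence boundaries.
        window = sorted(decisions[max(0, i - half):min(n, i + half + 1)])
        smoothed.append(window[len(window) // 2])
    return smoothed


# Hypothetical frame decisions with a single-frame spike at index 3.
frames = [0, 0, 0, 1, 0, 0, 0]
cleaned = median_smooth(frames, width=5)
```

Comparing VAD error rates before and after such a filter on the disputed recordings would indicate whether short spikes explain the discrepancy.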

===Speech rate training===

* Some interesting results with the simple speech rate change algorithm were obtained on the WSJ database
[http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&step=view_request&cvssid=268]

* The ROS (rate of speech) model seems superior to the normal one on faster speech
* Need to check the distribution of ROS on WSJ
* Suggested extracting speech data with different ROS values to construct a new test set
* Suggested using the Tencent training data
* Suggested removing silence when computing ROS
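A minimal sketch of why silence removal matters when computing ROS, assuming a forced alignment is available as (label, start, end) segments in seconds. The silence labels "sil" and "sp" and the function name are assumptions for illustration; the real label set depends on the aligner.

```python
# Assumed silence symbols; adjust to the labels the aligner actually emits.
SILENCE_LABELS = frozenset({"sil", "sp"})


def rate_of_speech(segments, silence_labels=SILENCE_LABELS, remove_silence=True):
    """Return speech units per second for (label, start_sec, end_sec) segments.

    Silence segments are never counted as units. With remove_silence=True
    their duration is also excluded from the denominator.
    """
    units = [seg for seg in segments if seg[0] not in silence_labels]
    counted = units if remove_silence else segments
    duration = sum(end - start for _, start, end in counted)
    if duration <= 0:
        return 0.0
    return len(units) / duration


# Hypothetical alignment: two phones padded by one second of silence each side.
alignment = [("sil", 0.0, 1.0), ("a", 1.0, 1.5), ("b", 1.5, 2.0), ("sil", 2.0, 3.0)]
ros_without_silence = rate_of_speech(alignment)                     # 2 units / 1.0 s
ros_with_silence = rate_of_speech(alignment, remove_silence=False)  # 2 units / 3.0 s
```

In this toy case the same utterance looks three times slower when the padding silence is left in, which would blur the ROS distribution across utterances.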

===Scoring===

* On hold

===Confidence===

* Implement a tool for data labeling, correcting some errors.
* Finished extraction of two features: DNN posterior + lattice posterior

==LM development==

===Domain specific LM===

* G determinization problem solved.
* NUM tag LM:
:* Seems OK with the tag LM.
[http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&step=view_request&cvssid=272]

==Word2Vector==

===W2V based doc classification===

* Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.
* Interest group set up; reading scheduled every Thursday
* Non-linear inter-language transform, English-Spanish-Czech: word-vector model training done; transform model under investigation
:* Investigate more iterations to obtain a better model
:* Checking the discrepancy between the MATLAB nnet tool & sklearn.

==RNN LM==

* Prepared the WSJ database
* Trained model 10000 x 4 + 320 + 10000
* Started testing n-best rescoring
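A minimal sketch of n-best rescoring, assuming each hypothesis carries an acoustic log score, an n-gram LM log-probability and an RNN LM log-probability. The field names, the log-domain interpolation and the default weights are illustrative assumptions, not the setup used in these experiments.

```python
def rescore_nbest(hypotheses, rnn_weight=0.5, lm_scale=10.0):
    """Re-rank an n-best list with an interpolated n-gram + RNN LM score.

    Each hypothesis is a dict with the (hypothetical) keys
    'am_logprob', 'ngram_logprob' and 'rnn_logprob'.
    Returns the hypotheses sorted from best to worst.
    """
    def total_score(hyp):
        # Interpolate the two LM scores in the log domain.
        lm = (1.0 - rnn_weight) * hyp["ngram_logprob"] + rnn_weight * hyp["rnn_logprob"]
        return hyp["am_logprob"] + lm_scale * lm

    return sorted(hypotheses, key=total_score, reverse=True)


# Hypothetical 2-best list: the RNN LM prefers hypothesis "B".
nbest = [
    {"text": "A", "am_logprob": -100.0, "ngram_logprob": -10.0, "rnn_logprob": -14.0},
    {"text": "B", "am_logprob": -101.0, "ngram_logprob": -11.0, "rnn_logprob": -9.0},
]
best_with_rnn = rescore_nbest(nbest)[0]["text"]
best_ngram_only = rescore_nbest(nbest, rnn_weight=0.0)[0]["text"]
```

Setting rnn_weight to 0 reproduces the n-gram-only ranking, which gives a baseline for measuring what the RNN LM contributes.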

==Speaker ID==

* Second model done

==Emotion detection==

* delivered to Sinovoice

==Translation==

* v2.0 demo ready

Changes in the revision of 01:43, Friday, 5 September 2014, relative to the previous revision (line 106), added after "* v2.0 demo ready":

==QA==

* Labeled 1000 utterances as the evaluation set
* 35% accuracy on 11 classes
* EA not done yet
