<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://cslt.org/mediawiki/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=2014-08-22</id>
		<title>2014-08-22 - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=2014-08-22"/>
		<link rel="alternate" type="text/html" href="http://cslt.org/mediawiki/index.php?title=2014-08-22&amp;action=history"/>
		<updated>2026-04-15T10:02:36Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.23.3</generator>

	<entry>
		<id>http://cslt.org/mediawiki/index.php?title=2014-08-22&amp;diff=10721&amp;oldid=prev</id>
		<title>Cslt: Created page with "==Resource Building==  == Leftover questions==  * Investigating LOUDS FST.  * CLG embedded decoder plus online compiler. * DNN-GMM co-training * NN LM  == AM developmen..."</title>
		<link rel="alternate" type="text/html" href="http://cslt.org/mediawiki/index.php?title=2014-08-22&amp;diff=10721&amp;oldid=prev"/>
				<updated>2014-08-22T02:13:24Z</updated>
		
		<summary type="html">&lt;p&gt;Created page with "==Resource Building==  == Leftover questions==  * Investigating LOUDS FST.  * CLG embedded decoder plus online compiler. * DNN-GMM co-training * NN LM  == AM developmen..."&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;==Resource Building==&lt;br /&gt;
&lt;br /&gt;
== Leftover questions==&lt;br /&gt;
&lt;br /&gt;
* Investigating LOUDS FST. &lt;br /&gt;
* CLG embedded decoder plus online compiler.&lt;br /&gt;
* DNN-GMM co-training&lt;br /&gt;
* NN LM&lt;br /&gt;
&lt;br /&gt;
== AM development ==&lt;br /&gt;
&lt;br /&gt;
=== Sparse DNN ===&lt;br /&gt;
* WSJ sparse DNN obtains no further improvement&lt;br /&gt;
&lt;br /&gt;
===Noise training===&lt;br /&gt;
&lt;br /&gt;
:* Noisy training journal paper almost done.&lt;br /&gt;
&lt;br /&gt;
===Dropout &amp;amp; Rectification &amp;amp; Convolutive network===&lt;br /&gt;
&lt;br /&gt;
* After changing the learning rate to 0.001, the training process can be started: &lt;br /&gt;
*# check the dropout probability &lt;br /&gt;
*# check the learning rate&lt;br /&gt;
*# continue training&lt;br /&gt;
&lt;br /&gt;
* Rectification&lt;br /&gt;
# Rectification by itself failed when the weights grew large.&lt;br /&gt;
# Adding an L1 penalty makes training possible, but performance is very poor.&lt;br /&gt;
# Try setting a maximum value on the rectifier output&lt;br /&gt;
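A minimal sketch of the capped rectifier mentioned above; the cap value of 6.0 is an assumed hyperparameter, not one taken from these notes:&lt;br /&gt;

```python
import numpy as np

def clipped_relu(x, cap=6.0):
    """Rectifier with an upper bound on the activation, so that large
    weights cannot push outputs unbounded. The cap of 6.0 is an
    assumed value for illustration."""
    return np.minimum(np.maximum(x, 0.0), cap)

# Negative inputs are zeroed, values above the cap are clipped.
vals = clipped_relu(np.array([-2.0, 1.5, 9.0]))
```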
&lt;br /&gt;
* Convolutive network&lt;br /&gt;
# Test more configurations &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Denoising &amp;amp; Farfield ASR===&lt;br /&gt;
&lt;br /&gt;
* Lasso-based dereverberation obtained reasonable results:&lt;br /&gt;
:# spectrum-based lasso outperforms fbank-based lasso.&lt;br /&gt;
:# temporal-frequency lasso outperforms temporal-only lasso.&lt;br /&gt;
:# using 200 frames to estimate utterance-level lasso coefficients is feasible, with only marginal performance degradation.&lt;br /&gt;
:# lasso can solve the problem of dynamic reverberation.&lt;br /&gt;
:# Static reverberation still needs to be investigated. &lt;br /&gt;
:# The 1/3 paper has been checked into CVS. &lt;br /&gt;
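As a rough, numpy-only illustration of the lasso fits used above, here is a tiny ISTA solver for the standard lasso objective; the matrix sizes, penalty, and iteration count are assumptions for the sketch, not the actual dereverberation setup:&lt;br /&gt;

```python
import numpy as np

def soft_threshold(v, lam):
    # Proximal operator of the L1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def lasso_ista(A, b, lam=0.05, n_iter=500):
    """Minimal ISTA solver for 0.5*||Ax - b||^2 + lam*||x||_1."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)  # safe step size: 1/L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T.dot(A.dot(x) - b)
        x = soft_threshold(x - step * grad, lam * step)
    return x

# Recover a sparse coefficient vector from noiseless observations.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
x_true = np.zeros(20)
x_true[3] = 2.0
x_hat = lasso_ista(A, A.dot(x_true))
```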
&lt;br /&gt;
===VAD===&lt;br /&gt;
&lt;br /&gt;
* Found some problems in Puqiang's speech data: some files are labelled incorrectly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Speech rate training===&lt;br /&gt;
&lt;br /&gt;
* Append an additional dimension to the feature vector, indicating the rate of speech (ROS) &lt;br /&gt;
* The ROS is computed as words per second&lt;br /&gt;
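The two steps above can be sketched as follows; the function name and array layout are illustrative assumptions:&lt;br /&gt;

```python
import numpy as np

def append_ros(features, num_words, duration_sec):
    """Append a rate-of-speech (ROS) dimension to every frame of an
    utterance. ROS is words per second, as in the notes; the interface
    here is an assumption for illustration."""
    ros = num_words / float(duration_sec)
    col = np.full((features.shape[0], 1), ros)
    return np.hstack([features, col])

# 100 frames of 40-dim features; 12 words spoken in 4 seconds.
feats = np.zeros((100, 40))
out = append_ros(feats, 12, 4.0)
```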
&lt;br /&gt;
===Scoring===&lt;br /&gt;
&lt;br /&gt;
* Refined the acoustic model with the AMIDA database; the problem was solved by training on both WSJ and AMIDA data.&lt;br /&gt;
&lt;br /&gt;
===Confidence===&lt;br /&gt;
&lt;br /&gt;
* Knowledge prepared&lt;br /&gt;
* First experiment done: combining lattice-based confidence with DNN confidence. &lt;br /&gt;
* A further step will add ROS.&lt;br /&gt;
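One simple way to combine the two confidence scores is linear interpolation; this sketch and its weight are assumptions for illustration, not the experiment's actual method:&lt;br /&gt;

```python
def combine_confidence(lattice_conf, dnn_conf, weight=0.5):
    """Linear interpolation of a lattice-based confidence score and a
    DNN-based confidence score, both in [0, 1]. The interpolation
    weight is a hypothetical hyperparameter."""
    return weight * lattice_conf + (1.0 - weight) * dnn_conf

# Weight the lattice-based score more heavily.
score = combine_confidence(0.8, 0.6, weight=0.7)
```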
&lt;br /&gt;
&lt;br /&gt;
===Embedded decoder===&lt;br /&gt;
&lt;br /&gt;
* Chatting LM released (80k)&lt;br /&gt;
* Training two smaller networks, 500x4+600 and 400x4+500: ongoing&lt;br /&gt;
* Build a new graph with the MPE3 AM and the chatting LM.&lt;br /&gt;
&lt;br /&gt;
==LM development==&lt;br /&gt;
&lt;br /&gt;
===Domain specific LM===&lt;br /&gt;
&lt;br /&gt;
====G determinization problem solved====&lt;br /&gt;
&lt;br /&gt;
====NUM tag LM====&lt;br /&gt;
&lt;br /&gt;
27h JS test:  20.16 vs 20.19&lt;br /&gt;
2h  JS test:  17.48 vs 17.49&lt;br /&gt;
&lt;br /&gt;
* Analysis of the tag LM's properties: (1) a random NUM should obtain better performance; (2) other words are not seriously impacted.&lt;br /&gt;
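The NUM tag substitution behind this LM can be sketched as follows; the regular expression and token format are illustrative assumptions:&lt;br /&gt;

```python
import re

NUM_RE = re.compile(r"\d+(\.\d+)?")

def tag_numbers(tokens):
    """Replace numeric tokens with a shared NUM class token before LM
    training, the idea behind the tag LM (illustrative sketch)."""
    return ["NUM" if NUM_RE.fullmatch(t) else t for t in tokens]

# Integers and decimals collapse onto the single NUM class.
tagged = tag_numbers(["call", "110", "for", "3.5", "hours"])
```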
&lt;br /&gt;
&lt;br /&gt;
==Word2Vector==&lt;br /&gt;
&lt;br /&gt;
===W2V based doc classification===&lt;br /&gt;
&lt;br /&gt;
* Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.&lt;br /&gt;
* Interest group set up; readings scheduled every Thursday&lt;br /&gt;
* Non-linear inter-language transform, English-Spanish-Czech: wv model training done, transform model under investigation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==RNN LM==&lt;br /&gt;
&lt;br /&gt;
* New toolkit from Thomas obtained&lt;br /&gt;
* Need more investigation on the toolkit&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Speaker ID==&lt;br /&gt;
&lt;br /&gt;
* Second model done&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Translation==&lt;br /&gt;
* Training failed due to running out of memory &lt;br /&gt;
* Re-trained the model with a limit on the number of iterations; it has reached the 8th iteration&lt;/div&gt;</summary>
		<author><name>Cslt</name></author>	</entry>

	</feed>