2014-09-05

From cslt Wiki

Revision as of 01:43, 5 September 2014 (Fri) by Cslt (talk | contribs)

Resource Building

Leftover questions

Investigating LOUDS FST.
CLG embedded decoder plus online compiler.
DNN-GMM co-training
NN LM

AM development

Sparse DNN

Investigating layer-based DNN training

Noise training

Noisy training journal paper almost done.

Drop out & Rectification & convolutive network

Drop out

No performance improvement found yet.
[1]

Rectification

Dropout NA problem was caused by large magnitude of weights

Convolutive network

Test more configurations

Denoising & Farfield ASR

Lasso-based dereverberation obtained reasonable results

Optimize the training parameters on the development set
Found similar alpha for both near and far recordings. Need more investigation.

VAD

Noise model training got stuck in a local minimum.
Some discrepancy between CSLT results & Puqiang results

check if the label is really problematic
check if short-time spike noise is the major problem (can be solved by spike filtering)
check if low-energy babble noise caused mismatch (can be solved by global energy detection)
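
The two remedies named in the checks above can be sketched as follows. This is only an illustrative sketch, not the CSLT or Puqiang implementation: the function names, the median-filter choice for spike filtering, the window width, and the 30 dB margin are all assumptions.

```python
def median_smooth(labels, width=5):
    """Median-filter a 0/1 frame-level VAD label sequence.

    Isolated speech frames shorter than about half the window
    (short-time spikes) are flipped back to non-speech.
    At the sequence edges the window is truncated; for an
    even-sized window the upper median is taken.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        window = sorted(labels[lo:hi])
        smoothed.append(window[len(window) // 2])
    return smoothed


def energy_gate(energies_db, margin_db=30.0):
    """Global energy detection (hypothetical rule).

    A frame is kept as a speech candidate only if its energy is
    within margin_db of the loudest frame in the recording, so
    low-energy babble is rejected regardless of its spectrum.
    """
    if not energies_db:
        return []
    threshold = max(energies_db) - margin_db
    return [1 if e >= threshold else 0 for e in energies_db]
```

Running both on the frames where the two sets of results disagree would show which of the suspected causes accounts for most of the discrepancy.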

Speech rate training

Some interesting results were obtained on the WSJ db with the simple speech rate change algorithm

The ROS model seems superior to the normal one on faster speech
Need to check the distribution of ROS on WSJ
Suggestion: extract speech data of different ROS to construct a new test set
Suggestion: use the Tencent training data
Suggestion: remove silence when computing ROS
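
A minimal sketch of the last suggestion, computing ROS with silence removed. The report does not define the ROS unit; phones per second over a forced alignment is assumed here, and the alignment format and the silence labels `sil` and `sp` are hypothetical.

```python
SILENCE = frozenset({"sil", "sp"})  # assumed silence labels


def rate_of_speech(alignment, silence=SILENCE):
    """Return phones per second, ignoring silence segments.

    alignment: list of (phone, start_sec, end_sec) tuples,
    e.g. taken from a forced alignment of the utterance.
    """
    speech = [(p, s, e) for p, s, e in alignment if p not in silence]
    duration = sum(e - s for _, s, e in speech)
    if duration <= 0:
        return 0.0  # utterance contains no speech
    return len(speech) / duration
```

Binning utterances by this value would give both the ROS distribution on WSJ and the per-ROS test subsets mentioned above.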

Scoring

hold

Confidence

Implement a tool for data labeling, correcting some errors.
Finished extraction of two features: DNN posterior + lattice posterior

LM development

Domain specific LM

G determinization problem solved.

NUM tag LM:

Seems OK with the tag LM.

Word2Vector

W2V based doc classification

Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.
Interest group set up, reading scheduled every Thursday
Non-linear inter-language transform: English-Spanish-Czech: wv model training done, transform model under investigation

Investigate more iterations to obtain a better model
Checking the discrepancy between the MATLAB nnet tool & sklearn.

RNN LM

Prepare WSJ database
Trained model 10000 x 4 + 320 + 10000
Start to test on n-best rescore
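
For reference, n-best rescoring with an RNN LM can be sketched as below. This is an assumed setup, not the one used in these experiments: the dictionary keys, the weights, and the log-linear interpolation of the n-gram and RNN LM scores are all hypothetical choices.

```python
def rescore_nbest(nbest, lm_weight=10.0, rnn_weight=0.5):
    """Re-rank an n-best list using an interpolated LM score.

    nbest: list of dicts with keys
      "text"       hypothesis string
      "am_logp"    acoustic log-likelihood
      "ngram_logp" n-gram LM log-probability
      "rnn_logp"   RNN LM log-probability
    The two LM scores are interpolated in the log domain
    (log-linear interpolation), then scaled by lm_weight.
    Returns the hypotheses sorted best-first.
    """
    def total(hyp):
        lm = (1.0 - rnn_weight) * hyp["ngram_logp"] + rnn_weight * hyp["rnn_logp"]
        return hyp["am_logp"] + lm_weight * lm

    return sorted(nbest, key=total, reverse=True)
```

Setting `rnn_weight=0.0` reproduces the ranking from the n-gram LM alone, which gives a baseline to compare against.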

Speaker ID

Second model done

Emotion detection

delivered to Sinovoice

Translation

v2.0 demo ready

QA

Labeled 1000 utterances as the evaluation set
35% 11-class accuracy
EA not done yet

Retrieved from "http://cslt.org/mediawiki/index.php?title=2014-09-05&oldid=11157"