2014-08-29


Resource Building

Leftover questions

  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
  • NN LM

AM development

Sparse DNN

  • Sparse DNN on WSJ does not obtain further improvement.

Noise training

  • An error was found in the data setting. Re-run the test with gamma=20, 30 (a noise-mixing sketch follows this list).
  • Re-run the test with gamma=1, 0.1.
  • Noisy training journal paper is almost done.
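
A minimal sketch of the noise-injection step in noisy training, treating gamma as a simple gain on the injected noise (the report does not define gamma here); the signals are placeholders:

    import numpy as np

    def mix_noise(clean, noise, gamma):
        # Tile or truncate the noise to the length of the clean utterance,
        # then add it scaled by gamma (treated here as a plain noise gain,
        # which is an assumption, not the report's definition).
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)[:len(clean)]
        return clean + gamma * noise

    # Placeholder signals standing in for real speech and noise (16 kHz).
    sr = 16000
    clean = np.random.randn(sr).astype(np.float32)
    noise = np.random.randn(sr).astype(np.float32)
    noisy = {g: mix_noise(clean, noise, g) for g in (0.1, 1.0, 20.0, 30.0)}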

Dropout & Rectification & Convolutive network

  • With the learning rate changed to 0.001, the training process can be started:
    1. Changing the drop probability from 0.5 to 0.2 improves frame accuracy; WER seems problematic.
    2. Experiments with learning rates 1 and 8: NA.
  • Rectification (a clipped-rectifier sketch follows this list):
    1. Rectification by itself failed with large weights.
    2. Including an L1 penalty enables the training, but performance is very bad.
    3. Try setting a maximum value on the rectifier output.
  • Convolutive network
    1. Test more configurations.
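
Item 3 under Rectification refers to bounding the rectifier output. A minimal numpy sketch of such a clipped rectifier combined with dropout at the p=0.2 mentioned above; the clip value 6.0 and all names here are assumptions for illustration, not values from the experiments:

    import numpy as np

    def clipped_relu(x, max_val=6.0):
        # Rectifier with an upper bound on its output; 6.0 is an
        # assumed clip value, not one taken from the experiments.
        return np.minimum(np.maximum(x, 0.0), max_val)

    def dropout(x, p=0.2, rng=None):
        # Inverted dropout at drop probability p; kept activations are
        # rescaled by 1/(1-p) so nothing changes at test time.
        if rng is None:
            rng = np.random.default_rng(0)
        mask = rng.random(x.shape) >= p
        return x * mask / (1.0 - p)

    h = np.random.randn(4, 8)              # placeholder hidden activations
    out = dropout(clipped_relu(h), p=0.2)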

Denoising & Farfield ASR

  • Lasso-based dereverberation obtained reasonable results (see the sketch after this list):
    1. Found some suspicious problems with the frequency-dependent Lasso.
    2. Proposed a full-frequency Lasso and a full frequency-temporal Lasso.
    3. Good performance was obtained with the frequency-dependent Lasso:
       • Near data: 10.79 -> 10.35 (lambda=0.05)
       • Far data:  40.53 -> 35.65 (lambda=0.15)
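
A minimal sketch of the frequency-dependent Lasso idea, assuming it fits a sparse linear predictor of late reverberation per frequency bin of a magnitude spectrogram; the order/delay parameters and the exact model structure are assumptions, with sklearn's alpha playing the role of lambda above:

    import numpy as np
    from sklearn.linear_model import Lasso

    def dereverb_freq_dependent(R, order=10, delay=3, alpha=0.05):
        # R: magnitude spectrogram, shape (n_freq, n_frames). For each
        # frequency bin, a sparse (Lasso) predictor of the current frame
        # is fitted from frames at least `delay` steps in the past, and
        # its prediction (the late-reverberation estimate) is subtracted.
        n_freq, n_frames = R.shape
        D = R.copy()
        t0 = delay + order - 1
        for f in range(n_freq):
            X = np.stack([R[f, t0 - delay - k : n_frames - delay - k]
                          for k in range(order)], axis=1)
            y = R[f, t0:]
            model = Lasso(alpha=alpha, positive=True, fit_intercept=False)
            model.fit(X, y)
            late = model.predict(X)
            D[f, t0:] = np.maximum(y - late, 0.0)  # floor at zero
        return D

    # Placeholder spectrogram standing in for real reverberant speech.
    R = np.abs(np.random.randn(64, 200))
    D = dereverb_freq_dependent(R, alpha=0.05)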

VAD

  • Some discrepancy between CSLT results & Puqiang results

  • Check whether the labels are really problematic.
  • Check whether short-time spike noise is the major problem (can be solved by spike filtering; see the sketch after this list).
  • Check whether low-energy babble noise caused the mismatch (can be solved by global energy detection).
  • Test a model trained on noisy data.
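
A minimal sketch of the two proposed fixes, working on frame-level log energies: a short median filter for spike filtering and a global energy threshold for detection; the frame sizes and the margin are assumed values:

    import numpy as np
    from scipy.signal import medfilt

    def frame_energy(x, frame=400, hop=160):
        # Log energy per frame (25 ms frames, 10 ms hop at 16 kHz assumed).
        n = 1 + (len(x) - frame) // hop
        frames = np.stack([x[i * hop : i * hop + frame] for i in range(n)])
        return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

    def vad_decision(x, spike_kernel=5, margin=3.0):
        e = frame_energy(x)
        # Spike filtering: a short median filter suppresses isolated
        # high-energy frames caused by short-time spike noise.
        e_smooth = medfilt(e, kernel_size=spike_kernel)
        # Global energy detection: threshold relative to the utterance
        # energy floor; the log-energy margin is an assumed value.
        threshold = np.min(e_smooth) + margin
        return e_smooth > threshold   # True = speech frame

    x = np.random.randn(16000).astype(np.float32)  # placeholder 1 s signal
    speech_mask = vad_decision(x)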

Speech rate training

  • Some interesting results with the simple speech-rate change algorithm were obtained on the WSJ db.

  • The ROS model seems superior to the normal one on faster speech.
  • Need to check the distribution of ROS on WSJ.
  • Suggestion: extract speech data of different ROS and construct a new test set.
  • Suggestion: use the Tencent training data.
  • Suggestion: remove silence when computing ROS (see the sketch after this list).
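
A minimal sketch of ROS computation with silence removed, assuming ROS is phones per second of speech and that a phone-level alignment of (phone, start, end) tuples is available; the tuple layout and silence labels are assumptions:

    def rate_of_speech(alignment, sil_labels=("sil", "sp")):
        # Keep only non-silence phones, then divide the phone count by
        # the total speech duration (silence excluded from both).
        speech = [(p, s, e) for (p, s, e) in alignment if p not in sil_labels]
        duration = sum(e - s for (_, s, e) in speech)
        return len(speech) / duration if duration > 0 else 0.0

    # Hypothetical usage on a tiny alignment:
    ali = [("sil", 0.0, 0.3), ("k", 0.3, 0.4), ("ae", 0.4, 0.55),
           ("t", 0.55, 0.65), ("sil", 0.65, 1.0)]
    print(rate_of_speech(ali))   # 3 phones / 0.35 s, ~8.6 phones/s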


Scoring

  • hold

Confidence

  • Implement a tool for data labeling.
  • Finished extraction of two features: DNN posterior + lattice posterior (a combiner sketch follows this list).
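
A minimal sketch of how the two features could feed a confidence classifier, e.g. logistic regression; the feature values are synthetic and the combiner choice is an assumption for illustration, not the project's actual setup:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Each hypothesized word gets two features: the average DNN posterior
    # of its frames and its lattice (arc) posterior. A simple combiner
    # maps them to a word-level confidence score.
    X = np.array([[0.92, 0.88],   # [dnn_posterior, lattice_posterior]
                  [0.35, 0.41],
                  [0.80, 0.75],
                  [0.20, 0.15]])
    y = np.array([1, 0, 1, 0])    # 1 = word correct, 0 = word error

    clf = LogisticRegression().fit(X, y)
    confidence = clf.predict_proba([[0.70, 0.60]])[0, 1]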


LM development

Domain specific LM

  • G determinization problem re-opened.
  • NUM tag LM (a scoring sketch follows this list):
    1. 27h JS test: 20.16 vs 20.19
    2. 2h JS test: 17.48 vs 17.49
  • Analysis of the tag LM's properties: (1) random NUM should obtain better performance; (2) other words are not seriously impacted.
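
A minimal sketch of tag-LM scoring, assuming the standard class-based factorization p(w|h) = p(<NUM>|h) * p(w|<NUM>); p_tag_lm and p_in_class are hypothetical stand-ins for the real n-gram model and in-class distribution:

    import re

    NUM_RE = re.compile(r"^\d+(\.\d+)?$")

    def tag_tokens(tokens):
        # "buy 300 shares" -> "buy <NUM> shares"
        return ["<NUM>" if NUM_RE.match(t) else t for t in tokens]

    def score_word(word, history, p_tag_lm, p_in_class):
        # Number tokens are scored through the class tag; other words
        # are scored directly against the tagged history.
        if NUM_RE.match(word):
            return p_tag_lm("<NUM>", tag_tokens(history)) * p_in_class(word)
        return p_tag_lm(word, tag_tokens(history))

    # Hypothetical usage with toy stand-in distributions:
    p_tag_lm = lambda w, h: 0.1      # stand-in n-gram probability
    p_in_class = lambda w: 1e-4      # stand-in in-class probability
    print(score_word("300", ["buy"], p_tag_lm, p_in_class))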


Word2Vector

W2V based doc classification

  • Initial results with the variational Bayesian GMM obtained. Performance is not as good as the conventional GMM (a comparison sketch follows this list).
  • Interest group set up; reading scheduled every Thursday.
  • Non-linear inter-language transform (English-Spanish-Czech): word-vector model training done; the transform model is under investigation.
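
A minimal sketch of the comparison, assuming one mixture model per document class fitted on mean word vectors and likelihood-based classification; the data is synthetic and the per-class-GMM setup is an assumption for illustration:

    import numpy as np
    from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

    # Documents represented by the mean of their word2vec vectors; here
    # two synthetic classes stand in for real document vectors.
    rng = np.random.default_rng(0)
    dim = 50
    doc_vecs = {0: rng.normal(0.0, 1.0, (200, dim)),
                1: rng.normal(0.5, 1.0, (200, dim))}

    def fit_models(model_cls):
        # One mixture model per class.
        return {c: model_cls(n_components=4, random_state=0).fit(v)
                for c, v in doc_vecs.items()}

    def classify(models, x):
        # Assign to the class whose model gives the highest likelihood.
        return max(models, key=lambda c: models[c].score(x.reshape(1, -1)))

    for cls in (GaussianMixture, BayesianGaussianMixture):
        models = fit_models(cls)
        acc = np.mean([classify(models, x) == c
                       for c, vs in doc_vecs.items() for x in vs])
        print(cls.__name__, acc)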


RNN LM

  • Obtained the new toolkit from Thomas.
  • Prepare the WSJ database and re-test the RNN LM.


Speaker ID

  • Second model done

Emotion detection

  • Initial performance obtained.

Translation

  • Training failed due to running out of memory.
  • Re-started the training due to some errors on the grid.