2014-09-22
From cslt Wiki
Resource Building
Leftover questions
- Investigating LOUDS FST.
- CLG embedded decoder plus online compiler.
- DNN-GMM co-training
- NN LM
AM development
Sparse DNN
- Investigating layer-based DNN training
Noise training
- First draft of the noisy training journal paper
- Check abnormal behavior with large sigma (Yinshi, Liuchao)
Dropout & Rectification & convolutive network
- Dropout
- No performance improvement found yet.
- [1]
- Rectification
- The dropout NA problem was caused by large weight magnitudes
- Convolutive network
- Test more configurations
Denoising & Farfield ASR
- Lasso-based de-reverberation is done with the REVERBERATION toolkit
- Start to compose the experiment section for the SL paper.
VAD
- Noise model training done. Under testing.
- Need to investigate the performance reduction in babble noise. Call Jia.
Speech rate training
- Some interesting results with the simple speech rate change algorithm were obtained on the WSJ database
- The ROS model seems superior to the normal one on faster speech
- Need to check the distribution of ROS on WSJ
- Suggest extracting speech data with different ROS to construct a new test set
- Suggest using the Tencent training data
- Suggest removing silence when computing ROS
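The suggestion to exclude silence when computing ROS can be sketched as below. This is only an illustration under stated assumptions: ROS is taken here to mean phones per second of speech, the input is a phone-level alignment given as (label, start, end) tuples, and the silence labels `sil` and `sp` are hypothetical. None of these details are specified in the notes above.

```python
from typing import Iterable, Set, Tuple

# (phone label, start time in seconds, end time in seconds)
Segment = Tuple[str, float, float]

# Hypothetical silence labels; the real label set depends on the aligner used.
SILENCE_LABELS = {"sil", "sp"}


def rate_of_speech(segments: Iterable[Segment],
                   silence_labels: Set[str] = SILENCE_LABELS) -> float:
    """Return phones per second, counting only non-silence segments."""
    n_phones = 0
    speech_duration = 0.0
    for label, start, end in segments:
        if label in silence_labels:
            continue  # silence adds neither phones nor duration
        n_phones += 1
        speech_duration += end - start
    if speech_duration <= 0.0:
        return 0.0  # no speech in this utterance
    return n_phones / speech_duration


# Toy alignment: 0.5 s of leading and trailing silence around four phones.
alignment = [
    ("sil", 0.0, 0.5),
    ("h", 0.5, 0.6),
    ("e", 0.6, 0.75),
    ("l", 0.75, 0.85),
    ("o", 0.85, 1.0),
    ("sil", 1.0, 1.5),
]
ros = rate_of_speech(alignment)
```

In this toy alignment the four phones span 0.5 s of speech, so the silence-free ROS is about 8 phones per second. Dividing by the full 1.5 s utterance would give roughly 2.7, which shows how much leading and trailing silence can distort the estimate.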
Scoring
- Pitch & rhythm done.
- Harmonics on hold
Confidence
- Basic confidence by using lattice-based posterior + DNN posterior + ROS done
- 23% detection error achieved by balanced model
LM development
Domain specific LM
- Domain-specific counts dumped
- N-gram generation is ongoing
NUM tag LM
- HCLG union seems better than G union, when integrating grammar + LM (25->23)
- Boost specific words like wifi if TAG model does not work for a particular word.
Word2Vector
W2V based doc classification
- Initial results with variational Bayesian GMM obtained. Performance is not as good as the conventional GMM.
- Non-linear inter-language transform: English-Spanish-Czech: wv model training done, transform model under investigation
- Probably overfitting in the MLP training
- SSA-based local linear mapping still running
- Knowledge vector started
- Documents obtained from wiki
- Character to word conversion
- Design the transform model
RNN LM
- Prepare WSJ database
- Trained model 10000 x 4 + 320 + 10000
- Better performance obtained (4.16->3.47)
- Gigaword sampling for Chinese data
Speaker ID
- Second model done
Emotion detection
- Delivered to Sinovoice
Translation
- v3.0 demo released
QA
- Framework done