2014-09-29
From cslt Wiki
Latest revision as of 05:49, 29 September 2014 (Mon)
Speech Processing
AM development
Sparse DNN
- Performance improvement found when the network is pruned slightly (a pruning sketch follows this list)
- Experiments show that
- Suggest using TIMIT / AURORA 4 for training
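As a reference for the pruning experiments, here is a minimal sketch of magnitude-based pruning in numpy. The function and the 10% sparsity level are illustrative assumptions, not the exact recipe used in the experiments.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.1):
    """Zero out the smallest-magnitude fraction of a weight matrix."""
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k)[k]  # (k+1)-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

# Example: prune a 100x100 layer weight matrix by ~10%.
W = np.random.randn(100, 100)
W_sparse = prune_by_magnitude(W, sparsity=0.1)
print((W_sparse == 0.0).mean())  # roughly 0.1 of the weights zeroed
```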
Noise training
- First draft of the noisy training journal paper
- Check abnormal behavior with large sigma (Yinshi, Liuchao)
Drop out & Rectification & convolutive network
- Drop out
- No performance improvement found yet.
- [1]
- Rectification
- The dropout NA problem was caused by large weight magnitudes (a max-norm sketch follows this section)
- Convolutive network
- Test more configurations
- Zhiyong will work on CNN
- Recurrent neural network
- Investigate CURRENNT for AM
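Since the NA problem is attributed to large weight magnitudes, a common companion to dropout is a max-norm constraint on each unit's incoming weights. The sketch below is an assumption about a possible fix, not the team's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5):
    """Inverted dropout: drop units with probability p, rescale the rest."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def clip_max_norm(W, max_norm=3.0):
    """Rescale each unit's incoming weight vector (a column of W) so its
    L2 norm stays at or below max_norm, preventing weight blow-up."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return W * scale

# Typical usage per training step: h = dropout(h); W = clip_max_norm(W)
```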
Denoising & Farfield ASR
- Lasso-based de-reverberation is done with the REVERBERATION toolkit
- Start to compose the experiment section for the SL paper.
VAD
- Problems found at the beginning of speech (roughly the first 0-0.02 s; an onset-padding sketch follows this list)
- Noise model training done. Under testing.
- Need to investigate the performance reduction in babble noise. Call Jia.
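One plausible mitigation for the clipped speech onsets (an assumption here, not the team's stated fix) is to extend each detected speech segment backwards by a frame or two:

```python
def pad_speech_onsets(frame_labels, pad_frames=2):
    """Extend each detected speech segment backwards by pad_frames
    (e.g. 2 x 10 ms frames covers the 0-0.02 s region the notes flag).
    frame_labels: list of 0/1 per-frame VAD decisions."""
    out = list(frame_labels)
    for i in range(1, len(frame_labels)):
        if frame_labels[i] == 1 and frame_labels[i - 1] == 0:  # onset
            for j in range(max(0, i - pad_frames), i):
                out[j] = 1
    return out

print(pad_speech_onsets([0, 0, 0, 1, 1, 0, 0, 1], pad_frames=1))
# [0, 0, 1, 1, 1, 0, 1, 1]
```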
Speech rate training
- Some interesting results with the simple speech rate change algorithm were obtained on the WSJ db [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&step=view_request&cvssid=268]
- The ROS model seems superior to the normal model on faster speech
- Need to check the distribution of ROS on WSJ
- Suggest extracting speech data of different ROS and constructing a new test set
- Suggest using Tencent training data
- Suggest removing silence when computing ROS (a sketch follows this list)
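A minimal sketch of the suggested silence-excluded ROS computation, assuming phone segments from a forced alignment; the (phone, start, end) tuple format and silence labels are assumptions:

```python
def rate_of_speech(phone_segments, sil_labels=("sil", "sp")):
    """Compute ROS as phones per second over speech only.

    phone_segments: list of (phone, start_sec, end_sec). Silence is
    excluded from both the phone count and the duration, per the
    'remove silence' suggestion above.
    """
    n_phones = 0
    speech_dur = 0.0
    for phone, start, end in phone_segments:
        if phone in sil_labels:
            continue
        n_phones += 1
        speech_dur += end - start
    return n_phones / speech_dur if speech_dur > 0 else 0.0

segs = [("sil", 0.0, 0.3), ("k", 0.3, 0.38), ("ae", 0.38, 0.5),
        ("t", 0.5, 0.58), ("sil", 0.58, 0.9)]
print(rate_of_speech(segs))  # 3 phones / 0.28 s ~ 10.7 phones/sec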
low resource language AM training
- Results on CVSS [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=274]
- Use the Chinese NN as the initial NN and replace the last layer (a sketch follows below)
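A minimal numpy sketch of the last-layer replacement idea: keep the Chinese-trained hidden layers and re-initialize only the output layer for the new language's targets. The layer representation and initialization scale are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def retarget_last_layer(layers, n_new_targets):
    """Transfer the hidden layers of a trained (e.g. Chinese) DNN and
    re-initialize only the output layer for the new language.
    layers: list of (W, b) numpy pairs, W of shape (in_dim, out_dim)."""
    hidden = layers[:-1]                     # transferred as-is
    in_dim = layers[-1][0].shape[0]          # size of last hidden layer
    W_out = rng.normal(0.0, 0.01, (in_dim, n_new_targets))
    b_out = np.zeros(n_new_targets)
    return hidden + [(W_out, b_out)]
```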
Scoring
- Global scoring done
- Pitch & rhythm scoring done; needs testing
- Harmonics on hold
Confidence
- Experiments done; more data needed
- Basic confidence using lattice-based posterior + DNN posterior + ROS done (a combination sketch follows this list)
- 23% detection error achieved by balanced model
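A sketch of how the three confidence features might be combined, using scikit-learn's logistic regression with balanced class weights to mirror the "balanced model"; the data and the choice of combiner are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [lattice posterior, DNN posterior, ROS] for one word
# (feature set from the notes; the values here are synthetic).
X = np.array([[0.95, 0.90, 10.2],
              [0.40, 0.35, 14.8],
              [0.88, 0.92,  9.5],
              [0.20, 0.30, 16.0]])
y = np.array([1, 0, 1, 0])  # 1 = correct word, 0 = recognition error

# class_weight='balanced' reweights classes by inverse frequency,
# in the spirit of the 'balanced model' mentioned above.
clf = LogisticRegression(class_weight="balanced").fit(X, y)
print(clf.predict_proba([[0.80, 0.85, 11.0]])[:, 1])  # word confidence
```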
Speaker ID
- Add VAD to the system
- GMM-based test program delivered
- GMM registration program done
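A minimal sketch of GMM-based speaker registration with scikit-learn, fitting a GMM directly on enrollment frames; the delivered program may instead use MAP adaptation from a UBM, so this is illustrative only:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def register_speaker(features, n_components=16):
    """Enroll a speaker by fitting a GMM to feature frames
    (e.g. MFCCs, shape [n_frames, n_dims])."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag")
    gmm.fit(features)
    return gmm

def score_speaker(gmm, features):
    """Average per-frame log-likelihood under the speaker GMM."""
    return gmm.score(features)

frames = np.random.randn(500, 13)  # stand-in for enrollment MFCCs
model = register_speaker(frames)
print(score_speaker(model, np.random.randn(100, 13)))
```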
Emotion detection
- Zhang Weiwei is learning the code
- Sinovoice is implementing the server
Text Processing
LM development
Domain specific LM
- N-gram generation is ongoing
- Memory inspection and baidu_hi are done
- NUM tag LM (a tag-substitution sketch follows this list):
- Maxi's work is released
- Yuanbin continues the tag LM work
- Add NER output to the tag LM
- Boost specific words like wifi if TAG model does not work for a particular word.
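A minimal sketch of the tag-substitution step for the NUM tag LM: numeric tokens are replaced by a class tag before n-gram counting, and NER output could be mapped to tags the same way. The regex and tag name are assumptions:

```python
import re

def tag_numbers(tokens, tag="NUM"):
    """Replace numeric tokens with a class tag before n-gram counting;
    NER spans could be mapped to their own tags in the same pass."""
    return [tag if re.fullmatch(r"\d+(\.\d+)?", t) else t for t in tokens]

print(tag_numbers("call me at 10 30 tomorrow".split()))
# ['call', 'me', 'at', 'NUM', 'NUM', 'tomorrow']
```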
Word2Vector
W2V based doc classification
- Initial results with the variational Bayesian GMM obtained; performance is not as good as the conventional GMM (a doc-vector sketch follows this list)
- Non-linear inter-language transform: English-Spanish-Czech: word vector model training done, transform model under investigation
- SSA-based local linear mapping is still running
- Number of k-means classes changed to 2
- Knowledge vector work started
- Documents obtained from wiki
- Formulas obtained
- Character-to-word conversion
- Read more papers
- Preparing to train
- Google word vector training
- Improve the sampling method
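For the W2V-based doc classification, a common document representation (assumed here, not confirmed by the notes) is the average of the in-vocabulary word vectors; the GMM or variational Bayesian GMM is then fit on these vectors:

```python
import numpy as np

def doc_vector(tokens, w2v, dim=100):
    """Average the word vectors of in-vocabulary tokens; documents with
    no known words fall back to the zero vector."""
    vecs = [w2v[t] for t in tokens if t in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy vocabulary; real runs would load trained word2vec vectors.
w2v = {"speech": np.ones(4), "noise": -np.ones(4)}
print(doc_vector(["speech", "noise", "oov"], w2v, dim=4))  # [0. 0. 0. 0.]
```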
RNN LM
- Prepare WSJ database
- Trained model 10000 x 4 + 320 + 10000
- Better performance obtained (4.16 → 3.47)
- Gigaword sampling for Chinese data (a sampling sketch follows this list)
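For the Gigaword sampling, a standard way to draw a uniform subset of sentences from a corpus too large to hold in memory is reservoir sampling; whether this matches the actual sampling method used is an assumption:

```python
import random

def reservoir_sample(lines, k, seed=0):
    """Uniformly sample k sentences from a large corpus stream
    (e.g. Gigaword) in one pass, without loading it into memory."""
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            sample.append(line)
        else:
            j = rng.randint(0, i)
            if j < k:
                sample[j] = line
    return sample

print(reservoir_sample((f"sent {i}" for i in range(10000)), k=3))
```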
Translation
- v3.0 demo released
- still slow
- Cut vocabulary entries that are not important (a frequency-based sketch follows this list)
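A minimal sketch of vocabulary cutting by frequency; that "not important" means low-count words is an assumption, and the cutoff is illustrative:

```python
from collections import Counter

def cut_vocabulary(corpus_tokens, keep=50000):
    """Keep only the most frequent words; everything else would be
    mapped to an unknown-word token downstream."""
    counts = Counter(corpus_tokens)
    return {w for w, _ in counts.most_common(keep)}

vocab = cut_vocabulary("the cat sat on the mat the end".split(), keep=3)
print(vocab)  # 'the' plus two of the tied singleton words
```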
QA
- liangshan_v1 performance: 74.3%
- New framework and GA method are done
- Add the SEMPRE tool to the framework