2013-09-27
Data sharing
- LM count files still undelivered!
DNN progress
Sparse DNN
- Optimal Brain Damage (OBD) based sparsity is ongoing; the algorithm is being prepared.
- An interesting investigation is to drop out 50% of the weights after each iteration and then re-train without sticky. The performance is a bit better than the original best; this might be attributed to the noisy perturbation helping the training move out of a local minimum (see the sketch below).
Report: http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/文件:Chart1.png
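For reference, a minimal numpy sketch of the two pruning ideas above: random 50% weight dropping (followed by re-training), and OBD-style saliency ranking, which scores each weight by 0.5 * h_ii * w_i^2 using a diagonal Hessian estimate. The matrix sizes, the diagonal Hessian values, and the helper names (drop_half, obd_saliency) are illustrative assumptions, not part of the actual training recipe.

 import numpy as np
 
 rng = np.random.default_rng(0)
 
 def drop_half(W, rng):
     """Randomly zero out 50% of the weights in W (the drop-and-retrain trick)."""
     mask = rng.random(W.shape) < 0.5
     return W * mask
 
 def obd_saliency(W, H_diag):
     """Optimal Brain Damage saliency: 0.5 * h_ii * w_i^2 per weight.
     Weights with the smallest saliency are the candidates for pruning."""
     return 0.5 * H_diag * W ** 2
 
 # toy example: one 4x3 weight matrix and a made-up diagonal Hessian estimate
 W = rng.standard_normal((4, 3))
 H_diag = np.abs(rng.standard_normal((4, 3)))
 
 W_dropped = drop_half(W, rng)                 # then re-train from W_dropped without sticky
 prune_order = np.argsort(obd_saliency(W, H_diag), axis=None)   # least-salient weights first
 print(W_dropped)
 print(prune_order[:5])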
FBank features
The 1000-hour test is done. The performance is significantly better than with MFCC features, and iteration 14 is better than the final iteration, which may be attributed to over-fitting. Results: http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:Chart2.png
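For reference, a minimal numpy sketch of the relation between the two feature types: FBank keeps the log Mel filterbank energies, while MFCC further applies a DCT to decorrelate them. The parameter values (40 filters, 25 ms frames, 512-point FFT, 13 cepstra) and the function name fbank_features are illustrative assumptions, not the settings used in the experiment above.

 import numpy as np
 from scipy.fftpack import dct
 
 def hz_to_mel(hz):
     return 2595.0 * np.log10(1.0 + hz / 700.0)
 
 def mel_to_hz(mel):
     return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
 
 def fbank_features(signal, sr=16000, n_fft=512, n_filters=40,
                    frame_len=0.025, frame_shift=0.01):
     """Log Mel filterbank (FBank) features from a 1-D waveform (illustrative settings)."""
     flen, fshift = int(sr * frame_len), int(sr * frame_shift)
     n_frames = 1 + max(0, (len(signal) - flen) // fshift)
     frames = np.stack([signal[i * fshift:i * fshift + flen] * np.hamming(flen)
                        for i in range(n_frames)])
     power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
 
     # triangular Mel filters spaced evenly on the Mel scale up to Nyquist
     mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
     bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
     fb = np.zeros((n_filters, n_fft // 2 + 1))
     for m in range(1, n_filters + 1):
         for k in range(bins[m - 1], bins[m]):
             fb[m - 1, k] = (k - bins[m - 1]) / max(bins[m] - bins[m - 1], 1)
         for k in range(bins[m], bins[m + 1]):
             fb[m - 1, k] = (bins[m + 1] - k) / max(bins[m + 1] - bins[m], 1)
     return np.log(np.maximum(power @ fb.T, 1e-10))    # shape: (n_frames, n_filters)
 
 # MFCC is simply a DCT of the log filterbank energies; the DNN consumes FBank directly
 wav = np.random.randn(16000)               # stand-in for one second of 16 kHz speech
 fbank = fbank_features(wav)
 mfcc = dct(fbank, type=2, axis=1, norm='ortho')[:, :13]
 print(fbank.shape, mfcc.shape)             # (98, 40) (98, 13)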
Tencent exps
N/A
Noisy training
Noise segments are sampled randomly for each utterance: a Dirichlet distribution is used to sample the noise mixture over the various types, and a Gaussian is used to sample the SNR (a sketch of the sampling scheme is given after the conclusions below).
The first test involves white noise and car noise at 1/3 each. The performance report: http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:Chart3.png
The conclusions are:
- By sampling noises, most of the noise patterns can be learned efficiently, which improves performance on noisy test data.
- By sampling noises with high variance, performance on clean speech is largely retained.
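A minimal sketch of the sampling scheme described above, using numpy. The noise inventory, the Dirichlet concentration, and the SNR mean/variance are illustrative assumptions, and corrupt is a hypothetical helper rather than the actual training code.

 import numpy as np
 
 rng = np.random.default_rng(0)
 noise_types = ["white", "car"]            # assumed noise inventory
 alpha = np.ones(len(noise_types))         # Dirichlet concentration (assumed uniform)
 snr_mean, snr_std = 15.0, 10.0            # assumed Gaussian SNR parameters, in dB
 
 def corrupt(utt, noises):
     """Mix one utterance with a randomly sampled noise segment at a sampled SNR."""
     weights = rng.dirichlet(alpha)                     # proportions over noise types
     ntype = rng.choice(len(noise_types), p=weights)    # pick a noise type
     noise = noises[noise_types[ntype]]
     start = rng.integers(0, len(noise) - len(utt))     # random noise segment
     seg = noise[start:start + len(utt)]
 
     snr_db = rng.normal(snr_mean, snr_std)             # sample the SNR from a Gaussian
     # scale the noise segment so that 10*log10(P_speech / P_noise) == snr_db
     p_utt = np.mean(utt ** 2)
     p_seg = np.mean(seg ** 2) + 1e-12
     scale = np.sqrt(p_utt / (p_seg * 10 ** (snr_db / 10.0)))
     return utt + scale * seg
 
 # toy data: 1 s of "speech" and 10 s of each noise type at 16 kHz
 utt = rng.standard_normal(16000)
 noises = {n: rng.standard_normal(160000) for n in noise_types}
 noisy_utt = corrupt(utt, noises)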
Continuous LM
1. SogouQ n-gram building: 500M text data, 110k words. Two tests:
(1) Using the Tencent online1 and online2 transcriptions: online1 ppl 1651, online2 ppl 1512.
(2) Using the 70k SogouQ test set: ppl 33.
This means the SogouQ text is significantly different from the Tencent online1 and online2 sets, due to the domain mismatch.
2. NN LM
The NN LM uses 11k words as input and 192 units in the hidden layer, trained on 500M of text data from QA data and tested on the online2 transcription.
(1) Predict the most frequent words 1-1024 with the NN LM, and the others with the 4-gram. n-gram baseline: 402.37; NN+ngram: 122.54.
(2) Predict the most frequent words 1-2048 with the NN LM, and the others with the 4-gram. n-gram baseline: 402.37; NN+ngram: 127.59.
(3) Predict the words ranked 1024-2048 with the NN LM, and the others with the 4-gram. n-gram baseline: 402.37; NN+ngram: 118.92.
Conclusions: the NN LM is much better than the n-gram, thanks to its smoothing capacity. It seems to help more for the not-very-frequent words, which verifies its capability in smoothing.
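A minimal sketch of how the NN+ngram combination above can be scored, assuming hypothetical nn_prob and ngram_prob callables that return P(word | history). The exact shortlist normalization used in the experiment is not stated, so the sketch simply routes each word to one of the two models.

 import math
 
 def hybrid_ppl(sentences, shortlist, nn_prob, ngram_prob):
     """Perplexity of a shortlist-based NN/ngram combination.
 
     Words inside `shortlist` are scored by the NN LM, all other words by the
     4-gram; nn_prob(word, history) and ngram_prob(word, history) are
     hypothetical callables returning P(word | history)."""
     log_prob, n_words = 0.0, 0
     for sent in sentences:
         history = []
         for word in sent:
             p = nn_prob(word, history) if word in shortlist else ngram_prob(word, history)
             log_prob += math.log(max(p, 1e-12))
             n_words += 1
             history.append(word)
     return math.exp(-log_prob / max(n_words, 1))
 
 # toy usage with uniform dummy models over a 10-word vocabulary
 vocab = [f"w{i}" for i in range(10)]
 shortlist = set(vocab[:5])                       # e.g. the most frequent words
 uniform = lambda word, history: 1.0 / len(vocab)
 print(hybrid_ppl([vocab], shortlist, uniform, uniform))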