Sinovoice-2014-03-18
Environment setting
- Raid215 is a bit slow; move some den-lattices and alignments to Raid212.
Corpora
- PICC data are done (200h).
- Hubei telecom data are done (108h).
- In total, 1229h (470 + 346 + 105 BJ mobile + 200 PICC + 108 Hubei telecom) of telephone speech is now ready (see the tally sketch after this list).
- 16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recorded data.
- LM corpus preparation done.
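As a quick sanity check on the totals above, a minimal bookkeeping sketch in Python (hour figures copied from the bullets; the dictionary labels are illustrative, not official corpus names):

<pre>
# Tally of the corpus hours quoted above.
telephone_8k = {"set1": 470, "set2": 346, "BJ mobile": 105,
                "PICC": 200, "Hubei telecom": 108}
print(sum(telephone_8k.values()))   # 1229 -> matches the 1229h figure

data_16k = {"DataTang online": 978, "online mobile": 656,
            "recorded": 4300}
print(sum(data_16k.values()))       # 5934 -> the "6000h" 16k set, roughly
</pre>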
Acoustic modeling
Telephone model training
1000h Training
- xEnt training completed; compiling lattices.
- Need to test the xEnt performance
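Testing the xEnt model comes down to scoring decoding hypotheses against references; below is a minimal WER sketch (standard word-level edit distance; the example strings are illustrative, not from our test sets):

<pre>
# Minimal WER computation for checking decoding output.
# ref/hyp are token lists; plain Levenshtein distance over words.
def wer(ref, hyp):
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("今天 天气 很 好".split(), "今天 天气 好".split()))  # 0.25
</pre>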
PICC dedicated training
- Need to collect financial text data and retrain the LM
- Need to comb through the word list and the training text (e.g. a coverage check as sketched below).
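One concrete way to comb the word list against the training text is a coverage (OOV-rate) check; a minimal sketch, assuming one word per line in wordlist.txt and whitespace-tokenized text in train.txt (both file names hypothetical):

<pre>
# Check how well the word list covers the PICC training text.
words = set(open("wordlist.txt", encoding="utf-8").read().split())
tokens = open("train.txt", encoding="utf-8").read().split()
oov = [t for t in tokens if t not in words]
print("OOV rate: %.2f%%" % (100.0 * len(oov) / max(len(tokens), 1)))
</pre>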
6000 hour 16k training
Training progress
- 6000h/CSLT phone set alignment/denlattice completed
- 6000h/jt phone set alignment/denlattice completed
- MPE training has been kicked off (objective recapped below).
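For context, MPE training maximizes the expected phone accuracy of the hypotheses in the denominator lattices prepared above; the standard objective (general formulation, not specific to this recipe) is:

<pre>
F_{MPE}(\lambda) = \sum_{r} \frac{\sum_{s} p_\lambda(O_r \mid s)^{\kappa} P(s)\, A(s, s_r)}
                                 {\sum_{s} p_\lambda(O_r \mid s)^{\kappa} P(s)}
</pre>

where A(s, s_r) is the raw phone accuracy of hypothesis s against the reference transcript s_r and κ is the acoustic scale; the alignments and den-lattices listed above supply the numerator and denominator statistics.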
Train Analysis
- The Qihang model used a subset of the 6k data
- 2500 + 950h + tang500h* + 20131220, approximately 1700+2400 hours
- GMM training on this subset achieved 22.47%, while Xiaoming's earlier result was 16.1%.
  - It seems the database is still not very consistent.
  - Xiaoming will try to reproduce the Qihang training using this subset.
- Tested the 1700h model and the 6000h model on the T test sets:

<pre>
model/testcase | ditu  | due1  | entity1 | rec1 | shiji | zaixian1 | zaixian2 | kuaisu
---------------+-------+-------+---------+------+-------+----------+----------+-------
1700h_mpe      | 12.18 | 12.93 |  5.29   | 3.69 | 21.73 |  25.38   |  19.45   | 12.50
6000h_xEnt     | 11.13 | 10.12 |  4.64   | 2.80 | 17.67 |  27.45   |  23.23   | 10.98
</pre>
  - The 6000h model is generally better than the 1700h model on careful reading and domain-specific recordings (quantified in the sketch after this list).
  - The 6000h model with MPE and the jt phone set is still training; better performance is expected.
  - This indicates we should prepare domain-specific AMs (not only 8k/16k); the online test sets favor models trained on online data.
  - Suggest testing the 6000h model on the jidong data.
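To make "generally better" concrete, a quick pass over the table above (error rates copied directly from the table):

<pre>
# Per-testset relative change of 6000h_xEnt vs 1700h_mpe (from the table).
sets      = ["ditu", "due1", "entity1", "rec1", "shiji",
             "zaixian1", "zaixian2", "kuaisu"]
mpe_1700  = [12.18, 12.93, 5.29, 3.69, 21.73, 25.38, 19.45, 12.50]
xent_6000 = [11.13, 10.12, 4.64, 2.80, 17.67, 27.45, 23.23, 10.98]
for name, a, b in zip(sets, mpe_1700, xent_6000):
    print("%-8s %+6.1f%%" % (name, 100.0 * (b - a) / a))
# Negative = 6000h better; only zaixian1/zaixian2 (the online sets) regress,
# which is the basis for the domain-specific AM suggestion above.
</pre>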
Hubei telecom
- Incremental training with Hubei telecom data based on the (470+300+BJmobile) model; MPE4 finished.
  - The original model: 27.30
  - The adapted model: 25.42 (about 6.9% relative improvement)
Language modeling
- Training data ready
- First focus on the PICC test set and try to improve the PPL (defined below).
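For reference, the perplexity being targeted is the standard per-word measure:

<pre>
PPL = \exp\Big(-\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid h_i)\Big)
</pre>

so adding in-domain (financial/PICC) text that raises the average log-probability of the test-set words directly lowers the PPL.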
DNN Decoder
Online decoder adaptation
- Finished alignment/den-lattice
- 1st-round MPE training ongoing, about 2 days per iteration.