Sinovoice-2014-04-15
Environment setting
- Sinovoice internal server deployment; a better approach now uses GitLab + Redmine.
- Delivered part of the Kaldi code; some fixes are still waiting for check-in.
- Email notification is problematic; we need to obtain an SMTP server (a connectivity check is sketched below).
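Once a server is available, a quick way to confirm it accepts notification mail from Redmine/GitLab is a small smtplib probe. This is a minimal sketch; the host and addresses are placeholders, not real infrastructure.

    # Minimal SMTP smoke test for notification mail (hypothetical
    # host and addresses; substitute the server we finally obtain).
    import smtplib
    from email.mime.text import MIMEText

    def send_test_mail(host="smtp.example.com", port=25):
        msg = MIMEText("Redmine/GitLab notification test.")
        msg["Subject"] = "SMTP test"
        msg["From"] = "noreply@example.com"
        msg["To"] = "admin@example.com"
        with smtplib.SMTP(host, port, timeout=10) as server:
            server.sendmail(msg["From"], [msg["To"]], msg.as_string())

    if __name__ == "__main__":
        send_test_mail()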
Corpora
- 300h of Guangxi telecom transcription prepared; 150h expected before April 18.
- In total, 1338h of telephone speech is now ready (470 + 346 + 105 BJ mobile + 200 PICC + 108 HBTc + 109 new BJ mobile).
- 16k 6000h data: 978h online data from DataTang + 656h online mobile data + 4300h recorded data.
- A standard was established for LM-speech-text labeling (speech data transcription for LM enhancement).
- Xiaona will prepare a noise database, starting from telephone speech.
Acoustic modeling
Telephone model training
1000h Training
- Baseline: 8k states, 470+300 MPE4, 20.29
- Jietong phone set, 200 hour seed, 10k states training:
  - Xent, 16 iterations: 22.90
  - MPE1: 20.89
- CSLT phone set, 8k states training:
  - MPE1: 20.60
  - MPE2: 20.37
  - MPE3: 20.37
  - MPE4: 20.37
6000 hour 16k training
Training progress
- 6000h/CSLT phone set training:
  - Xent: 12.83
  - MPE1: 9.21
  - MPE2: 9.13
- 6000h/Jietong phone set training:
  - Now running MPE1.
Training analysis
- The Qihang model used a subset of the 6k data:
  - 2500 + 950h + tang500h* + 20131220, approximately 1700 + 2400 hours.
- GMM training on this subset achieved 22.47%, while Xiaoming's result is 16.1%.
- It seems the database is still not very consistent.
- Xiaoming kicked off a job to reproduce the Qihang training with this subset.
Multilingual training
- Prepare Chinglish data: contacted a vendor about 1000 hours of mobile recordings; will check how much we need.
- AMIDA database downloading in progress.
- Build a baseline system.
- Prepare a shared DNN structure for multilingual training (a sketch follows below).
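For reference, a common shared-hidden-layer design keeps the hidden stack language-independent and attaches one softmax output layer per language. The numpy sketch below shows the structure only; layer sizes, senone counts, and the ReLU nonlinearity are illustrative assumptions, not the agreed configuration.

    # Shared-hidden-layer multilingual DNN (structural sketch).
    # Hidden layers are shared; each language owns a softmax head.
    import numpy as np

    rng = np.random.default_rng(0)

    def layer(n_in, n_out):
        return rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out)

    feat_dim, hid_dim = 440, 1200            # e.g. spliced filterbank input
    shared = [layer(feat_dim, hid_dim), layer(hid_dim, hid_dim)]
    heads = {"zh": layer(hid_dim, 8000),     # per-language senone counts
             "en": layer(hid_dim, 5000)}     # (illustrative numbers)

    def forward(x, lang):
        h = x
        for W, b in shared:                  # language-independent stack
            h = np.maximum(h @ W + b, 0.0)   # ReLU hidden units
        W, b = heads[lang]                   # language-specific output
        z = h @ W + b
        e = np.exp(z - z.max())
        return e / e.sum()                   # senone posteriors

    post = forward(rng.standard_normal(feat_dim), "zh")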
Noise-robust features
- GFbank can be propagated to Sinovoice (a feature-extraction sketch follows below).
  - Let Mengyuan prepare the experiments.
- Liuchao will prepare fast computation code.
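For orientation, GFbank is taken here to mean log energies from a gammatone filterbank; the sketch below is a rough stand-in under that assumption. Filter order, bandwidth constant, and channel spacing are textbook defaults, not the CSLT implementation or the planned fast-computation code.

    # Gammatone-filterbank (GFbank-style) log-energy features.
    import numpy as np

    def erb(f):  # equivalent rectangular bandwidth (Glasberg-Moore)
        return 24.7 * (4.37 * f / 1000.0 + 1.0)

    def gammatone_ir(fc, fs, dur=0.025, order=4, b=1.019):
        t = np.arange(int(dur * fs)) / fs
        return (t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
                * np.cos(2 * np.pi * fc * t))

    def gfbank(sig, fs=8000, n_chan=24, frame=200, hop=80):
        fcs = np.geomspace(100, fs / 2 - 100, n_chan)  # channel centres
        n = (len(sig) - frame) // hop + 1
        feats = np.empty((n, n_chan))
        for c, fc in enumerate(fcs):
            y = np.convolve(sig, gammatone_ir(fc, fs), mode="same")
            for i in range(n):
                seg = y[i * hop: i * hop + frame]
                feats[i, c] = np.log(np.sum(seg ** 2) + 1e-10)
        return feats                                   # frames x channels

    feats = gfbank(np.random.randn(8000))  # 1 s of 8 kHz audio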
Language modeling
Training recipe transfer
- Training process was delivered.
- Problems in encoding were solved.
- Initial CSLT LM buildup completed.
Domain-specific atom-LM construction
Some potential problems
- Unclear domain definition.
- Using the same development set (8k transcription) for every domain is not very appropriate (see the sketch below).
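One reason this matters, assuming the atom LMs are combined by linear interpolation with weights tuned on development data (the notes do not spell this out): a shared, mismatched dev set biases every domain's weights the same way. A toy EM weight-tuning sketch:

    # EM estimation of linear-interpolation weights on a dev set
    # (toy numbers; assumes atom LMs are linearly interpolated).
    def em_weights(dev_probs, iters=20):
        """dev_probs: one tuple per dev word, holding each atom LM's
        probability for that word."""
        k = len(dev_probs[0])
        w = [1.0 / k] * k
        for _ in range(iters):
            counts = [0.0] * k
            for probs in dev_probs:
                mix = sum(wi * pi for wi, pi in zip(w, probs))
                for i in range(k):
                    counts[i] += w[i] * probs[i] / mix   # posterior of LM i
            w = [c / len(dev_probs) for c in counts]
        return w

    # two atom LMs scored on a four-word dev set
    print(em_weights([(0.1, 0.4), (0.2, 0.3), (0.05, 0.5), (0.3, 0.1)]))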
Text data filtering
- Prepare a word list.
- VSM-based topic segmentation was delivered to Sinovoice, but the tool is highly inefficient (the basic approach is sketched below).
- An enhanced toolkit was delivered.
- A telecom-specific word list is ready, and several stop words are ready.
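For context, a minimal TextTiling-style rendering of VSM topic segmentation: adjacent sentence blocks become term-frequency vectors, and a boundary is placed wherever their cosine similarity dips. Block size and threshold are illustrative; this is not the delivered tool.

    # VSM topic segmentation sketch (TextTiling-style).
    from collections import Counter
    import math

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a if w in b)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def segment(sentences, block=3, threshold=0.1):
        bounds = []
        for i in range(block, len(sentences) - block + 1):
            left = Counter(w for s in sentences[i - block:i] for w in s.split())
            right = Counter(w for s in sentences[i:i + block] for w in s.split())
            if cosine(left, right) < threshold:
                bounds.append(i)   # topic shift before sentence i
        return bounds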
DNN Decoder
Decoder optimization
- Tested the computation cost of each step:
  - beam 9/5000: netforward 65%
  - beam 13/7000: netforward 28%
- Sinovoice's changes to Kaldi were delivered and are ready for check-in.
- Need to verify the speed of the CSLT engine.
Frame-skipping
- Zhiyong and Liuchao will deliver the frame-skipping approach (sketched below).
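The usual form of frame skipping, sketched here under the assumption that posteriors are simply copied for skipped frames (the delivered approach may differ): run the net forward only on every k-th frame, cutting netforward cost roughly by a factor of k.

    # Frame-skipping sketch: evaluate the DNN every k-th frame and
    # reuse the posteriors for the frames in between. forward() is
    # a placeholder for the real acoustic model.
    import numpy as np

    def forward(frame):
        return np.full(8000, 1.0 / 8000)   # dummy senone posteriors

    def posteriors_with_skipping(frames, k=2):
        out, last = [], None
        for i, f in enumerate(frames):
            if i % k == 0:                 # net forward on this frame
                last = forward(f)
            out.append(last)               # copied on skipped frames
        return out

    post = posteriors_with_skipping(np.random.randn(100, 440), k=2)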
BigLM optimization
- Investigate BigLM retrieval optimization (one possible direction is sketched below).
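As one illustrative direction (an assumption; the notes do not say which optimization is meant): memoize big-LM n-gram lookups, since on-the-fly rescoring re-queries the same histories heavily. A toy sketch:

    # Memoized big-LM lookup sketch (toy table and backoff score).
    from functools import lru_cache

    NGRAMS = {("i", "am"): -1.2}      # (history..., word) -> logprob
    BACKOFF = -3.0

    @lru_cache(maxsize=1_000_000)
    def lm_logprob(history, word):
        # history is a tuple, hence hashable and cacheable
        return NGRAMS.get(history + (word,), BACKOFF)

    print(lm_logprob(("i",), "am"))   # -1.2, then served from cache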