Sinovoice-2016-5-5
Latest revision as of 06:35, 5 May 2016 (Thursday)
Data
- 16K LingYun
  - 2000h data ready
  - 4300h real-env data to label
- YueYu
  - Total 250h (190h YueYu + 60h English)
  - Add 60h YueYu
  - CER: 75% -> 76%
- WeiYu
  - 50h for training
  - 120h labeled ready
Model training
Deletion Error Problem
- Add one noise phone to alleviate the silence over-training
- Omit sil accuracy in discriminative training
- H smoothing of XEnt and MPE (see the objective sketch after the result tables below)
- Testdata: test_1000ju from 8000ju
  ---------------------------------------------------------------------------
  model                                       | ins | del | sub | wer/tot-err
  ---------------------------------------------------------------------------
  svd600_lr2e-5_1000H_mpe_uv-fix              |  24 |  56 | 408 |  8.26/488
  ---------------------------------------------------------------------------
  svd600_lr2e-5_1000H_mpe_uv-fix_omitsilacc   |  32 |  48 | 409 |  8.28/489
  ---------------------------------------------------------------------------
  svd600_lr2e-5_1000H_mpe_uv-fix_xent0.1      |  24 |  57 | 406 |  8.24/487
  ---------------------------------------------------------------------------
  svd600_lr2e-5_1000H_mpe_uv-fix_xent0.2      |  25 |  60 | 409 |  8.36/494
  ---------------------------------------------------------------------------
- Testdata: test_8000ju
  ---------------------------------------------------------------------------
  model                                       | ins | del | sub  | wer/tot-err
  ---------------------------------------------------------------------------
  svd600_lr2e-5_1000H_mpe_uv-fix              | 140 | 562 | 3686 | 9.19/4388  | 47753-total-word
  ---------------------------------------------------------------------------
  svd600_lr2e-5_1000H_mpe_uv-fix_xent0.1      | 146 | 510 | 3705 | 9.13/4361
  ---------------------------------------------------------------------------
  svd600_lr2e-5_1000H_mpe_uv-fix_xent0.2      | 139 | 492 | 3739 | 9.15/4370
  ---------------------------------------------------------------------------
- Testdata: test_2000ju from 10000ju
  ---------------------------------------------------------------------------
  model                                       | ins | del | sub  | wer/tot-err
  ---------------------------------------------------------------------------
  svd600_lr2e-5_1000H_mpe_uv-fix              |  86 | 790 | 1471 | 18.55/2347
  ---------------------------------------------------------------------------
  svd600_lr2e-5_1000H_mpe_uv-fix_omitsilacc   | 256 | 473 | 1669 | 18.95/2398
  ---------------------------------------------------------------------------
  svd600_lr2e-5_1000H_mpe_uv-fix_xent0.1      |  95 | 704 | 1548 | 18.55/2347
  ---------------------------------------------------------------------------
  svd600_lr2e-5_1000H_mpe_uv-fix_xent0.2      | 100 | 697 | 1557 | 18.60/2354
  ---------------------------------------------------------------------------
- Testdata: test_10000ju
  ---------------------------------------------------------------------------
  model                                       | ins | del  | sub  | wer/tot-err
  ---------------------------------------------------------------------------
  svd600_lr2e-5_1000H_mpe_uv-fix              | 478 | 3905 | 7698 | 18.31/12081 | 65989-total-word
  ---------------------------------------------------------------------------
  svd600_lr2e-5_1000H_mpe_uv-fix_xent0.1      | 481 | 3741 | 7773 | 18.18/11995
  ---------------------------------------------------------------------------
  svd600_lr2e-5_1000H_mpe_uv-fix_xent0.2      | 502 | 3657 | 7826 | 18.16/11985
  ---------------------------------------------------------------------------
- Add one silence arc from start-state to end-state
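One way to read the xent0.1 / xent0.2 model names above (an assumption; the objective is not spelled out on this page) is that the H smoothing item interpolates the MPE criterion with the cross-entropy criterion, the suffix being the interpolation weight:

  F_H(\theta) = F_{\mathrm{MPE}}(\theta) + \lambda \, F_{\mathrm{XEnt}}(\theta), \qquad \lambda \in \{0.1,\ 0.2\}

A larger \lambda pulls the model back toward the cross-entropy solution, which is consistent with the lower deletion counts of the xent rows on the larger test sets above.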
Big-Model Training
- 16k
- 8k
  PingAnAll:
  ==================================================================================
  | AM / error                | tot_err | ins | del | sub  | wer   |
  ----------------------------------------------------------------------------------
  | tdnn 7-1024 xEnt 2500.mdl |    3626 | 619 | 773 | 2234 | 16.60 |
  ----------------------------------------------------------------------------------
  | spn 7-1024 xEnt 300.mdl   |    3746 | 702 | 763 | 2281 | 17.15 |
  ==================================================================================
  PingAnUser:
  ==================================================================================
  | AM / error                | tot_err | ins | del | sub | wer   |
  ----------------------------------------------------------------------------------
  | tdnn 7-1024 xEnt 2500.mdl |     549 | 158 |  75 | 316 | 35.91 |
  ----------------------------------------------------------------------------------
  | spn 7-1024 xEnt 300.mdl   |     571 | 151 |  97 | 323 | 37.34 |
  ==================================================================================
  LiaoNingYiDong:
  ==================================================================================
  | AM / error                | tot_err | ins | del  | sub  | wer   |
  ----------------------------------------------------------------------------------
  | tdnn 7-1024 xEnt 2500.mdl |    5873 | 879 | 1364 | 3630 | 21.72 |
  ----------------------------------------------------------------------------------
  | spn 7-1024 xEnt 300.mdl   |    6257 | 977 | 1348 | 3923 | 23.14 |
  ==================================================================================
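Reading these tables (and the svd600 tables above): tot_err is the sum of the insertion, deletion and substitution counts, and wer is tot_err over the reference word count (the *-total-word figures quoted above), in percent. For example:

  tot_err = ins + del + sub = 619 + 773 + 2234 = 3626                    (PingAnAll, tdnn row)
  wer     = 100 * tot_err / total-word = 100 * 12081 / 65989 ≈ 18.31     (test_10000ju, mpe_uv-fix row)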
Embedding
- The nnet1 AM is 6.4M (3M after decomposition), so we need to keep the AM size within 10M (see the decomposition sketch after this list).
- 5*576-2400 TDNN model training done; AM size is about 17M.
- 5*500-2400 TDNN model training in progress.
- Making lattices for MPE training.
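The "3M after decomposition" figure above (and the svd600 model names earlier on this page) point to low-rank SVD decomposition of the weight matrices to shrink the AM. A minimal numpy sketch of the idea; the 2400x576 shape, the random stand-in matrix and the kept rank are illustrative assumptions, not the actual layer sizes:

  import numpy as np

  # Stand-in for one trained affine layer of the AM (illustrative shapes only).
  m, n, k = 2400, 576, 256          # output dim, input dim, kept rank
  W = np.random.randn(m, n).astype(np.float32)

  U, s, Vt = np.linalg.svd(W, full_matrices=False)   # W = U @ diag(s) @ Vt
  A = U[:, :k] * s[:k]                               # (m, k) factor
  B = Vt[:k, :]                                      # (k, n) factor

  # The single layer W is replaced by two thinner layers, B followed by A.
  # Parameters shrink only if k < m*n/(m+n) (about 464 for these shapes).
  params_before = m * n              # 1,382,400
  params_after  = m * k + k * n      #   761,856
  rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
  print(params_before, params_after, rel_err)

Applied layer by layer to a trained model (and usually followed by fine-tuning), this kind of factorization is what lets a 6.4M AM shrink to about 3M as noted above.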
Character LM
- 9-gram training done, except for Sogou-2T.
- Adding word-boundary tags to Character-LM training done
  - 9-gram
  - Except Weibo & Sogou-2T
- Prepare domain-specific vocabularies
  - Dianxin / Baoxian / Dianli (telecom / insurance / electric power)
- DT LM training
- Merge Character-LM & word-LM (see the FST sketch after this list)
  - Union
  - Compose, success.
- 2-step decoding: first with the character-based LM, then with the word-based LM.
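A minimal sketch of the union / compose steps with pynini (OpenFst's Python wrapper); pynini and all file names here are assumptions for illustration, not necessarily the tools actually used, and it glosses over the fact that the character-level and word-level FSTs must first be brought onto a common symbol table:

  import pynini

  # Hypothetical grammar FSTs built from the two LMs.
  char_lm = pynini.Fst.read("char_lm.fst")
  word_lm = pynini.Fst.read("word_lm.fst")

  # "Union": a single grammar that accepts paths from either LM.
  merged = pynini.union(char_lm, word_lm)
  merged.write("merged_lm.fst")

  # "Compose" / 2-step decoding: rescore a character-level first pass with the
  # word-level LM; char2word is a hypothetical transducer mapping character
  # sequences to words so that the two symbol tables line up.
  first_pass = pynini.Fst.read("char_lattice.fst")
  char2word  = pynini.Fst.read("char2word.fst")
  rescored = pynini.compose(pynini.compose(first_pass, char2word), word_lm)
  rescored.write("rescored.fst")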
SiaSong Robot
- Beam-forming algorithm test
- NN-model based beam-forming
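For reference, a minimal numpy sketch of classical delay-and-sum beam-forming, a common baseline for the beam-forming tests listed above; the array geometry, sampling rate and nearest-sample delays are simplifying assumptions, not the project's implementation:

  import numpy as np

  def delay_and_sum(signals, mic_positions, look_dir, fs, c=343.0):
      """Toy delay-and-sum beamformer using nearest-sample delays.

      signals:       (num_mics, num_samples) multi-channel recording
      mic_positions: (num_mics, 3) microphone coordinates in metres
      look_dir:      3-vector pointing from the array toward the assumed source
      """
      u = np.asarray(look_dir, dtype=float)
      u /= np.linalg.norm(u)
      # A mic displaced along u hears the plane wave (p_m . u) / c seconds
      # earlier than the array origin; compensate by delaying that channel.
      advance = mic_positions @ u / c                      # seconds, per mic
      shifts = np.round(advance * fs).astype(int)          # nearest-sample delays
      aligned = [np.roll(ch, s) for ch, s in zip(signals, shifts)]  # wrap-around ignored
      return np.mean(aligned, axis=0)                      # coherent average

  # Example: 4-mic linear array, 5 cm spacing along x, 16 kHz audio.
  mics = np.array([[i * 0.05, 0.0, 0.0] for i in range(4)])
  enhanced = delay_and_sum(np.zeros((4, 16000)), mics, look_dir=[1.0, 0.0, 0.0], fs=16000)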
Project
- PingAn & YueYu: too many deletion errors
- TDNN deletion error rate > DNN deletion error rate
- TDNN silence scale is too sensitive across different test cases.
SID
Digit
- Engine Package