Sinovoice-2016-6-2
Data
- 16K LingYun
- 2000h data ready
- 4300h of real-env data still to be labeled
- YueYu (Cantonese)
- Total 250h (190h YueYu + 60h English)
- Added 60h YueYu
- CER: 75%->76%
- WeiYu (Uyghur)
- More 8k data
- 50h for training
- 120h labeled data ready
- PingAn
- 100h user data done
Big-Model Training
16k
- Done!
8k Project
- PingAn
AM results (WER %):

=================================================================================
| AM / config                           |  all  | KeHu WER | KeHu no-ins |
---------------------------------------------------------------------------------
| tdnn 7-2048 xEnt                      | 16.45 |  36.49   |    25.18    |
| tdnn 7-2048 MPE                       | 15.22 |  32.77   |    23.48    |
| tdnn 7-2048 MPE adapt-PABX            | 14.67 |  31.33   |    22.76    |
---------------------------------------------------------------------------------
| tdnn 7-1024 xEnt                      | 16.60 |  35.91   |    25.58    |
| tdnn 7-1024 MPE 2e-6                  | 15.67 |  32.77   |    26.09    |
| tdnn 7-1024 MPE 2e-5 1.mdl            | 15.54 |  32.77   |    26.29    |
| tdnn 7-1024 MPE 1e-5 4.mdl            | 15.76 |  33.55   |    27.20    |
| tdnn 7-1024 MPE adapt-PABX            | 14.80 |  30.48   |    22.56    |
---------------------------------------------------------------------------------
| spn 7-1024 xEnt                       | 16.49 |  36.23   |    24.59    |
| spn 7-1024 xEnt xEnt-PA_user 101.mdl  | 16.19 |  33.22   |    22.69    |
| spn 7-1024 xEnt xEnt-PA_user mpe      | 15.24 |  32.77   |    21.65    |
| spn 7-1024 MPE-1000H 23.mdl           | 15.29 |  33.09   |    21.65    |
| spn 7-1024 MPE adapt-PA_all 29.mdl    | 15.11 |  33.42   |    21.84    |
| spn 7-1024 MPE adapt-PA_user 2e-5     | 15.31 |  31.79   |    20.14    |
| spn 7-1024 MPE adapt-PA_user Hs 2e-5  | 15.32 |  32.24   |    20.93    |
=================================================================================
LM results (WER %):

=============================================================================================
| LM / config                                 | KeHu  | KeHu check_zxm_recheck | KeHu final |
---------------------------------------------------------------------------------------------
| baseline                                    | 20.14 | 19.40                  | 18.26      |
---------------------------------------------------------------------------------------------
| bank+baoxian.chart+word.w0.9                |   -   | 19.27                  | 17.88      |
| bank+baoxian+guojiadianwang.chart+word.w0.9 |   -   | 19.02                  | 17.94      |
| bank+baoxian+guojiadianwang_w0.9            |   -   | 19.02                  | -          |
| bank+baoxian_w0.9                           |   -   | 19.20                  | -          |
| baoxian+bank_w0.9                           |   -   | 19.02                  | -          |
| baoxian+user200h.chart+word.w0.9            |   -   | 19.20                  | -          |
| baoxian+user200h_w0.8                       |   -   | 19.02                  | -          |
| baoxian+user200h_w0.9                       |   -   | 19.08                  | 18.01      |
| baoxian+user200h.w0.9w0.9                   |   -   | 19.08                  | -          |
| baoxian+user200h.chart1e-7.w0.9w0.1         |   -   | -                      | 23.82      |
| baoxian+user200h.chart1e-7.w0.9w0.9         |   -   | -                      | 18.26      |
=============================================================================================
- LiaoNingYiDong:
- Done.
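In the LM table above, the w0.8 / w0.9 suffixes presumably denote linear interpolation weights between a domain LM (baoxian, bank, user200h) and another LM; the page does not spell this out. A minimal sketch of linear LM interpolation under that assumption, with toy probability tables:

# Minimal sketch of linear LM interpolation (assumed meaning of the w0.9 suffix).
# The two toy unigram tables below are illustrative only.

def interpolate(p_domain, p_base, weight=0.9):
    """Return p_mix(w) = weight * p_domain(w) + (1 - weight) * p_base(w)."""
    vocab = set(p_domain) | set(p_base)
    return {w: weight * p_domain.get(w, 0.0) + (1.0 - weight) * p_base.get(w, 0.0)
            for w in vocab}

if __name__ == "__main__":
    baoxian_lm = {"保险": 0.4, "理赔": 0.3, "银行": 0.3}    # toy domain (insurance) LM
    baseline_lm = {"保险": 0.1, "银行": 0.5, "你好": 0.4}   # toy baseline LM
    mixed = interpolate(baoxian_lm, baseline_lm, weight=0.9)
    for w, p in sorted(mixed.items(), key=lambda x: -x[1]):
        print(f"{w}\t{p:.3f}")

The same idea extends to higher-order n-grams; only the weight value (0.8 or 0.9 in the table) changes between configurations.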
Embedding
- Model training finished; performance, speed, and size all look OK.
- Needs further testing on the embedded device.
Character LM
- 9-gram training has been done, except for Sogou-2T.
- Adding word-boundary tags to Character-LM training is done (see the tagging sketch after this list).
- 9-gram
- Except Weibo & Sogou-2T
- 1e-7 (13M): WER 17.91, compared with 13.4 for the no-boundary 1e-7 model (71M)
- 1e-8 (54M): WER 17.54
- Prepare domain-specific vocabularies
- Dianxin (telecom) / Baoxian (insurance) / Dianli (electric power)
- DT LM training
- ReFr
- Merge Character-LM & word-LM
- Union
- Compose: success.
- Two-step decoding: a first pass with the character-based LM, then a second pass with the word-based LM (see the rescoring sketch below).
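A sketch of what adding word-boundary tags to the character-LM training text could look like: each word in the word-segmented corpus is split into characters, and non-initial characters get a boundary marker so the character LM can tell word-internal from word-initial positions. The "+" tag is an assumed, illustrative scheme; the actual tag set is not given on this page.

# Sketch: convert word-segmented text into a character sequence with
# word-boundary tags for character-LM training.
# The "+" suffix on non-initial characters is an assumed, illustrative tag.

def chars_with_boundary_tags(words):
    """['今天', '天气'] -> ['今', '天+', '天', '气+']"""
    tagged = []
    for word in words:
        for i, ch in enumerate(word):
            tagged.append(ch if i == 0 else ch + "+")
    return tagged

if __name__ == "__main__":
    line = "今天 天气 不错"          # word-segmented input line
    print(" ".join(chars_with_boundary_tags(line.split())))
    # -> 今 天+ 天 气+ 不 错+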
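For the Character-LM / word-LM combination, the two-step decoding can be sketched as n-best rescoring: the first pass (character LM) produces hypotheses with scores, and a second pass re-ranks them with a word-LM score. The scoring function and weight below are placeholders, not the actual decoder interface.

# Sketch of 2-step decoding: first-pass n-best (character LM) re-ranked
# by a word-LM score in a second pass. All scores and weights are illustrative.

def rescore_nbest(nbest, word_lm_score, lm_weight=1.0):
    """nbest: list of (hypothesis, first_pass_score); higher score = better."""
    rescored = [(hyp, score + lm_weight * word_lm_score(hyp)) for hyp, score in nbest]
    return sorted(rescored, key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    # Toy first-pass n-best list from the character-LM pass.
    nbest = [("今天 天气 不错", -10.2), ("今天 天七 不错", -9.8)]
    # Placeholder word-LM score: prefer hypotheses made of in-vocabulary words.
    vocab = {"今天", "天气", "不错"}
    word_lm_score = lambda hyp: sum(0.0 if w in vocab else -5.0 for w in hyp.split())
    best, score = rescore_nbest(nbest, word_lm_score)[0]
    print(best, score)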
Problem
- Deletion errors
- Solved by adding a noise phone (spn) to alleviate silence over-training (see the lexicon sketch after this list).
- The current model needs to be re-trained.
- The TDNN silence scale is too sensitive across different test cases.
- After adding "spn", the silence scale may no longer need such careful tuning.
- Stream-mode decoding (CMVN) causes a performance reduction (see the CMVN sketch below).
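A sketch of the spn change in a Kaldi-style dictionary directory: register spn as a silence-class phone and map noise/unknown words to it, so noise segments are no longer forced onto the silence model. The paths and word symbols below follow common Kaldi conventions and are assumptions, not the team's actual scripts.

# Sketch: add a spoken-noise phone (spn) to a Kaldi-style dict directory,
# so noise no longer has to be absorbed by the silence model.
# Paths and word symbols follow common Kaldi conventions and are assumptions.

from pathlib import Path

def add_spn(dict_dir="data/local/dict"):
    d = Path(dict_dir)

    # 1. Register spn as a silence-class phone (alongside sil).
    sil_phones = d / "silence_phones.txt"
    phones = set(sil_phones.read_text().split())
    phones.add("spn")
    sil_phones.write_text("\n".join(sorted(phones)) + "\n")

    # 2. Map noise/unknown words to spn in the lexicon.
    lexicon = d / "lexicon.txt"
    entries = lexicon.read_text()
    if not entries.endswith("\n"):
        entries += "\n"
    for entry in ("<SPOKEN_NOISE> spn", "<UNK> spn"):
        if entry not in entries:
            entries += entry + "\n"
    lexicon.write_text(entries)

if __name__ == "__main__":
    add_spn("data/local/dict")   # then re-run prepare_lang.sh and re-train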
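To illustrate the stream-mode CMVN issue: offline decoding normalizes with statistics of the whole utterance, while streaming decoding can only use past frames, so early frames see different statistics than the model saw in training. A minimal numpy sketch of the two normalizations (the window length is an arbitrary example value):

import numpy as np

# Sketch: whole-utterance CMVN vs. causal streaming CMVN.
# The mismatch between the two is one source of the stream-mode degradation.

def utterance_cmvn(feats):
    """Normalize with statistics of the complete utterance (offline)."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)

def streaming_cmvn(feats, window=600):
    """Normalize each frame with statistics of the past `window` frames only."""
    out = np.empty_like(feats)
    for t in range(len(feats)):
        ctx = feats[max(0, t - window + 1): t + 1]
        out[t] = (feats[t] - ctx.mean(axis=0)) / (ctx.std(axis=0) + 1e-8)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 40)) + 5.0       # fake 40-dim features
    diff = np.abs(utterance_cmvn(feats) - streaming_cmvn(feats)).mean()
    print(f"mean |offline - streaming| difference: {diff:.3f}")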
SiaSun Robot
- Beam-forming algorithm test (a delay-and-sum reference sketch follows this list)
- NN-model-based beam-forming
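The page does not say which beam-forming algorithm was tested. As a generic reference point only, here is a minimal delay-and-sum beamformer for a uniform linear array; the geometry, sample rate, and steering angle are made-up example values, not SiaSun's configuration.

import numpy as np

# Sketch: delay-and-sum beamforming for a uniform linear microphone array.
# Geometry, sample rate, and steering angle below are illustrative only.

def delay_and_sum(signals, mic_spacing=0.05, angle_deg=30.0, fs=16000, c=343.0):
    """signals: (num_mics, num_samples). Steer toward angle_deg and sum."""
    num_mics, num_samples = signals.shape
    # Per-mic delay (seconds) for a plane wave arriving from angle_deg.
    delays = np.arange(num_mics) * mic_spacing * np.sin(np.deg2rad(angle_deg)) / c
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    # Align channels by undoing the per-mic delay in the frequency domain, then average.
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=num_samples)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.normal(size=(4, 16000))          # fake 4-channel, 1 s at 16 kHz
    enhanced = delay_and_sum(noisy)
    print(enhanced.shape)                        # (16000,)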
SID
Digit
- Engine Package