Difference between revisions of "Hulan-2013-11-30"
From cslt Wiki
Latest revision as of 04:00, 29 November 2013 (Fri)
ASR
ASR Kernel development
ASR group weekly report: http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/2013-11-29
TTS
- This week
  - Training on 5000 utterances done.
  - Recording of 500 TN utterances done. Quality control was not very good, and the resulting synthesis is not satisfactory.
  - Recording of 41 WD utterances. Quality control was fine. Adaptation done; the result sounds OK.
  - Buzzy sound was investigated. The main source is the source model (excitation). STRAIGHT sounds better.
- Next week
  - Develop the CGI service.
  - Prepare for recording 2000 utterances from a female speaker.
Dialog system
Statistical approach
- HowNet information extracted. Coverage is limited: 50k words.
- Custom task error analysis
There are 2000 errors in total, of which 600 have been investigated.
- NULL query: 1.4%
- English upper/lower case mismatch: 1.6%
- Traditional/Simplified Chinese mismatch: 2.2%
- High frequency of less important words, such as taxing: 1.3%
- Database labeling error (the matched query is better than the query labeled as correct): 21.8%
- Standard query or user query involves many unimportant words, lowering the TF/IDF score; stop words still have an impact: 10.7%
- TF/IDF incorrectly weighted the matched terms: 3.9%
- Synonyms cannot be matched: 36.5%
- Category words cannot be matched: 13.5%
- Answer label incorrect; semantic relationship missing: 6.8%
- Word segmentation hides keywords: 4%
- Vague query, with no discriminative words left after stop-word purging: 1.6%
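Several of the categories above (case mismatch, stop-word impact, vague queries) can be reproduced with a toy TF/IDF matcher. The following is only a minimal sketch under stated assumptions, not the group's actual system: the stop-word list, the English example queries, the whitespace tokenizer, the smoothed IDF formula and every function name are hypothetical and chosen for illustration. The real task involves Chinese text and word segmentation, which this sketch does not cover.

```python
import math
from collections import Counter

# Hypothetical stop-word list (assumption, not taken from the report).
STOP_WORDS = {"the", "a", "is", "how", "to", "do", "i", "my"}

def tokenize(text, lowercase=True, stop_words=STOP_WORDS):
    """Whitespace tokenizer with optional case folding and stop-word purging."""
    if lowercase:
        text = text.lower()
    return [t for t in text.split() if t not in stop_words]

def idf_table(docs):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1 (one common variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    """Sparse TF/IDF vector; terms unseen in the database are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def best_match(query, database, **tok_kwargs):
    """Return (best database query, score), or (None, 0.0) if nothing matches."""
    docs = [tokenize(q, **tok_kwargs) for q in database]
    idf = idf_table(docs)
    qvec = tfidf(tokenize(query, **tok_kwargs), idf)
    scores = [cosine(qvec, tfidf(d, idf)) for d in docs]
    best = max(range(len(database)), key=scores.__getitem__)
    if scores[best] == 0.0:
        return None, 0.0
    return database[best], scores[best]

# Toy database of standard queries (invented for this example).
DATABASE = [
    "how to reset my WiFi password",
    "how to pay the phone bill",
    "buy a new phone",
]

if __name__ == "__main__":
    # Case folding on: "wifi" matches "WiFi".
    print(best_match("reset wifi password", DATABASE))
    # Case folding off: the "wifi"/"WiFi" mismatch lowers the score.
    print(best_match("reset wifi password", DATABASE, lowercase=False))
    # Only stop words: nothing discriminative is left after purging.
    print(best_match("how to do the", DATABASE))
```

In this sketch a synonym or category word that is absent from the database simply contributes nothing to the score, which is consistent with those two categories being hard to fix by reweighting alone.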
Template matching
- Work on disambiguation
- Completed the prototype design for the QA logic
- Develop an assistant tool for source code generation