Differences between revisions of "Hulan-2013-11-30"

From cslt Wiki
Latest revision as of 04:00, 29 November 2013 (Fri)

ASR

ASR Kernel development

[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/2013-11-29 ASR group weekly report]

TTS

  • This week
      • 5000-utterance training done.
      • 500-utterance TN recording done. Quality control is not very good, and the resulting synthesis is not satisfactory.
      • 41-utterance WD recording done. Quality control is fine. Adaptation done. Sounds OK.
      • Buzzy sound was investigated. The main cause is the source model (excitation). STRAIGHT sounds better.
  • Next week
      • Develop the CGI service.
      • Prepare for the 2000-utterance female recording.

Dialog system

Statistical approach

  • HowNET information extracted. Coverage is limited. 50k words.
  • Custom task error analysis

There are 2000 errors in total, of which 600 have been investigated.

  • NULL query. 1.4%
  • English upper/lower case mismatch. 1.6%
  • Traditional/Simplified Chinese mismatch. 2.2%
  • High frequency of less important words, like taxing. 1.3%
  • Database labeling error (the matched query is better than the query labeled as correct). 21.8%
  • The standard query or the user query involves many unimportant words, leading to a lower TF/IDF score. Stop words still have an impact. 10.7%
  • TF/IDF incorrectly weighted the matched terms. 3.9%
  • Synonyms cannot be matched. 36.5%
  • Category words cannot be matched. 13.5%
  • Answer label incorrect. Semantic relationship missing. 6.8%
  • Word segmentation hides keywords. 4%
  • Vague query. No discriminative words remain after stop-word removal. 1.6%
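
Several of the error categories above (case mismatch, stop-word impact, synonym mismatch) can be illustrated with a small TF/IDF matcher. The following is only a minimal sketch under stated assumptions, not the system described in this report: the function names, the stop-word list, the smoothed IDF formula, and the English example queries are all hypothetical, and a Chinese system would additionally need word segmentation before tokenization.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the report does not say which list is used.
STOP_WORDS = {"the", "a", "to", "is", "please", "how", "i", "do"}


def tokenize(text):
    """Lower-case, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_idf(documents):
    """Smoothed IDF (an assumed formula): log((1 + N) / (1 + df)) + 1."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(text, idf):
    """Sparse TF/IDF vector; terms unseen in the collection are ignored."""
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(query, standard_queries):
    """Return (score, standard_query) for the highest-scoring standard query."""
    idf = build_idf(standard_queries)
    q_vec = tfidf_vector(query, idf)
    return max((cosine(q_vec, tfidf_vector(s, idf)), s) for s in standard_queries)


if __name__ == "__main__":
    standard = ["how to book a taxi", "check account balance", "reset my password"]
    # Case folding and stop-word removal let this query match.
    print(best_match("Please RESET the Password", standard))
    # A synonym ("cab" for "taxi") shares no term with any standard query,
    # so every score is zero and the returned match is arbitrary.
    print(best_match("cab reservation", standard))
```

In this sketch a query that shares no surface term with any standard query gets a zero score everywhere, which is the kind of failure the synonym and category-word items describe; handling it would need an external resource such as a synonym dictionary.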

Template matching

  • Work on disambiguation
  • Completed the prototype design for the QA logic
  • Develop an assistant tool for source code generation