ASR

ASR Kernel development

Training on 5,000 utterances done.
Recording of 500 TN utterances done. Quality control was poor, and the resulting synthesis is not satisfactory.
Recording of 41 WD utterances done. Quality control fine. Adaptation done; the result sounds OK.
The buzzy sound was investigated. Its main cause is the source (excitation) model. STRAIGHT sounds better.

There are 2,000 errors in total; 600 of them were investigated.

NULL query: 1.4%
English uppercase/lowercase mismatch: 1.6%
Traditional/Simplified Chinese mismatch: 2.2%
High frequency of less important words, such as "taxing": 1.3%
Database labeling error (the matched query is better than the query labeled as correct): 21.8%
Standard query, or a query containing many unimportant words, lowering the TF/IDF score; stop words still have an impact: 10.7%
TF/IDF weighted the matched terms incorrectly: 3.9%
Synonyms cannot be matched: 36.5%
Category words cannot be matched: 13.5%
Incorrect answer label; semantic relationship missing: 6.8%
Word segmentation hides keywords: 4.0%
Vague query (no discriminative words remain after stop-word removal): 1.6%
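Several of the categories above (case mismatch, stop words, synonyms, TF/IDF weighting, vague queries) come from the normalization and scoring steps of TF/IDF query matching. The following is a minimal sketch of such a matcher, not the system's actual implementation. It assumes whitespace-segmented input. Chinese text would additionally need a word segmenter and Traditional/Simplified conversion, which are not shown. The names STOP_WORDS, SYNONYMS, normalize and best_match are hypothetical, and the smoothed IDF formula is just one common variant.

```python
import math
from collections import Counter

# Hypothetical stop-word list and synonym table, for illustration only.
STOP_WORDS = {"how", "do", "i", "the", "a", "my", "to", "is", "what"}
SYNONYMS = {"reset": "change", "passcode": "password", "pwd": "password"}


def normalize(text):
    """Lowercase, drop stop words, and map synonyms to a canonical term."""
    tokens = text.lower().split()  # assumes pre-segmented, whitespace-separated text
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [SYNONYMS.get(t, t) for t in tokens]


def build_idf(docs):
    """Smoothed inverse document frequency over the normalized database queries."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens, idf):
    """Raw term frequency times IDF; terms unseen in the database are ignored."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def best_match(query, database):
    """Return (index, score) of the database query most similar to `query`."""
    docs = [normalize(q) for q in database]
    idf = build_idf(docs)
    vectors = [tfidf(d, idf) for d in docs]
    qv = tfidf(normalize(query), idf)
    scores = [cosine(qv, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)  # first index on ties
    return best, scores[best]


if __name__ == "__main__":
    database = [
        "How do I change my password",
        "What is the tax rate",
        "How do I close my account",
    ]
    print(best_match("Reset PWD", database))
```

Here "Reset PWD" matches the first entry only because of case folding and the synonym table. With SYNONYMS emptied it scores 0 against every entry, which corresponds to the synonym category. A query made entirely of stop words, such as "how do i", normalizes to an empty vector and scores 0 everywhere, which corresponds to the vague-query category.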