ASR

ASR Kernel development

5000 utterances done.
500-utterance TN recording. Quality control failed. Resulting synthesis is unsatisfactory.
41-utterance WD recording. Quality control fine. Adaptation done. Sound OK.
Buzzy sound was investigated. The main cause is the source (excitation) model. STRAIGHT sounds better.

There are 2000 errors in total. 600 of them were investigated.

NULL query, 1.4%
English upper/lower case mismatch. 1.6%
Traditional/Simplified Chinese mismatch. 2.2%
High frequency of less important words, like taxing. 1.3%
Database labeling error (matched query is better than the labeled correct query). 21.8%
Stand query, or query involving many unimportant words, leading to a lower TF/IDF score. Stop words still have an impact. 10.7%
TF/IDF incorrectly weighted the matched terms. 3.9%
Synonyms cannot be matched. 36.5%
Category words cannot be matched. 13.5%
Answer label incorrect. Semantic relationship missing. 6.8%
Word segmentation hides keywords. 4%
Vague query. No discriminative words remain after stop-word purging. 1.6%
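
Several of the categories above (case mismatch, stop words, synonyms, vague queries) can be reproduced with a toy TF/IDF matcher. The following is a minimal sketch, not the matcher used in this project: the stop-word list, the toy query database, and all function names are hypothetical, and the IDF smoothing is one common choice among several.

```python
import math
from collections import Counter

# Hypothetical stop-word list and toy query database, for illustration only.
STOP_WORDS = {"the", "a", "is", "to", "how", "do", "i", "my"}
DATABASE = [
    "how do I reset my password",
    "book a taxi to the airport",
    "check account balance",
]

def tokenize(text):
    """Lowercase (avoids upper/lower case mismatch) and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def build_idf(docs):
    """Smoothed inverse document frequency over tokenized database entries."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    """Sparse TF/IDF vector; terms unseen in the database are ignored."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def best_match(query, database):
    """Return (best database entry, score) for the query."""
    docs = [tokenize(entry) for entry in database]
    idf = build_idf(docs)
    qvec = tfidf(tokenize(query), idf)
    scores = [cosine(qvec, tfidf(doc, idf)) for doc in docs]
    best = max(range(len(database)), key=scores.__getitem__)
    return database[best], scores[best]

if __name__ == "__main__":
    for q in ["RESET PASSWORD", "reserve a cab", "how do I"]:
        print(q, "->", best_match(q, DATABASE))
```

In this sketch, "RESET PASSWORD" matches the first entry because of case folding, "reserve a cab" scores zero because no synonym of "book" or "taxi" is known to the matcher, and "how do I" scores zero because nothing discriminative remains after stop-word purging.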