<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://cslt.org/mediawiki/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-cn">
		<id>http://cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=2014-04-11</id>
		<title>2014-04-11 - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=2014-04-11"/>
		<link rel="alternate" type="text/html" href="http://cslt.org/mediawiki/index.php?title=2014-04-11&amp;action=history"/>
		<updated>2026-04-14T11:24:10Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.23.3</generator>

	<entry>
		<id>http://cslt.org/mediawiki/index.php?title=2014-04-11&amp;diff=9658&amp;oldid=prev</id>
		<title>Cslt: Created page with "==Resource Building== * Current text resource has been re-arranged and listed  == Leftover questions== * Asymmetric window: Great improvement on training set (WER 34% to..."</title>
		<link rel="alternate" type="text/html" href="http://cslt.org/mediawiki/index.php?title=2014-04-11&amp;diff=9658&amp;oldid=prev"/>
				<updated>2014-04-11T02:13:59Z</updated>
		
		<summary type="html">&lt;p&gt;以内容“==Resoruce Building== * Current text resource has been re-arranged and listed  == Leftover questions== * Asymmetric window: Great improvement on training set(WER 34% to...”创建新页面&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;==Resource Building==&lt;br /&gt;
* Current text resource has been re-arranged and listed&lt;br /&gt;
&lt;br /&gt;
== Leftover questions==&lt;br /&gt;
* Asymmetric window: great improvement on the training set (WER 34% to 24%); however, the improvement is lost on the test set. Overfitting? &lt;br /&gt;
* Multi-GPU training: error encountered&lt;br /&gt;
* Multilingual training&lt;br /&gt;
* Investigating LOUDS FST. &lt;br /&gt;
* CLG embedded decoder plus online compiler.&lt;br /&gt;
* DNN-GMM co-training&lt;br /&gt;
&lt;br /&gt;
== AM development ==&lt;br /&gt;
&lt;br /&gt;
=== Sparse DNN ===&lt;br /&gt;
* GA-based block sparsity&lt;br /&gt;
:* Found a paper from 2000 with similar ideas. &lt;br /&gt;
:* Try to get a student working on high-performance computing to do the optimization&lt;br /&gt;
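The GA-based block-sparsity search can be sketched roughly as follows (a toy illustration, not the group's actual code: the fitness weighting, GA parameters, and function names are all assumed for the example):&lt;br /&gt;

```python
import random

def block_fitness(mask, block_norms, target_sparsity):
    # Reward keeping high-magnitude weight blocks while penalizing
    # deviation from the desired fraction of zeroed blocks
    # (the 10.0 penalty weight is an assumption for this sketch).
    kept = sum(n for m, n in zip(mask, block_norms) if m)
    sparsity = 1.0 - sum(mask) / len(mask)
    return kept - 10.0 * abs(sparsity - target_sparsity)

def ga_block_mask(block_norms, target_sparsity=0.5,
                  pop_size=30, generations=50, seed=0):
    # Evolve a boolean keep/zero mask over weight blocks.
    rng = random.Random(seed)
    n = len(block_norms)
    fitness = lambda m: block_fitness(m, block_norms, target_sparsity)
    pop = [[rng.random() > target_sparsity for _ in range(n)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]   # elitist selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)     # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n)
            child[i] = not child[i]       # point mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)
```

On a toy problem the search keeps the high-norm blocks and zeroes the rest; the real question is whether the same selection pressure scales to DNN-sized weight matrices.&lt;br /&gt;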
&lt;br /&gt;
&lt;br /&gt;
===Noise training===&lt;br /&gt;
:* More experiments with no added noise&lt;br /&gt;
:* More experiments with additional noise types&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===AMR compression re-training===&lt;br /&gt;
&lt;br /&gt;
* 1700h MPE adaptation done&lt;br /&gt;
* 1700h stream-mode adaptation has reached MPE iteration 1&lt;br /&gt;
&lt;br /&gt;
===GFbank===&lt;br /&gt;
* Significant improvement found with GFBank &lt;br /&gt;
* Significant improvement found with FBank + GFBank&lt;br /&gt;
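For reference, the core of a gammatone filterbank (the "GF" in GFbank) is the gammatone impulse response; a minimal sketch using the standard Glasberg-Moore ERB bandwidth formula, with assumed sample rate, filter order, and duration:&lt;br /&gt;

```python
import math

def gammatone_ir(fc, fs=16000, n=4, duration=0.025):
    # Impulse response of an n-th order gammatone filter centered
    # at fc Hz: t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t).
    # Bandwidth b follows the Glasberg-Moore ERB formula.
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    b = 1.019 * erb
    samples = int(duration * fs)
    ir = []
    for k in range(samples):
        t = k / fs
        ir.append(t ** (n - 1)
                  * math.exp(-2.0 * math.pi * b * t)
                  * math.cos(2.0 * math.pi * fc * t))
    return ir
```

A bank of such filters at ERB-spaced center frequencies, followed by energy summation per channel, yields GFbank-style features analogous to mel FBank.&lt;br /&gt;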
&lt;br /&gt;
===Denoising &amp;amp; Farfield ASR===&lt;br /&gt;
*  Recording done&lt;br /&gt;
*  Preparing to construct the baseline&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===VAD===&lt;br /&gt;
&lt;br /&gt;
* Code ready; still need to work out speech/non-speech smoothing&lt;br /&gt;
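One common way to smooth frame-level speech/non-speech decisions is a minimum-duration plus hangover scheme; a sketch (function name and parameter values are assumptions, not taken from the group's code):&lt;br /&gt;

```python
def smooth_vad(frames, min_speech=3, hangover=5):
    # frames: per-frame 0/1 decisions from the raw VAD.
    # Enter the speech state only after min_speech consecutive speech
    # frames (drops short false alarms); once in speech, tolerate up
    # to hangover silence frames before exiting (bridges short pauses).
    out = []
    run = 0            # consecutive raw speech frames seen so far
    hang = 0           # remaining hangover budget
    in_speech = False
    for f in frames:
        if f:
            run += 1
            if not in_speech and run >= min_speech:
                in_speech = True
                for i in range(1, run):   # relabel the onset frames
                    out[-i] = 1
            if in_speech:
                hang = hangover
        else:
            run = 0
            if in_speech:
                if hang > 0:
                    hang -= 1
                else:
                    in_speech = False
        out.append(1 if in_speech else 0)
    return out
```

For example, with min_speech=3 and hangover=2, an isolated one-frame speech blip is dropped, while a three-frame speech run is kept and extended two frames into the following silence.&lt;br /&gt;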
&lt;br /&gt;
===Farfield recognition===&lt;br /&gt;
&lt;br /&gt;
===Scoring===&lt;br /&gt;
&lt;br /&gt;
* g-score based on MLP is done&lt;br /&gt;
* t-score based on linear regression improves the performance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Word to Vector==&lt;br /&gt;
&lt;br /&gt;
* LDA baseline (Sogou 1700*9 training set) done&lt;br /&gt;
* Word-vector classification is much better than the LDA system&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
word vector: &lt;br /&gt;
           general: dict - 150k entries;   train_data - ren_min_ri_bao (People's Daily, 5 GB);   window - 5&lt;br /&gt;
           1. size - 50   time = 30 min, 12 threads&lt;br /&gt;
           2. size - 10   time = 10 min, 12 threads&lt;br /&gt;
&lt;br /&gt;
data: class_num=9  document_num=9*2000&lt;br /&gt;
      train_num =9*1600&lt;br /&gt;
      test_num  =9*200&lt;br /&gt;
       dev_num  =9*200&lt;br /&gt;
&lt;br /&gt;
train_set:&lt;br /&gt;
                      C000008 C000010 C000013 C000014 C000016 C000020 C000022  C000023 C000024   total&lt;br /&gt;
                      Finance IT      Health  Sports  Travel  Educ.   Recruit. Culture Military&lt;br /&gt;
      lda_inf         0.845   0.2756  0.698   0.9502  0.63499 0.32    0.8080   0.3505  0.864     0.6385&lt;br /&gt;
     lda_inf_10       0.8149  0.0887  0.628   0.9641  0.5739  0.105   0.707363 0.2334  0.8628    0.553167&lt;br /&gt;
 w2v_filter_filter    0.7463  0.713   0.657   0.9106  0.68659 0.54    0.74638  0.692   0.84518   0.72638&lt;br /&gt;
 w2v_filter_filter_10 0.7608  0.4323  0.57394 0.865   0.549   0.335   0.577    0.6129  0.78099   0.609769&lt;br /&gt;
&lt;br /&gt;
test_set:&lt;br /&gt;
                      C000008 C000010 C000013 C000014 C000016 C000020 C000022  C000023 C000024   total&lt;br /&gt;
                      Finance IT      Health  Sports  Travel  Educ.   Recruit. Culture Military&lt;br /&gt;
 w2v_filter_filter    0.6865  0.7263  0.6716  0.84577 0.7462  0.46268 0.6567   0.7114  0.8905    0.71088&lt;br /&gt;
 w2v_filter_filter_10 0.791   0.4079  0.56218 0.74129 0.62189 0.22885 0.562    0.6766  0.84079   0.603648&lt;br /&gt;
      lda_inf         0.8706  0.26368 0.6965  0.8009  0.582   0.2537  0.72139  0.3184  0.82587   0.59259&lt;br /&gt;
     lda_inf_10       0.776   0.1044  0.6467  0.9054  0.62189 0.1144  0.56218  0.24378 0.796     0.530127&lt;br /&gt;
&lt;br /&gt;
note: w2v_filter -- stop words removed when training the word vectors&lt;br /&gt;
note: w2v_filter_filter -- stop words removed when training the word vectors and also removed from the documents&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
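The word-vector classifier above can be approximated by averaging the (stop-word-filtered) word vectors of a document and assigning the nearest class centroid; a toy sketch with made-up 2-d vectors (the real system uses 50-d/10-d word2vec vectors and its actual classifier is not specified here):&lt;br /&gt;

```python
def doc_vector(tokens, word_vecs, stop_words=frozenset()):
    # Average the vectors of in-vocabulary, non-stop-word tokens
    # (the w2v_filter_filter setting: stop words removed both when
    # training the vectors and when embedding the document).
    vecs = [word_vecs[t] for t in tokens
            if t in word_vecs and t not in stop_words]
    dim = len(next(iter(word_vecs.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def nearest_class(vec, centroids):
    # Nearest-centroid decision by squared Euclidean distance;
    # a simple stand-in for the classifier used in the experiments.
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda c: sqdist(vec, centroids[c]))
```

With class centroids computed as the mean document vector per training class, this already gives a working baseline to compare against LDA inference.&lt;br /&gt;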
&lt;br /&gt;
==LM development==&lt;br /&gt;
&lt;br /&gt;
===NN LM===&lt;br /&gt;
&lt;br /&gt;
* Character-based NNLM (6,700 characters, 7-gram): training on 500M of data done.&lt;br /&gt;
:* The non-boundary character LM is better than the boundary character LM&lt;br /&gt;
&lt;br /&gt;
* Investigate MS RNN LM training&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==QA==&lt;br /&gt;
&lt;br /&gt;
===FST-based matching===&lt;br /&gt;
:* Word-based FST takes 1-2 seconds with 1,600 patterns, while Huilan's implementation takes under 1 second. Why the gap?&lt;br /&gt;
:* Char-based FST implementation is done. &lt;br /&gt;
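A character-level pattern matcher of this kind can be sketched as a trie walk over the pattern set (a simplified stand-in for the char-FST; the real implementation compiles the patterns into an FST rather than a plain trie):&lt;br /&gt;

```python
def build_trie(patterns):
    # Compile the pattern set into a character trie; the "$" key
    # marks an accepting state and stores the pattern id.
    root = {}
    for pid, pat in enumerate(patterns):
        node = root
        for ch in pat:
            node = node.setdefault(ch, {})
        node["$"] = pid
    return root

def match(trie, text):
    # Return (start, pattern_id) for every pattern occurrence:
    # one trie walk per start position, so the cost is
    # O(len(text) * max_pattern_len) character transitions.
    hits = []
    for start in range(len(text)):
        node = trie
        for pos in range(start, len(text)):
            node = node.get(text[pos])
            if node is None:
                break
            if "$" in node:
                hits.append((start, node["$"]))
    return hits
```

Sharing a single automaton over all 1,600 patterns is what makes per-query matching cheap; the per-start-position walk above is the naive version of that idea.&lt;br /&gt;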
&lt;br /&gt;
&lt;br /&gt;
===Speech QA===&lt;br /&gt;
* Investigate determinization of G embedding&lt;/div&gt;</summary>
		<author><name>Cslt</name></author>	</entry>

	</feed>