Cslt：/* Speech QA */

2014-03-14T02:38:30Z



==Resource Building==
* The current text resources have been re-arranged and listed

== AM development ==

=== Sparse DNN ===

* Optimal Brain Damage (OBD).

# GA-based block sparsity
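
A minimal sketch of OBD-style pruning, for reference. It assumes a diagonal Hessian approximation is already available and uses the standard OBD saliency (half the Hessian diagonal times the squared weight). The function name <code>obd_prune</code>, the toy values, and the pruning fraction are illustrative assumptions, not the setup used in these experiments.

```python
import numpy as np

def obd_prune(weights, hessian_diag, prune_fraction):
    """Zero out the weights with the lowest OBD saliency.

    saliency_k = 0.5 * h_kk * w_k^2, where h_kk is the diagonal of the
    Hessian of the loss with respect to the weights (assumed given).
    """
    saliency = 0.5 * hessian_diag * weights ** 2
    n_prune = int(prune_fraction * weights.size)
    # Indices (in the flattened array) of the n_prune smallest saliencies.
    prune_idx = np.argsort(saliency, axis=None)[:n_prune]
    mask = np.ones(weights.size, dtype=bool)
    mask[prune_idx] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask

# Toy example (hypothetical values).
w = np.array([[0.5, -0.01], [2.0, 0.1]])
h = np.array([[1.0, 4.0], [0.1, 1.0]])
pruned, mask = obd_prune(w, h, prune_fraction=0.5)
```

Note that a large weight with a small Hessian entry can still be kept, as in the toy example, because saliency depends on both terms.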

=== Efficient DNN training ===

# Asymmetric window: great improvement on the training set (WER 34% to 24%); however, the improvement is lost on the test set. Overfitting?
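
The notes do not define the asymmetric window. The sketch below assumes it refers to frame splicing with unequal left and right context (more past frames than future frames); that reading, the function name <code>splice_frames</code>, and the 3/1 context sizes are assumptions made only for illustration.

```python
import numpy as np

def splice_frames(feats, left, right):
    """Stack each frame with `left` past and `right` future frames."""
    n_frames, _ = feats.shape
    # Repeat the first/last frame so that every frame has full context.
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    window = left + 1 + right
    return np.stack([padded[t:t + window].reshape(-1) for t in range(n_frames)])

# 6 frames of 2-dimensional toy features.
feats = np.arange(12, dtype=float).reshape(6, 2)
spliced = splice_frames(feats, left=3, right=1)  # asymmetric: 3 past, 1 future
```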

===Multi GPU training===
* Error encountered

===GMM - DNN co-training===
* Initial DNN test done
:* tri4b -> DNN (org)
:* DNN alignment -> tri4b
:* tri4b alignment -> DNN (re-train)

<pre>
model/testcase          | test_dev93(cv) | test_eval92
--------------------------------------------------------------
8400-80000(org)         | 7.41           | 4.13
--------------------------------------------------------------
re-train (Keep state #) | 7.20           | 4.24
--------------------------------------------------------------
re-train (Free state #) | 7.29           | 4.31
--------------------------------------------------------------
</pre>

=== Multilanguage training===

# Pure Chinese training reached 4.9%
# Chinese + English training worsened the result to 7.9%
# The English phone set should distinguish beginning phones from ending phones
# Should set up a multilingual network structure that shares the low layers but separates the languages at the high layers
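
The proposed structure (shared low layers, language-specific high layers) could look roughly like the forward-pass sketch below. The class name <code>SharedHiddenDNN</code>, the layer sizes, the tanh activation, and the target counts per language are all hypothetical; training is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Random weights and zero bias for one affine layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class SharedHiddenDNN:
    """Low layers shared by all languages, one output layer per language."""

    def __init__(self, feat_dim, hidden_dim, n_shared, targets_per_lang):
        dims = [feat_dim] + [hidden_dim] * n_shared
        self.shared = [init_layer(a, b) for a, b in zip(dims[:-1], dims[1:])]
        self.heads = {lang: init_layer(hidden_dim, n)
                      for lang, n in targets_per_lang.items()}

    def forward(self, x, lang):
        h = x
        for w, b in self.shared:      # shared low layers
            h = np.tanh(h @ w + b)
        w, b = self.heads[lang]       # language-specific top layer
        return softmax(h @ w + b)

net = SharedHiddenDNN(feat_dim=40, hidden_dim=64, n_shared=3,
                      targets_per_lang={"zh": 120, "en": 90})
batch = rng.normal(size=(5, 40))
post_zh = net.forward(batch, "zh")
post_en = net.forward(batch, "en")
```

In such a design each training frame would update the shared layers and only the head of its own language.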

===Noise training===

* Train on the WSJ database by corrupting the data with various noise types
:* Almost all training conditions are completed
:* Interesting results in multi-conditional training (white + cafe) and test on park/station
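
One common way to corrupt clean speech with noise is to scale the noise to a target SNR before adding it. The sketch below illustrates that idea only; the function name <code>add_noise</code>, the synthetic signals, and the 10 dB target are assumptions, and the actual corruption procedure used here is not described in these notes.

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    """Mix `noise` into `speech` at the requested SNR (in dB)."""
    # Loop the noise if it is shorter than the speech, then trim.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale so that speech_power / scaled_noise_power == 10^(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # stand-in for speech
noise = rng.normal(size=8000)                                # stand-in for white noise
noisy = add_noise(speech, noise, snr_db=10)
```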

===AMR compression re-training===
* WeChat uses the AMR compression method, which requires adapting our AM
* Test AMR & non-AMR model

<pre>
model               wav     amr

xent baseline       4.47
wav_mpe baseline    4.20    36.77

amr_mpe_lr_1e-5     6.27    8.95
amr_mpe_lr_1e-4     7.58    8.68

amr_xEnt_lr_1e-5    6.89    7.99
amr_xEnt_lr_1e-4    6.61    7.28
amr_xEnt_lr_0.08    5.72    6.20

</pre>

* Prepare to do adaptation on 1700h
* Prepare to do mixing xEnt test

===GFbank===

* Finished the first round of gfbank training & test
* The same GMM model (MFCC features) was used to get the alignment
* Train fbank & gfbank based on the MFCC alignment
* Clean training and noise test

<pre>
            clean   5dB     10dB    15dB    20dB    25dB
gfbank      4.22    73.03   39.20   16.41   8.36    5.60
gfbank_80   4.36    74.41   42.94   18.13   8.59    5.85
fbank_zmy   3.97    74.78   44.57   18.80   8.54    5.30
</pre>

* gfbank + fbank 80 dim training/test

===Engine optimization===

* Investigating LOUDS FST.

==Word to Vector==

* Improved word vectors with multiple senses
:* Almost impossible with the toolkit
:* Could pre-train the vectors and then do clustering

* Word-vector-based keyword extraction
:* Prepared 500+ articles in 7 categories
:* Fixed a problem in retrieving article words

* Word vectors based on classification

==LM development==

===NN LM===

* Character-based NNLM (6700 chars, 7-gram): training on 500M data is done.
:* boundary-involved char NNLM training done
:* Test on rescoring

* Investigate MS RNN LM training

===3T Sogou LM===

* 3T + Tencent LM combination:
:* Combine the 3T vocabulary (110k words) and the Tencent vocabulary (80k words)
:* Re-segmentation
:* Compute PPL with the 3T and Tencent LMs
:* Compute the best mixing weights
:* The mixing weight is wrong ....
:* If we mix the two with equal weights (0.5/0.5), performance is better than either individual LM
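
For reference, a linear-interpolation weight between two LMs can be estimated with EM on held-out text, given the probability each LM assigns to every held-out token. The sketch below is not the procedure used above (which is not described); the function names and the per-token probabilities are made up for illustration.

```python
import numpy as np

def perplexity(probs):
    """Perplexity from per-token probabilities."""
    return float(np.exp(-np.mean(np.log(probs))))

def em_mixing_weight(p_a, p_b, n_iter=50, lam=0.5):
    """EM estimate of lam in  p_mix = lam * p_a + (1 - lam) * p_b."""
    for _ in range(n_iter):
        # E-step: posterior that each token was generated by model A.
        post_a = lam * p_a / (lam * p_a + (1 - lam) * p_b)
        # M-step: the new weight is the average posterior.
        lam = float(np.mean(post_a))
    return lam

# Hypothetical per-token probabilities on the same held-out tokens.
p_3t = np.array([0.20, 0.01, 0.05, 0.30, 0.02])
p_tencent = np.array([0.02, 0.10, 0.04, 0.03, 0.15])

lam = em_mixing_weight(p_3t, p_tencent)
mixed = lam * p_3t + (1 - lam) * p_tencent
equal = 0.5 * p_3t + 0.5 * p_tencent
```

Because each EM step cannot decrease the held-out likelihood, the estimated weight should give a perplexity no worse than the equal-weight mix it starts from, which makes it a useful sanity check when a computed weight looks wrong.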

*3T + QA model combination

==QA Matching==

* FST-based matching
:* Investigating why the OpenFst union does not lead to a determinizable graph
:* Test the pattern label

* TF/IDF weight
:* Code is done; TF/IDF weights can be used now.
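
The notes do not say which TF/IDF variant was implemented. As an illustration only, the sketch below uses length-normalized term frequency times log(N/df); the function name <code>tfidf</code> and the toy documents are assumptions.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return weights

docs = [
    ["play", "song", "by", "singer"],
    ["who", "is", "the", "singer"],
    ["play", "the", "song"],
]
w = tfidf(docs)
```

Terms that occur in fewer documents (here "by") receive a higher weight than terms spread across documents (here "singer").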

==Embedded development==

* The CLG embedded decoder is almost done. The online compiler is in progress.
* English scoring is under way

==Speech QA==

* N-best with entity LM was analyzed
:* WER vs QA accuracy is done
:* The figure shows that WER and QA accuracy are positively related
:* Adding song names and singer names improves performance in most cases
:* There are indeed some exceptions in the figure: (a) a higher WER does not necessarily reduce QA accuracy; (b) adding entity names does not improve QA accuracy
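
For reference, WER is the word-level edit distance between reference and hypothesis divided by the reference length. The sketch below is a generic implementation, not the scoring tool used for the analysis above; it assumes whitespace-separated tokens and a non-empty reference.

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + deletions + insertions) / len(ref)."""
    ref, hyp = ref.split(), hyp.split()
    # prev[j] = edit distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

For character-based scoring, the two strings would need to be tokenized into characters before calling the function.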

* Class LM QA
:* Use the QA LM as the baseline
:* Tag singer names and song names
:* Build the tag LM
:* Use graph integration to resolve the tags
:* Adjust the in-tag weight
:* A smaller weight produces more recognized entities
:* Check whether the recognized songs/singers are correct or wrong

<pre>
1, non-merge
BaseLine:
qa-singer-song
songs 41
singers 23

2, HCLG-merge
Weight means the multiplier of the sub-graph entry.
(1) LM:1e-5
weight   0.00000001  0.0001  0.001  0.01  1   10
songs    20          20      21     19    9   4
singers  13          13      13     13    2   2
</pre>

* Results of the N-best with entity LM analysis: [[媒体文件:Music_QA_wer.pdf]]
