Difference between revisions of "2013-08-10"

Latest revision as of 05:17, 20 August 2013 (Tue)

Running 1200-3620 NN, graph generation is done. DT training should be done in 3 days.

Distribution graph is obtained. The performance seems bad.
A possible reason is that the decoding is LM-based, while the confidence is purely acoustic. So (1) errors in the linguistic layer are not necessarily errors in the acoustic layer, and (2) the search will automatically choose almost-correct phones/states.
The conclusion is that the DNN confidence is most suitable for grammar-based applications, or at least for applications where the LM information is not very strong.
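To make "the confidence is only acoustic related" concrete, here is a minimal sketch of one common way to derive a confidence from DNN posteriors. It is an assumption for illustration, not the scoring actually used in these experiments: the function name, the geometric-mean formula, and the input layout (one posterior vector per frame plus the aligned state index per frame) are all hypothetical. No LM score enters the computation, which is why a word that is wrong only at the linguistic level can still score high.

```python
import math


def acoustic_confidence(frame_posteriors, aligned_states, floor=1e-10):
    """Geometric mean of the DNN posterior of the aligned state over all frames.

    frame_posteriors: list of per-frame posterior vectors (hypothetical layout).
    aligned_states:   list of state indices chosen by the decoder, one per frame.
    """
    if not aligned_states or len(frame_posteriors) != len(aligned_states):
        raise ValueError("need one aligned state per frame, and at least one frame")
    log_sum = 0.0
    for posteriors, state in zip(frame_posteriors, aligned_states):
        # Floor the posterior so a single zero does not produce log(0).
        log_sum += math.log(max(posteriors[state], floor))
    return math.exp(log_sum / len(aligned_states))


# Two frames, both aligned to state 0.
score = acoustic_confidence([[0.9, 0.1], [0.8, 0.2]], [0, 0])
```

The score depends only on the acoustic posteriors along the decoder's alignment, so a strong LM can steer the search to states that still look acoustically plausible.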

GFCC computation is very slow: 100 hours of speech costs 16 hours of CPU time, so the RT is around 0.2. This is intolerable.
GFCC-based DNN training on 100 hours of speech data is done. Need to test the noise-robust performance within 2 days.
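As a quick check of the RT figure quoted above, the real-time factor is simply processing time divided by audio duration. The helper name below is hypothetical; the two numbers are the ones from the notes.

```python
def real_time_factor(processing_hours, audio_hours):
    """Real-time factor: CPU time spent per unit of audio processed."""
    if audio_hours <= 0:
        raise ValueError("audio duration must be positive")
    return processing_hours / audio_hours


# 16 hours of CPU time for 100 hours of speech.
rt = real_time_factor(16.0, 100.0)
print(f"RT = {rt:.2f}")  # prints "RT = 0.16"
```

So the measured value is 0.16, which the notes round to "around 0.2".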

The code is done. Simple testing is completed.
Problem 1: CMN initialization is not perfect. Need to train a better initial CMN model.
Problem 2: balance for posterior-based silence detection.
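Problem 1 can be illustrated with a minimal sketch of streaming CMN seeded by an initial mean. This is an assumption about the general technique, not the group's implementation: the class name, the `prior_frames` weight, and the update rule (treating the initial model as a fixed number of pseudo-frames) are hypothetical. It shows why the initial model matters: for the first frames of an utterance the subtracted mean is dominated by the initial mean.

```python
class OnlineCMN:
    """Streaming cepstral mean normalization with a prior (initial) mean."""

    def __init__(self, initial_mean, prior_frames=100.0):
        if prior_frames <= 0:
            raise ValueError("prior_frames must be positive")
        # The initial model is treated as `prior_frames` pseudo-frames.
        self.prior_frames = prior_frames
        self.prior_sum = [m * prior_frames for m in initial_mean]
        self.frame_sum = [0.0] * len(initial_mean)
        self.num_frames = 0

    def current_mean(self):
        total = self.prior_frames + self.num_frames
        return [(p + s) / total for p, s in zip(self.prior_sum, self.frame_sum)]

    def process(self, frame):
        """Add one feature frame to the statistics and return it mean-normalized."""
        if len(frame) != len(self.frame_sum):
            raise ValueError("frame dimension does not match the initial mean")
        self.frame_sum = [s + x for s, x in zip(self.frame_sum, frame)]
        self.num_frames += 1
        mean = self.current_mean()
        return [x - m for x, m in zip(frame, mean)]


cmn = OnlineCMN(initial_mean=[1.0, 2.0], prior_frames=1.0)
normalized = cmn.process([3.0, 2.0])  # mean becomes [2.0, 2.0]
```

A larger `prior_frames` trusts the initial model for longer; a smaller one adapts to the current utterance faster but is noisier at the start.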

G.fst integration is done. Initial test passed. Looks like the zero-probability is better for the NUM class.
HCLG integration is done. A bug fixed, passed initial test.
Online integration takes 1 minute. Need to optimize.
Need thorough testing with the Tencent test suite.
Need to tune the subgraph feeding probability.

@@ Line 22: / Line 22: @@
 * Distribution graph is obtained. The performance seems bad.
 * A possible reason is that the decoding is LM-based, and the confidence is only acoustic related. So (1) the errors in linguistic layer are not really errors in the acoustic layer (2) the search will automatically choose the almost-correct phones/states.
-* The conclusion is that the DNN confidence is most suitable for grammar-based applications, or at least LM information is not very important.
+* The conclusion is that the DNN confidence is most suitable for grammar-based applications, or at least LM information is not very strong.
 * To be done:
 # CI phone confidence, on going
 # No-tone confidence, on going
 ==GFCC DNN ==
@@ Line 36: / Line 35: @@
 ==Stream decoding==
 * the code is done. Simple testing is completed.
-* Problem 1: CMN initialization is not prefect. Need to train a better initial CMN model.
+* Problem 1: CMN initialization is not perfect. Need to train a better initial CMN model.
 * Problem 2: balance for posterior-based silence detection.
@@ Line 49: / Line 48: @@
 == Embedded progress ==
-* GFCC-based engine test
+* GFCC-based engine test. Just started.
 * Attain a performance curve: RT,memory size,package size Vs vocabulary size.
-* A new demo released for 4600 song names.
+* A new demo released for [[4600 song names]]. [http://cslt.riit.tsinghua.edu.cn/csltdemo/public/release/easr/easr.v1.0.song.apk download here]