Difference between revisions of "2013-08-10"

Latest revision as of 05:17, 20 August 2013 (Tuesday)

Data sharing

LM count files still undelivered!

DNN progress

Discriminative DNN

The 1200-3620 NN is running and graph generation is done. DT training should be done in 3 days.

Sparse DNN

Iterative sparse sticky training is running. More sparsity is expected.

Tencent exps

Online support
Garbage model training
VAD optimization

DNN Confidence estimation

The distribution graph has been obtained. The performance seems poor.
A possible reason is that decoding is LM-based while the confidence is purely acoustic. As a result, (1) errors at the linguistic layer are not necessarily errors at the acoustic layer, and (2) the search automatically chooses almost-correct phones/states.
The conclusion is that DNN confidence is most suitable for grammar-based applications, or at least for applications where LM information is not very strong.

To be done:

CI phone confidence, ongoing
No-tone confidence, ongoing
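
To illustrate what a purely acoustic confidence looks like, here is a minimal sketch of one common scoring scheme: the geometric mean of the DNN posteriors of the states the decoder aligned to each frame. The function name `segment_confidence`, the input layout, and the `floor` constant are assumptions made for illustration; this is not necessarily the measure used in the experiments above.

```python
import math


def segment_confidence(posteriors, alignment, floor=1e-10):
    """Geometric mean of the posterior of the aligned state in each frame.

    posteriors: list of per-frame lists, posteriors[t][s] = P(state s | frame t)
    alignment:  list of state indices chosen by the decoder, one per frame
    floor:      hypothetical lower bound that avoids log(0)
    """
    if len(posteriors) != len(alignment) or not alignment:
        raise ValueError("need one aligned state per frame and at least one frame")
    log_sum = 0.0
    for frame, state in zip(posteriors, alignment):
        log_sum += math.log(max(frame[state], floor))
    # Averaging in the log domain gives a length-normalized score in (0, 1].
    return math.exp(log_sum / len(alignment))
```

A score of this kind only measures how well the acoustic model agrees with the decoded path. It has no access to LM scores, which is consistent with the observation that it fits grammar-based applications better.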

GFCC DNN

GFCC computation is very slow. 100 hours of speech cost 16 hours of CPU time, so RT is around 0.2. This is intolerable.
GFCC-based DNN training on 100 hours of speech data is done. Noise-robust performance needs to be tested within 2 days.
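
The real-time factor quoted above is simply processing time divided by audio duration. A minimal sketch of that arithmetic, where the function name `real_time_factor` is an assumption for illustration:

```python
def real_time_factor(processing_hours, audio_hours):
    """Return processing time divided by audio duration (both in hours)."""
    if audio_hours <= 0:
        raise ValueError("audio duration must be positive")
    return processing_hours / audio_hours


# Figures from the report: 16 CPU hours for 100 hours of speech.
rt = real_time_factor(16, 100)
print(f"RT = {rt:.2f}")  # prints "RT = 0.16"
```

With the reported figures this gives 0.16, which matches the "around 0.2" stated above after rounding.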

Stream decoding

The code is done and simple testing is completed.
Problem 1: CMN initialization is not perfect. Need to train a better initial CMN model.
Problem 2: balance for posterior-based silence detection.
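
Problem 1 can be pictured with a minimal sketch of online CMN seeded by a prior mean. In stream decoding the utterance mean is not known in advance, so the first frames rely on an initial estimate. The class name `OnlineCMN` and the `prior_weight` parameter are assumptions for illustration, not the actual implementation.

```python
class OnlineCMN:
    """Running cepstral mean normalization seeded with a prior mean."""

    def __init__(self, prior_mean, prior_weight=100.0):
        if prior_weight < 0:
            raise ValueError("prior_weight must be non-negative")
        self.prior_mean = list(prior_mean)       # initial CMN model (hypothetical)
        self.prior_weight = float(prior_weight)  # pseudo-count of frames behind the prior
        self.frame_sum = [0.0] * len(self.prior_mean)
        self.frame_count = 0

    def current_mean(self):
        """Blend the prior mean with the statistics seen so far."""
        total = self.prior_weight + self.frame_count
        return [(self.prior_weight * p + s) / total
                for p, s in zip(self.prior_mean, self.frame_sum)]

    def normalize(self, frame):
        """Update the running statistics with this frame, then subtract the mean."""
        if len(frame) != len(self.prior_mean):
            raise ValueError("frame dimension does not match the prior mean")
        for i, x in enumerate(frame):
            self.frame_sum[i] += x
        self.frame_count += 1
        mean = self.current_mean()
        return [x - m for x, m in zip(frame, mean)]
```

In this sketch a poor prior mean distorts the early frames the most, and its influence fades as `frame_count` grows relative to `prior_weight`. That is one way to see why a better initial CMN model matters for streaming.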

Subgraph integration

G.fst integration is done and the initial test passed. It looks like zero probability works better for the NUM class.
HCLG integration is done. A bug was fixed and the initial test passed.
Online integration cost is 1 minute. Need to optimize.
Need thorough testing with the Tencent test suite.
Need to tune the subgraph feeding probability.

Embedded progress

GFCC-based engine test. Just started.
Obtain a performance curve: RT, memory size, and package size vs. vocabulary size.
A new demo has been released for 4600 song names. Download here: http://cslt.riit.tsinghua.edu.cn/csltdemo/public/release/easr/easr.v1.0.song.apk

@@ Line 7: / Line 7: @@
 === Discriminative DNN ===
-* Running 1200-3620 NN, graph generation is done. Should be done in 3 days.
+* Running 1200-3620 NN, graph generation is done. DT training should be done in 3 days.
 === Sparse DNN ===
@@ Line 22: / Line 22: @@
 * Distribution graph is obtained. The performance seems bad.
 * A possible reason is that the decoding is LM-based, and the confidence is only acoustic related. So (1) the errors in linguistic layer are not really errors in the acoustic layer (2) the search will automatically choose the almost-correct phones/states.
-* The conclusion is that the DNN confidence is most suitable for grammar-based applications, or at least LM information is not very important.
+* The conclusion is that the DNN confidence is most suitable for grammar-based applications, or at least LM information is not very strong.
 * To be done:
 # CI phone confidence, on going
 # No-tone confidence, on going
 ==GFCC DNN ==
@@ Line 36: / Line 35: @@
 ==Stream decoding==
 * the code is done. Simple testing is completed.
-* Problem 1: CMN initialization is not prefect. Need to train a better initial CMN model.
+* Problem 1: CMN initialization is not perfect. Need to train a better initial CMN model.
 * Problem 2: balance for posterior-based silence detection.
@@ Line 49: / Line 48: @@
 == Embedded progress ==
-* GFCC-based engine test
+* GFCC-based engine test. Just started.
 * Attain a performance curve: RT,memory size,package size Vs vocabulary size.
-* A new demo released for 4600 song names.
+* A new demo released for [[4600 song names]]. [http://cslt.riit.tsinghua.edu.cn/csltdemo/public/release/easr/easr.v1.0.song.apk download here]
