Differences between revisions of "2013-08-10"
From cslt Wiki
Latest revision as of 05:17, 20 August 2013 (Tuesday)
Data sharing
- LM count files still undelivered!
DNN progress
Discriminative DNN
- Running 1200-3620 NN, graph generation is done. DT training should be done in 3 days.
Sparse DNN
- Iterative sparse sticky training runs. More sparsity is expected.
Tencent exps
- online support
- garbage model training
- VAD optimization
DNN Confidence estimation
- Distribution graph is obtained. The performance seems bad.
- A possible reason is that the decoding is LM-based, while the confidence is purely acoustic. So (1) errors in the linguistic layer are not necessarily errors in the acoustic layer, and (2) the search will automatically choose almost-correct phones/states.
- The conclusion is that the DNN confidence is most suitable for grammar-based applications, or at least for applications where the LM information is not very strong.
- To be done:
- CI phone confidence, ongoing
- No-tone confidence, ongoing
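As a rough illustration of the acoustic-only confidence discussed above, the sketch below scores a decoded segment by the geometric mean of the DNN posteriors of its aligned states, and shows how state posteriors could be pooled into CI phone classes. The scoring rule, the function names `segment_confidence` and `collapse_to_phone`, and the `state_to_phone` mapping are all assumptions for illustration; the notes do not say how the confidence is actually computed.

```python
import math


def segment_confidence(frame_posteriors, floor=1e-10):
    """Geometric mean of the per-frame posteriors of the decoded state.

    frame_posteriors: one posterior per frame of the segment (hypothetical input).
    floor: lower bound that avoids log(0) on frames with zero posterior.
    """
    if not frame_posteriors:
        raise ValueError("segment has no frames")
    log_sum = sum(math.log(max(p, floor)) for p in frame_posteriors)
    return math.exp(log_sum / len(frame_posteriors))


def collapse_to_phone(state_posteriors, state_to_phone):
    """Sum the state posteriors of one frame into CI phone posteriors.

    state_to_phone[i] is the CI phone label of output state i (assumed mapping).
    """
    if len(state_posteriors) != len(state_to_phone):
        raise ValueError("posterior vector and mapping differ in length")
    phone_posteriors = {}
    for p, phone in zip(state_posteriors, state_to_phone):
        phone_posteriors[phone] = phone_posteriors.get(phone, 0.0) + p
    return phone_posteriors
```

Under this assumption, pooling at the CI phone level means that confusion between states of the same phone no longer lowers the score, which is one way to read the "almost-correct phones/states" remark.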
GFCC DNN
- GFCC computation is very slow: 100 hours of speech costs 16 hours of CPU time, i.e. RT is around 0.2. This is intolerable.
- GFCC-based DNN training on 100 hours of speech data is done. Need to test the noise-robust performance in 2 days.
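The RT figure follows directly from the two durations quoted above. A minimal check, where the function name `real_time_factor` is only illustrative:

```python
def real_time_factor(cpu_hours, audio_hours):
    """Processing time divided by audio duration; values below 1 are faster than real time."""
    if audio_hours <= 0:
        raise ValueError("audio duration must be positive")
    return cpu_hours / audio_hours


# Figures from the notes: 16 hours of CPU time for 100 hours of speech.
rtf = real_time_factor(16.0, 100.0)
print(f"{rtf:.2f}")  # prints 0.16
```

The exact ratio is 0.16, which the notes round to roughly 0.2.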
Stream decoding
- The code is done. Simple testing is completed.
- Problem 1: CMN initialization is not perfect. Need to train a better initial CMN model.
- Problem 2: balance for posterior-based silence detection.
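To make Problem 1 concrete, here is a minimal sketch of streaming CMN in which the running mean starts from a prior mean instead of zero. It is not the actual decoder code: the class name `OnlineCMN`, the `prior_mean` vector (standing in for the "initial CMN model") and the `prior_weight` pseudo-count are assumptions for illustration.

```python
class OnlineCMN:
    """Running cepstral mean normalization seeded with a prior mean."""

    def __init__(self, prior_mean, prior_weight=100.0):
        # The prior acts like `prior_weight` previously seen frames equal to prior_mean.
        self.count = float(prior_weight)
        self.total = [self.count * m for m in prior_mean]

    def normalize(self, frame):
        """Update the running mean with one frame and return the mean-subtracted frame."""
        if len(frame) != len(self.total):
            raise ValueError("frame dimension does not match the prior mean")
        self.total = [s + x for s, x in zip(self.total, frame)]
        self.count += 1.0
        mean = [s / self.count for s in self.total]
        return [x - m for x, m in zip(frame, mean)]


# Usage: one feature frame at a time, as in stream decoding.
cmn = OnlineCMN(prior_mean=[1.0, 2.0], prior_weight=1.0)
normalized = cmn.normalize([3.0, 4.0])
```

In this sketch a poor prior mean distorts the first frames of an utterance until enough real frames accumulate, which is why a better initial estimate matters for stream decoding.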
Subgraph integration
- G.fst integration is done. Initial test passed. Looks like the zero-probability is better for the NUM class.
- HCLG integration is done. A bug was fixed, and the initial test passed.
- Online integration cost is 1 minute. Need to optimize.
- Need thorough testing with the Tencent test suite.
- Need to tune the subgraph feeding probability.
Embedded progress
- GFCC-based engine test. Just started.
- Obtain performance curves: RT, memory size, and package size vs. vocabulary size.
- A new demo has been released for 4600 song names. Download here: http://cslt.riit.tsinghua.edu.cn/csltdemo/public/release/easr/easr.v1.0.song.apk