Difference between revisions of "2013-08-10"
From cslt Wiki
Revision as of 03:40, 20 August 2013 (Tue)
Data sharing
- LM count files still undelivered!
DNN progress
Discriminative DNN
- Running 1200-3620 NN, graph generation is done. DT training should be done in 3 days.
Sparse DNN
- Iterative sparse sticky training runs. More sparsity is expected.
Tencent exps
- online support
- garbage model training
- VAD optimization
DNN Confidence estimation
- The distribution graph has been obtained. The performance seems poor.
- A possible reason is that decoding is LM-based, whereas the confidence is purely acoustic. As a result, (1) errors in the linguistic layer are not necessarily errors in the acoustic layer, and (2) the search automatically chooses almost-correct phones/states.
- The conclusion is that DNN confidence is most suitable for grammar-based applications, or at least for applications where the LM information is not very strong.
- To be done:
- CI phone confidence, ongoing
- No-tone confidence, ongoing
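
A purely acoustic confidence of the kind discussed above can be sketched as follows. This is a minimal illustration and not the implementation used in these experiments: it assumes the confidence is the geometric mean of the DNN posteriors along the decoded state path. The function name `path_confidence`, the posterior layout, and the floor value are all hypothetical.

```python
import math


def path_confidence(posteriors, state_path, floor=1e-10):
    """Geometric mean of the DNN posteriors of the decoded states.

    posteriors: one list of state posteriors per frame (assumed layout).
    state_path: decoded state index for each frame.
    floor:      hypothetical lower bound that avoids log(0).
    """
    if not state_path or len(posteriors) != len(state_path):
        raise ValueError("need one decoded state per frame")
    log_sum = 0.0
    for frame, state in zip(posteriors, state_path):
        log_sum += math.log(max(frame[state], floor))
    # Average in the log domain, then map back to a probability-like score.
    return math.exp(log_sum / len(state_path))


if __name__ == "__main__":
    frames = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    print(path_confidence(frames, [0, 1, 0]))
```

Because the score uses only acoustic posteriors, a word the LM forces onto a slightly wrong but acoustically plausible state sequence can still score highly, which is consistent with the explanation given in the notes.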
GFCC DNN
- GFCC computation is very slow: 100 hours of speech cost 16 hours of CPU time, i.e. a real-time factor (RT) of around 0.2. This is intolerable.
- GFCC-based DNN training on 100 hours of speech data is done. The noise robustness needs to be tested within 2 days.
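
The RT figure quoted for GFCC extraction is processing time divided by audio duration. A small sketch of the arithmetic (the function name `real_time_factor` is illustrative only):

```python
def real_time_factor(processing_hours, audio_hours):
    """Return processing time divided by audio duration."""
    if audio_hours <= 0:
        raise ValueError("audio duration must be positive")
    return processing_hours / audio_hours


if __name__ == "__main__":
    # Figures from the notes: 16 CPU hours for 100 hours of speech.
    print(f"RT factor: {real_time_factor(16.0, 100.0):.2f}")
```

With the numbers in the notes this gives 0.16, which the notes round to about 0.2.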
Stream decoding
- The code is done. Simple testing is completed.
- Problem 1: CMN initialization is not perfect. Need to train a better initial CMN model.
- Problem 2: balance for posterior-based silence detection.
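
One common way to make streaming CMN less sensitive to its initialization is to seed the running mean with a prior mean estimated offline. The sketch below is only an assumption about what a better initial CMN model could look like, not the code referred to in these notes; `online_cmn`, `prior_mean`, and `prior_weight` are hypothetical names.

```python
def online_cmn(frames, prior_mean, prior_weight=100.0):
    """Streaming cepstral mean normalization seeded with a prior mean.

    frames:       iterable of feature vectors (lists of floats).
    prior_mean:   mean vector estimated offline (assumed to be available).
    prior_weight: number of pseudo-frames the prior counts for.
    """
    dim = len(prior_mean)
    total = [m * prior_weight for m in prior_mean]
    count = prior_weight
    normalized = []
    for frame in frames:
        if len(frame) != dim:
            raise ValueError("frame dimension does not match prior mean")
        # Update the running statistics with the current frame.
        for i in range(dim):
            total[i] += frame[i]
        count += 1
        mean = [t / count for t in total]
        # Subtract the current mean estimate from the frame.
        normalized.append([x - m for x, m in zip(frame, mean)])
    return normalized


if __name__ == "__main__":
    feats = [[1.0, 2.0], [1.5, 2.5], [0.5, 1.5]]
    print(online_cmn(feats, prior_mean=[1.0, 2.0], prior_weight=10.0))
```

A larger `prior_weight` keeps the early frames close to the offline estimate, while a smaller one lets the mean adapt to the current utterance more quickly.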
Subgraph integration
- G.fst integration is done and the initial test passed. Zero probability appears to work better for the NUM class.
- HCLG integration is done. A bug was fixed, and the initial test passed.
- Online integration takes 1 minute and needs to be optimized.
- Need thorough testing with the Tencent test suite.
- Need to tune the subgraph feeding probability.
Embedded progress
- GFCC-based engine test
- Obtain performance curves: RT, memory size, and package size vs. vocabulary size.
- A new demo released for 4600 song names.