Difference between revisions of "ASR:2015-07-06"
From cslt Wiki
Latest revision as of 07:41, 8 July 2015 (Wednesday)
Speech Processing
AM development
Environment
- the GPU of grid-14 does not work
RNN AM
- hold
- morpheme RNN --zhiyuan
- train using a large dataset --mengyuan
Mic-Array
- hold
- compute EER with Kaldi
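
The EER item above can be illustrated with a small, toolkit-independent sketch. It assumes that higher scores mean "more likely a target trial" and that target and non-target scores are already available as plain lists; the function name `compute_eer` and the threshold sweep are illustrative assumptions, not the group's actual Kaldi-based procedure.

```python
def compute_eer(target_scores, nontarget_scores):
    """Return (eer, threshold) for scores where higher means more target-like.

    Hypothetical helper for illustration: sweeps every observed score as a
    threshold and picks the point where false acceptance and false rejection
    rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: non-target trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the average of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    eer, thr = compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With only a handful of trials the two error curves may not cross exactly, which is why the sketch averages the two rates at the nearest point instead of interpolating.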
Data selection unsupervised learning
- acoustic feature based submodular using Pinan dataset --zhiyong
RNN-DAE(Deep based Auto-Encode-RNN)
- hold
- deliver to mengyuan
Speaker ID
- DNN-based sid --Lantian
i-vector & d-vector based ASR
- hold --Tian Lan
- Cluster the speakers into speaker classes, then use the distance or the posterior probability as the metric
- dark knowledge using i-vector
- train on WSJ (test base: dev93 + eval92)
- hold
Dark knowledge
- test a random last output layer when training with MPE --zhiyuan
language vector
- train using language vector with the 1400h_CN + 100h_EN dataset --mengyuan
- write a paper --zhiyuan
rectifier
- WER is worse on Aurora4 --zhiyuan
- train using other dataset
- rectifier RNN
audio embedding
- audio embedding --Wei Xu
Text Processing
RNN LM
- character-lm rnn(hold)
- lstm+rnn
- check how the lstm-rnnlm code initializes and updates the learning rate (hold)
Neural Based Document Classification
- (hold)
Order representation
- Nested Dropout
- modify the objective function(hold)
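
For readers unfamiliar with the Nested Dropout item above, here is a minimal sketch of the masking step as it is commonly described: a truncation index is drawn from a geometric distribution and every unit after that index is zeroed, so earlier units are encouraged to carry more information. The function names, the parameter `p`, and the list-based representation are assumptions for illustration; this is not the group's implementation and does not include the modified objective function mentioned in the list.

```python
import random


def nested_dropout_mask(num_units, p=0.1, rng=random):
    """Sample a 0/1 mask that keeps a leading prefix of the units.

    The number of kept units follows a geometric distribution with
    parameter p, truncated to num_units (hypothetical helper).
    """
    keep = 1
    # Extend the kept prefix until a "stop" event occurs or all units are kept.
    while keep < num_units and rng.random() >= p:
        keep += 1
    return [1] * keep + [0] * (num_units - keep)


def apply_mask(code, mask):
    """Zero out the dropped units of a representation vector."""
    return [c * m for c, m in zip(code, mask)]


if __name__ == "__main__":
    mask = nested_dropout_mask(8, p=0.3, rng=random.Random(0))
    print(mask)
    print(apply_mask([0.5, -1.2, 0.3, 0.9, 0.1, -0.4, 0.7, 0.2], mask))
```

Because the mask is always a prefix of ones followed by zeros, the units acquire an ordering, which is the property the "Order representation" heading refers to.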
Balance Representation
- Find error signal
Recommendation
- Reproduce baseline.
DSSM based QA
- Pre-processing java class.
- Reproduce baseline.
Seq to Seq(09-15)
- Review papers
- Reproduce baseline.
Text Group Intern Project
Buddhist Process
(hold)
RNN Poem Process
(hold)
RNN Document Vector
(hold)
Image Baseline
(hold)