Difference between revisions of "ASR:2015-09-09"
From cslt Wiki
Latest revision as of 09:01, 14 September 2015 (Monday)
Contents
- 1 Speech Processing
- 2 Text Processing
- 3 financial group
Speech Processing
AM development
Environment
- grid-12 GPU is transferred to grid-18
- buy a 970 GPU
RNN AM
- train monophone RNN --zhiyuan
- decode using 5-gram
- batch training method
- train using large dataset--mengyuan
- write code to tune learning rate --zhiyong
- completed Nesterov/Adagrad/Adagrad-max
- training shows unstable behavior
- completed Adam, Adadelta, adam-max --Xiangyu, Zhiyong
- reproduce PSO --Xiangyu
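As an illustration of the learning-rate tuning item above, here is a minimal sketch of a single Adagrad update in plain Python. It is not the group's code: the function names, the toy quadratic objective, and the default values of `lr` and `eps` are assumptions chosen only to show the rule, in which each parameter's step is divided by the root of its accumulated squared gradients.

```python
import math


def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """Apply one Adagrad update and return (new_params, new_accum).

    accum holds the running sum of squared gradients per parameter.
    """
    new_params, new_accum = [], []
    for p, g, a in zip(params, grads, accum):
        a = a + g * g  # accumulate the squared gradient
        new_params.append(p - lr * g / (math.sqrt(a) + eps))
        new_accum.append(a)
    return new_params, new_accum


def minimize_quadratic(steps=200, lr=0.5):
    """Toy run (assumed objective): minimize f(x) = sum(x_i^2), gradient 2 * x_i."""
    params = [3.0, -2.0]
    accum = [0.0, 0.0]
    for _ in range(steps):
        grads = [2.0 * p for p in params]
        params, accum = adagrad_step(params, grads, accum, lr=lr)
    return params
```

Because the accumulator only grows, the effective step size shrinks over time, which is one reason variants such as Adadelta and Adam replace the running sum with a decaying average.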
Mic-Array
- hold
- compute EER with kaldi
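For reference, the equal error rate (EER) is the operating point where the false-rejection rate equals the false-acceptance rate. The sketch below is a plain-Python approximation under assumed inputs (two lists of trial scores, higher meaning "more likely target"); it is not Kaldi's implementation, and the name `compute_eer` here is only a label for this illustration.

```python
def compute_eer(target_scores, nontarget_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # false rejection: target trials scored below the threshold
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # false acceptance: non-target trials scored at or above it
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, eer = gap, (frr + far) / 2.0
    return eer
```

With few trials the two rates may never be exactly equal, so this sketch reports the average of the two at the threshold where they are closest.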
Data selection unsupervised learning
- hold
- acoustic feature based submodular using Pingan dataset --zhiyong
- write code to speed up --zhiyong
RNN-DAE(Deep based Auto-Encode-RNN)
- RNN-DAE performs worse than DNN-DAE because the training dataset is small
- extract real room impulse responses to generate reverberant WSJ data, and then train RNN-DAE
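Generating reverberant data from a room impulse response is usually done by convolving the clean waveform with the impulse response. The following is a minimal sketch under that assumption, using plain Python lists as stand-ins for audio samples; the function name `reverberate` and the toy values are hypothetical, and a real pipeline would read waveforms from files and use a faster convolution.

```python
def reverberate(clean, impulse_response):
    """Return the full linear convolution of a clean signal with an impulse response."""
    out = [0.0] * (len(clean) + len(impulse_response) - 1)
    for i, sample in enumerate(clean):
        for j, tap in enumerate(impulse_response):
            # each input sample contributes a delayed, scaled copy of the response
            out[i + j] += sample * tap
    return out


# toy example: a direct path followed by one echo at half amplitude
reverberant = reverberate([1.0, 2.0], [1.0, 0.5])
```

The output is longer than the input by the length of the impulse response minus one, which corresponds to the reverberation tail.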
Ivector&Dvector based ASR
- cluster the speakers into speaker clusters
- hold
- dark knowledge
- performs much worse than the baseline (EER: baseline 29%, dark knowledge 48%)
- RNN ivector
- hold
- binary ivector done
- metric learning
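The dark knowledge item above most likely refers to training a student model on a teacher's softened outputs. A minimal sketch of that idea, assuming the common temperature-softmax formulation, is given below; the function names and the default temperature are illustrative assumptions and say nothing about how the experiment reported here was configured.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; higher values flatten the output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and softened student distributions."""
    teacher = softmax_with_temperature(teacher_logits, temperature)
    student = softmax_with_temperature(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))
```

The loss is smallest when the student's softened distribution matches the teacher's, so the student is pushed to reproduce the teacher's relative confidences across all classes rather than only its top choice.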
language vector
- hold
- train using language vector with the dataset of 1400h_CN + 100h_EN--mengyuan
- write a paper--zhiyuan
- RNN language vector
- hold
multi-GPU
- multi-stream training --Sheng Su
- two GPUs work well, but training diverges with four GPUs
- solve the buffer problem --Mengyuan, Sheng Su
Neural picture style transfer
- reproduced the result of the paper "A Neural Algorithm of Artistic Style" --Zhiyuan, Xuewei
- because of GPU memory limits, restricted to the inception net with the SGD optimizer (the VGG network with the default L-BFGS optimizer gives better results but consumes much more memory)
Text Processing
RNN LM
- character-lm rnn(hold)
- lstm+rnn
- check how the lstm-rnnlm code initializes and updates the learning rate (hold)
Neural Based Document Classification
- (hold)
RNN Rank Task
- Test.
- Paper: RNN Rank Net.
- (hold)
- Output rank information.
Graph RNN
- Entity path embedded to entity.
- (hold)
RNN Word Segment
- Set bound to word segment.
- (hold)
Seq to Seq(09-15)
- Review papers.
- Reproduce baseline. (08-03 <--> 08-17)
Order representation
- Nested Dropout
- semi-linear --> neural based auto-encoder.
- modify the objective function(hold)
Balance Representation
- Find error signal
Recommendation
- Reproduce baseline.
- LDA matrix dissolve.
- LDA (Text classification & Recommendation System) --> AAAI
RNN based QA
- Read Source Code.
- Attention based QA.
- Coding.
RNN Poem Process
- Seq based BP.
- (hold)
Text Group Intern Project
Buddhist Process
- (hold)
RNN Poem Process
- Done by Haichao Yu & Chaoyuan Zuo. Mentor: Tianyi Luo.
RNN Document Vector
- (hold)
Image Baseline
- Demo Release.
- Paper Report.
- Read CNN Paper.
Text Intuitive Idea
Trace Learning
- (Hold)
Match RNN
- (Hold)
financial group
model research
- RNN
- online model, updated every day
- modify cost function and learning method
- add more features
rule combination
- GA method to optimize the model
basic rule
- classical tenth model
multiple-factor
- add more factors
- use a sparse model
display
- bug fixed
- buy rule fixed
data
- data api
- download the futures data and factor data