ASR:2015-12-1
From cslt Wiki
Latest revision as of 07:21, 7 December 2015 (Mon)
==Speech Processing==

===AM development===
====Environment====
====End-to-End====
* monophone ASR --Zhiyuan
:* MPE
:* CTC/nnet3/Kaldi
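As a reminder of what the CTC criterion listed above decodes to: a frame-level label path is collapsed by first merging consecutive repeats and then deleting blanks. A minimal sketch of that collapsing rule (the function name is illustrative, not from Kaldi/nnet3):

```python
def ctc_collapse(frame_labels, blank=0):
    """Collapse a CTC frame-level path into an output sequence:
    first merge consecutive repeats, then drop blank symbols."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev:          # merge repeated frames
            if lab != blank:     # drop the blank symbol
                out.append(lab)
        prev = lab
    return out

# e.g. the path [1, 1, 0, 1, 2, 2] collapses to [1, 1, 2]
```

The blank between the two 1s is what lets CTC emit the same label twice in a row.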
====Conditioning learning====
* language vector into multiple layers --Zhiyuan
:* a Chinese paper
* speech rate into multiple layers --Zhiyuan
:* verify the code for extra input(s) into the DNN
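One plausible reading of feeding a language or speech-rate vector "into multiple layers" is concatenating a fixed conditioning vector onto the input of every hidden layer, not only the first. A toy sketch with made-up weights (all names and values are illustrative, not from the report):

```python
def matvec(W, v):
    """Plain matrix-vector product (rows of W times v)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward_conditioned(x, weights, cond):
    """Toy feed-forward pass where the conditioning vector `cond`
    (e.g. a language code or speech-rate value) is concatenated
    onto the input of EVERY layer."""
    h = x
    for W in weights:
        h = [max(0.0, a) for a in matvec(W, h + cond)]  # ReLU
    return h

# two layers, hidden size 2, one-dimensional conditioning input;
# each weight row sees [h_1, h_2, cond]
W1 = [[1.0, 0.0, 1.0],
      [0.0, 1.0, -1.0]]
W2 = [[0.5, 0.5, 0.0],
      [1.0, -1.0, 2.0]]
y = forward_conditioned([1.0, 2.0], [W1, W2], [0.5])
```

Note each weight matrix grows by the conditioning dimension, which is the part the "verify the code" item would need to check.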
====Adaptive learning rate method====
* sequence training --Xiangyu
:* write a technical report
====Mic-Array====
* hold
* compute EER with Kaldi
====Data selection (unsupervised learning)====
* hold
* acoustic-feature-based submodular selection using the Pingan dataset --Zhiyong
* write code to speed it up --Zhiyong
* curriculum learning --Zhiyong
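Submodular data selection is normally done greedily. A toy coverage-style sketch follows; the actual acoustic features and the Pingan setup are not described in the report, so the feature sets here are stand-ins:

```python
def greedy_submodular_select(utterances, budget):
    """Greedy maximization of a coverage objective: repeatedly pick
    the utterance whose feature set adds the most unseen features.
    For monotone submodular objectives the greedy rule carries the
    classic (1 - 1/e) approximation guarantee."""
    covered = set()
    chosen = []
    remaining = dict(utterances)  # name -> set of (quantized) features
    for _ in range(budget):
        best = max(remaining, key=lambda u: len(remaining[u] - covered))
        if not remaining[best] - covered:
            break  # no marginal gain left anywhere
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen

utts = {"u1": {1, 2, 3}, "u2": {3, 4}, "u3": {1, 2}}
picked = greedy_submodular_select(utts, budget=2)
```

"u3" is never picked: everything it covers is already covered by "u1", which is exactly the redundancy the selection is meant to remove.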
====RNN-DAE (Deep Auto-Encoder RNN)====
* hold
* RNN-DAE performs worse than DNN-DAE because the training dataset is small
* extract real room impulse responses to generate WSJ reverberation data, then train the RNN-DAE
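The reverberation-data step above amounts to convolving clean speech with a measured room impulse response (RIR). A minimal discrete convolution sketch (real pipelines would use FFT-based convolution on actual waveforms):

```python
def convolve(signal, rir):
    """Convolve a clean waveform with a room impulse response to
    simulate reverberation; output length is len(signal) + len(rir) - 1."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out

# each sample of the clean signal is smeared by the RIR's decaying tail
echoed = convolve([1.0, 0.0, 0.5], [1.0, 0.3])
```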
===Speaker recognition===
* DNN-ivector framework
* SUSR
* AutoEncoder + metric learning
* binary ivector
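"Binary ivector" presumably means thresholding i-vector components into bits so that speakers can be compared with cheap Hamming distance; the report gives no details, so the sign-based binarization below is only one hedged interpretation:

```python
def binarize(ivector):
    """Map each i-vector component to a bit by its sign."""
    return [1 if x > 0 else 0 for x in ivector]

def hamming(a, b):
    """Hamming distance between two binary i-vectors."""
    return sum(x != y for x, y in zip(a, b))

b1 = binarize([0.3, -1.2, 0.7, -0.1])
b2 = binarize([0.1, -0.4, -0.2, -0.3])
d = hamming(b1, b2)
```

The payoff is storage and scoring cost: one bit per dimension, and XOR/popcount instead of cosine scoring.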
===Language vector===
* write a paper --Zhiyuan
:* hold
* language vector added to multiple hidden layers --Zhiyuan
:* code written
:* check the code
:* http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=480
* RNN language vector
:* hold
===Multi-GPU===
* multi-stream training --Sheng Su
:* write a technical report
* kaldi-nnet3 --Xuewei
:* 7*2048 8k 1400h TDNN Xent training done
:* nnet3 MPE code is under investigation
:* http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=472
* train a 7*2048 TDNN using 4000h of data --Mengyuan
* 1700h+776h 16k nnet3 6*2000 training done; it outperforms the 6776H_mpe model --Mengyuan
* wrote an nnet3 biglm decoder for Sinovoice
* train MPE using WSJ and Aurora4 --Zhiyong, Xuewei
* train nnet3 MPE using data from Jietong --Xuewei
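Synchronous data-parallel (multi-stream) training of the kind listed above periodically averages per-worker gradients or model parameters; the averaging step itself is trivial, shown here as a sketch detached from any particular toolkit:

```python
def average_gradients(worker_grads):
    """Average per-worker gradient vectors, as in synchronous
    data-parallel training where each worker consumes its own stream."""
    n = len(worker_grads)
    return [sum(gs) / n for gs in zip(*worker_grads)]

g = average_gradients([[1.0, 2.0], [3.0, 4.0]])  # -> [2.0, 3.0]
```

The engineering difficulty in practice is not this arithmetic but the communication schedule (how often workers synchronize, and what staleness is tolerated between averages).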
===Multi-task===
* test according to self-information neural structure learning --Mengyuan
:* hold
:* code written
:* no significant performance improvement observed
* speech rate learning --Xiangyu
:* hold
:* no significant performance improvement observed
:* http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=483
:* get results with the extra speech-rate input --Zhiyuan
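The usual multi-task layout behind experiments like these is a shared trunk feeding task-specific heads (for example, a main phone-classification head plus an auxiliary speech-rate head). A schematic sketch; all weights and head names here are invented for illustration:

```python
def shared_forward(x, W_shared, heads):
    """A shared trunk feeding several task-specific heads: all tasks
    share W_shared and differ only in their own head weights."""
    h = [sum(w * v for w, v in zip(row, x)) for row in W_shared]
    return {name: [sum(w * v for w, v in zip(row, h)) for row in W_head]
            for name, W_head in heads.items()}

outs = shared_forward([1.0, 1.0],
                      [[1.0, 0.0], [0.0, 1.0]],   # identity trunk
                      {"phone": [[1.0, 1.0]],     # main task head
                       "rate":  [[1.0, -1.0]]})   # auxiliary head
```

The hope is that gradients from the auxiliary head regularize the shared trunk; the "no significant improvement" notes above suggest that did not materialize here.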
==Text Processing==

===Work===
====RNN Poem Process====
* Combine additional rhymes.
* Investigate new methods.

====Document Representation====
* Code done; waiting for experiment results.

====Seq to Seq====
* Work on some tasks.

====Order representation====
* Implement some ideas.

====Balance Representation====
* Investigate some papers.
* Current solution: use knowledge or similar pairs from a large corpus.
===Hold===
====Neural Based Document Classification====
====RNN Rank Task====
====Graph RNN====
:* Entity path embedded into the entity.
* (hold)

====RNN Word Segment====
:* Set bounds for word segmentation.
* (hold)

====Recommendation====
* Reproduce the baseline.
:* LDA matrix decomposition.
:* LDA (Text classification & Recommendation System) --> AAAI
====RNN based QA====
* Read the source code.
* Attention-based QA.
* Coding.
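Attention-based QA models score candidate facts against the question and combine them under softmax weights. A minimal dot-product attention sketch (a generic mechanism, not the specific model whose source is being read):

```python
import math

def attend(question_vec, fact_vecs):
    """Dot-product attention: score each candidate fact against the
    question, softmax the scores, and return the weights plus the
    weighted sum (the 'context' vector)."""
    scores = [sum(q * f for q, f in zip(question_vec, fv)) for fv in fact_vecs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(fact_vecs[0])
    context = [sum(w * fv[d] for w, fv in zip(weights, fact_vecs))
               for d in range(dim)]
    return weights, context

weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

The fact aligned with the question gets the larger weight, and the context vector leans toward it.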
===Text Group Intern Project===
====Buddhist Process====
* (hold)

====RNN Poem Process====
* Done by Haichao Yu & Chaoyuan Zuo. Mentor: Tianyi Luo.

====RNN Document Vector====
* (hold)

====Image Baseline====
* Demo released.
* Paper report.
* Read CNN papers.

===Text Intuitive Idea===
====Trace Learning====
* (hold)

====Match RNN====
* (hold)
==Financial Group==

===Model research===
* RNN
:* online model, updated every day
:* modify the cost function and learning method
:* add more features

===Rule combination===
* GA method to optimize the model
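A GA over rule combinations can be sketched as a population of bit-masks (which trading rules are switched on) evolved by selection, crossover, and mutation. The fitness function below is a stand-in, not the group's actual objective:

```python
import random

def ga_optimize(fitness, n_rules, pop_size=20, gens=30, seed=0):
    """Tiny genetic algorithm over rule subsets: each individual is a
    bit-mask of enabled rules; the fitter half survives each round,
    offspring come from one-point crossover plus occasional bit flips."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_rules)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_rules)
            child = a[:cut] + b[cut:]          # one-point crossover
            i = rng.randrange(n_rules)
            child[i] ^= rng.random() < 0.1     # occasional bit flip
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# toy fitness: rules 0 and 2 help, rule 1 hurts
best = ga_optimize(lambda m: m[0] + m[2] - 2 * m[1], n_rules=3)
```

Keeping the sorted parents unchanged each generation gives simple elitism, so the best mask found so far is never lost.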
===Basic rule===
* classical tenth model

===Multiple-factor===
* add more factors
* use a sparse model

===Display===
* bug fixed
* buy rule fixed

===Data===
* data API
* download the futures data and factor data