Difference between revisions of "Task List"
From cslt Wiki
Revision as of 11:59, 19 October 2015 (Monday)
Contents
- 1 Task To Do
- 2 Task DONE
- 2.1 Multi-Mode features based VAD
- 2.2 DNN based Language identification and Speaker identification
- 2.3 Neural network visualization
- 2.4 Dark knowledge
- 2.5 Normal RNN speech recognition
- 2.6 Momentum-like Hessian-Free acceleration
- 2.7 Activation value normalization through time --Batch Normalization
- 2.8 Mix-training Balance decision tree
- 2.9 20-h Chinese data-set release
- 2.10 Unbounded activation function (Rectifier/Maxout/Pnorm) go-through searching method
- 3 Technical Report To Write
- 4 Paper to Write
- 5 Project
Task To Do
Speech Recognition
End-to-End speech recognition
- Discriminative-Learning code implementation
- Zhiyuan Tang
- Zhiyuan Tang/Mengyuan Zhao/Zhiyong Zhang
Multi-task
- Fusion of speech-recognition and speech-rate
- Xiangyu Zeng
- Self-informed neural network structure learning
- Mengyuan Zhao
Integrate class information into the HCLG FST for speech recognition
Distant speech recognition
- RNN-DAE: echo or reverberation
- Xuewei Zhang/Zhiyuan Tang/Mengyuan Zhao/Zhiyong Zhang
- Reverberation
- Multi-microphones
- (Lasso), Xuewei Zhang
Voice conversation
Sparse DNN
- Zhiyuan Tang
Correlation based SENONE cluster
NN Multi-GPU parallel training
- Multi-Machine
- Sheng Su
- Multi-GPU on one Machine
- Sheng Su
- nnet3 code test
Audio Embedding
- Ke Ning
RNN training acceleration
Data selection
- Zhiyong Zhang
- Sub-modular data selection
- Objective-function loss training self-adaptation
Decoder
- Confidence output for task-required
Speaker Verification
binary code
- Lantian Li
RNN-ivector
- Lantian Li
DNN clustering
- Lantian Li
Task DONE
Multi-Mode features based VAD
- Shi Yin
DNN based Language identification and Speaker identification
- Xuewei Zhang/Zhiyuan Tang
Neural network visualization
- Mian Wang, DONE
Dark knowledge
- Mengyuan Zhao, Xiangyu Zeng, Zhiyong Zhang, Chao Liu
Normal RNN speech recognition
- Mengyuan Zhao
Momentum-like Hessian-Free acceleration
- Nesterov/Adagrad/AdaDelta/Adam
- Zhiyong Zhang/Xiangyu Zeng
Activation value normalization through time --Batch Normalization
- Zhiyong Zhang
Mix-training Balance decision tree
- Zhiyong Zhang
20-h Chinese data-set release
- Xuewei Zhang
Unbounded activation function (Rectifier/Maxout/Pnorm) go-through searching method
- nnet3 test -- Xuewei Zhang
Technical Report To Write
1. DNN-DAE based noise cancellation -- Xiangyu Zeng / Mengyuan Zhao / Zhiyong Zhang -- DONE
2. Speech Rate DNN speech recognition -- Shi Yin / Xiangyu Zeng -- DONE
3. CNN+fbank feature combination -- Mian Wang / Yiye Lin / Mengyuan Zhao / Shi Yin
4. Uyghur low-resource acoustic model enhancement -- Shi Yin / Mengyuan Zhao / Zhiyong Zhang -- DONE
5. Uyghur 20h database release -- Kaer / Shi Yin -- DONE
6. Dark-Knowledge Transfer -- Xiangyu Zeng / Mengyuan Zhao / Zhiyong Zhang
Paper to Write
Project
- Xiaomi TV
- Mengyuan Zhao/Zhiyong Zhang
- TAG-lm & Domain-specific general lm
- Chinese-English mix-training