Difference between revisions of “Task previous”
From cslt Wiki
Latest revision as of 12:17, 16 October 2016 (Sunday)
Contents
- 1 Task To Do
- 1.1 Speech Recognition
- 1.1.1 CTC expanded
- 1.1.2 Network architecture test
- 1.1.3 Spiral Joint Training of SPEECH and SPEAKER
- 1.1.4 Small data-set and Big model
- 1.1.5 Low-resource language improvement
- 1.1.6 End-to-End speech recognition
- 1.1.7 Multi-task
- 1.1.8 Integrate the class information to HCLG fst for speech recognition
- 1.1.9 Distant speech recognition
- 1.1.10 Voice conversation
- 1.1.11 Sparse DNN
- 1.1.12 Correlation based SENONE cluster
- 1.1.13 NN Multi-GPU parallel training
- 1.1.14 Audio Embedding
- 1.1.15 RNN training accelerating
- 1.1.16 Data selection
- 1.1.17 Decoder
- 1.2 Speaker Verification
- 2 Task DONE
- 2.1 Multi-Mode features based VAD
- 2.2 DNN based Language identification and Speaker identification
- 2.3 Neural network visualization
- 2.4 Dark knowledge
- 2.5 Normal RNN speech recognition
- 2.6 Momentum-like Hessian-Free acceleration
- 2.7 Activation value normalization through time --Batch Normalization
- 2.8 Mix-training Balance decision tree
- 2.9 20-h Chinese data-set release
- 2.10 Unbounded activation function (Rectifier/Maxout/Pnorm) go-through searching method
- 3 Technical Report To Write
- 4 Paper to Write
- 5 Patent done
- 6 Project
Task To Do
Speech Recognition
CTC expanded
- Voice activity detection
- LSTM+CTC
- TDNN+CTC
- BLANK as silence, others as speech
- Keyword detection
- Character/Word-level, external key-word fst
- Does the G FST need to be single-word?
- Emotion recognition
- LSTM-CTC
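The "BLANK as silence, others as speech" idea for voice activity detection can be sketched as follows. This is only an illustration under stated assumptions: the per-frame CTC posteriors are taken as given (in practice they would come from the LSTM+CTC or TDNN+CTC model), BLANK is assumed to sit at index 0, and the function names `ctc_frames_to_vad` and `speech_segments` are hypothetical. CTC outputs tend to be spiky, so a real system would probably need smoothing on top of this.

```python
from typing import List, Sequence, Tuple


def ctc_frames_to_vad(posteriors: Sequence[Sequence[float]],
                      blank_index: int = 0) -> List[bool]:
    """Mark a frame as speech when its most likely CTC label is not BLANK."""
    decisions = []
    for frame in posteriors:
        # Index of the highest-scoring label for this frame.
        best = max(range(len(frame)), key=lambda i: frame[i])
        decisions.append(best != blank_index)
    return decisions


def speech_segments(decisions: Sequence[bool],
                    min_frames: int = 1) -> List[Tuple[int, int]]:
    """Merge consecutive speech frames into (start, end) ranges, end exclusive."""
    segments = []
    start = None
    for t, is_speech in enumerate(decisions):
        if is_speech and start is None:
            start = t
        elif not is_speech and start is not None:
            if t - start >= min_frames:
                segments.append((start, t))
            start = None
    # Close a segment that runs to the end of the utterance.
    if start is not None and len(decisions) - start >= min_frames:
        segments.append((start, len(decisions)))
    return segments


# Toy posteriors over [BLANK, label]; the values are made up for illustration.
toy_posteriors = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.6, 0.4], [0.1, 0.9]]
toy_decisions = ctc_frames_to_vad(toy_posteriors)
toy_segments = speech_segments(toy_decisions)  # [(1, 3), (4, 5)]
```

The `min_frames` parameter is one simple way to drop very short speech bursts; it is an assumption of this sketch, not something the task list specifies.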
Network architecture test
- chain model:
- tdnn + simple-lstm
- only keep forget gate
- ctc+mpe
- similar to chain training
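The task list does not say what the "simple-lstm" that keeps only the forget gate looks like. One possible reading, shown here purely as an assumption, is a cell in which the forget gate f also controls how much new candidate content enters the cell (through 1 - f) and there are no separate input or output gates. The sketch below uses a single scalar unit and hypothetical parameter names to keep the arithmetic visible.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forget_only_step(x: float, h_prev: float, c_prev: float, p: dict):
    """One time step of a forget-gate-only recurrent cell (scalar, illustrative).

    Assumed update rule, not taken from the source:
        f = sigmoid(wf*x + uf*h_prev + bf)   # forget gate
        g = tanh(wc*x + uc*h_prev + bc)      # candidate cell content
        c = f*c_prev + (1 - f)*g             # forget gate also gates the input
        h = tanh(c)                          # no output gate
    """
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])
    g = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])
    c = f * c_prev + (1.0 - f) * g
    h = math.tanh(c)
    return h, c


# Hypothetical parameters; with all weights at zero the gate is exactly 0.5.
zero_params = {"wf": 0.0, "uf": 0.0, "bf": 0.0, "wc": 0.0, "uc": 0.0, "bc": 0.0}
h1, c1 = forget_only_step(x=1.0, h_prev=0.0, c_prev=1.0, p=zero_params)
```

Compared with a standard LSTM, this variant drops two of the three gates per unit, which is the kind of saving that would make it attractive to stack on top of a TDNN.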
Spiral Joint Training of SPEECH and SPEAKER
- Train ASR and SID in parallel so that each task benefits the other
Small data-set and Big model
- Investigate the efficiency of pre-training small and big models using dark knowledge
Low-resource language improvement
- SID
- How to improve SID for low-resource speakers
End-to-End speech recognition
- Discriminative-Learning code implementation
- Zhiyuan Tang
Multi-task
- Fusion of speech-recognition and speech-rate
- Xiangyu Zeng
- Self-informed neural network structure learning
- Mengyuan Zhao
Integrate the class information to HCLG fst for speech recognition
- Zhiyuan
Distant speech recognition
- RNN-DAE: echo or reverberation
- Xuewei Zhang/Zhiyuan Tang/Mengyuan Zhao/Zhiyong Zhang
- Reverberation
- Multi-microphone
- (Lasso), Xuewei Zhang
Voice conversation
- hold
Sparse DNN
- Zhiyuan Tang
Correlation based SENONE cluster
NN Multi-GPU parallel training
- Multi-GPU using data parallelization
- Sheng Su
- nnet3 mpe
- Xuewei Zhang
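Data parallelization across GPUs, as named in the item above, usually means that each worker computes gradients on its own shard of the minibatch and the averaged gradient drives one shared update. The sketch below simulates this on the CPU with a one-parameter model y = w * x and a squared-error loss; the model, the function names, and the equal-size sharding are all assumptions made for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def shard_gradient(w: float, shard: List[Sample]) -> float:
    """Gradient of the mean squared error of y = w * x over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w: float, batch: List[Sample],
                       num_workers: int, lr: float) -> float:
    """One synchronous update: shard the batch, average the per-worker gradients."""
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    size = len(batch) // num_workers
    shards = [batch[i * size:(i + 1) * size] for i in range(num_workers)]
    # On real hardware each of these calls would run on a different GPU.
    grads = [shard_gradient(w, shard) for shard in shards]
    avg_grad = sum(grads) / num_workers
    return w - lr * avg_grad


toy_batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_parallel = data_parallel_step(0.0, toy_batch, num_workers=2, lr=0.01)
w_single = 0.0 - 0.01 * shard_gradient(0.0, toy_batch)
```

With equal-size shards the averaged gradient equals the full-batch gradient, so `w_parallel` and `w_single` agree up to floating-point error; the cost on real hardware lies in the communication needed to do the averaging.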
Audio Embedding
- Ke Ning
RNN training accelerating
Data selection
- Zhiyong Zhang
- Sub-modular data selection
- Objective-function loss training self-adaptation
Decoder
- Confidence output as required by the task
Speaker Verification
binary code
- Lantian Li
RNN-ivector
- Lantian Li
DNN clustering
- Lantian Li
Task DONE
Multi-Mode features based VAD
- Shi Yin
DNN based Language identification and Speaker identification
- Xuewei Zhang/Zhiyuan Tang
Neural network visualization
- Mian Wang, DONE
Dark knowledge
- Mengyuan Zhao, Xiangyu Zeng, Zhiyong Zhang, Chao Liu
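"Dark knowledge" generally refers to training a student network on the softened output distribution of a teacher network. As a minimal sketch, assuming the usual temperature-scaled softmax and a cross-entropy loss against the teacher's soft targets (the function names and the temperature values are illustrative, not taken from this page):

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float],
                             temperature: float = 1.0) -> List[float]:
    """Softmax over logits / temperature; a higher temperature gives a flatter output."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """Cross-entropy between the teacher's soft targets and the student's output."""
    soft_targets = softmax_with_temperature(teacher_logits, temperature)
    student_probs = softmax_with_temperature(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(soft_targets, student_probs))


teacher = [1.0, 2.0, 3.0]  # made-up logits for one frame
sharp = softmax_with_temperature(teacher, temperature=1.0)
soft = softmax_with_temperature(teacher, temperature=5.0)
loss_match = distillation_loss(teacher, teacher)
loss_mismatch = distillation_loss([3.0, 2.0, 1.0], teacher)
```

The softened targets keep the relative scores of the non-winning classes, which is the extra information a small student model can learn from.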
Normal RNN speech recognition
- Mengyuan Zhao
Momentum-like Hessian-Free acceleration
- Nesterov/AdaGrad/AdaDelta/Adam
- Zhiyong Zhang/Xiangyu Zeng
Activation value normalization through time --Batch Normalization
- Zhiyong Zhang
Mix-training Balance decision tree
- Zhiyong Zhang
20-h Chinese data-set release
- Xuewei Zhang
Unbounded activation function (Rectifier/Maxout/Pnorm) go-through searching method
- nnet3 test -- Xuewei Zhang
Technical Report To Write
1. DNN-DAE based noise cancellation -- Xiangyu Zeng / Mengyuan Zhao / Zhiyong Zhang -- DONE
2. Speech Rate DNN speech recognition -- Shi Yin / Xiangyu Zeng -- DONE
3. CNN+fbank feature combination -- Mian Wang / Yiye Lin / Mengyuan Zhao / Shi Yin
4. Uyghur low-resource acoustic model enhancement -- Shi Yin / Mengyuan Zhao / Zhiyong Zhang -- DONE
5. Uyghur 20h database release -- Kaer / Shi Yin -- DONE
6. Dark-Knowledge Transfer -- Xiangyu Zeng / Mengyuan Zhao / Zhiyong Zhang
Paper to Write
Patent done
- A method of new word enhancement for speech recognition --Yue Zhang
Project
- Xiaomi TV
- Mengyuan Zhao/Zhiyong Zhang
- TAG-lm & Domain-specific general lm
- Chinese-English mix-training