|
|
Line 1: |
Line 1: |
− | =Task To Do=
| + | [[previous]] |
− | ==Speech Recognition==
| + | |
− | | + | |
− | ===CTC expanded===
| + | |
− | *Voice activity detection
| + | |
− | :*LSTM+CTC
| + | |
− | :*TDNN+CTC
| + | |
− | ::* BLANK as silence, others as speech
| + | |
− | | + | |
− | *Keyword detection
| + | |
− | :* Character/Word-level, external key-word fst
| + | |
− | :* Does the G-fst need to be a single word?
| + | |
− | | + | |
− | *Emotion recognition
| + | |
− | :* LSTM-CTC
| + | |
− | | + | |
− | ===Network architecture test===
| + | |
− | *chain model:
| + | |
− | :*tdnn + simple-lstm
| + | |
− | ::* only keep forget gate
| + | |
− |
| + | |
− | *ctc+mpe
| + | |
− | :* similar to chain training
| + | |
− | | + | |
− | ===Spiral Joint Training of SPEECH and SPEAKER===
| + | |
− | *ASR & SID parallel training for mutual benefit
| + | |
− | | + | |
− | ===Small data-set and Big model===
| + | |
− | *Investigate the efficiency of pre-training on small/big-model using dark-knowledge
| + | |
− | | + | |
− | ===Low-resource language improvement===
| + | |
− | *SID
| + | |
− | :* How to improve low-resource speaker
| + | |
− | | + | |
− | ===End-to-End speech recognition===
| + | |
− | * Discriminative-Learning code implementation
| + | |
− | :* Zhiyuan Tang
| + | |
− | | + | |
− | ===Multi-task===
| + | |
− | * Fusion of speech-recognition and speech-rate
| + | |
− | :* Xiangyu Zeng
| + | |
− | * Self-informed neural network structure learning
| + | |
− | :* Mengyuan Zhao
| + | |
− | | + | |
− | ===Integrate the class information to HCLG fst for speech recognition===
| + | |
− | *zhiyuan
| + | |
− | | + | |
− | ===Distant speech recognition===
| + | |
− | *RNN-DAE: echo or reverberation
| + | |
− | :*Xuewei Zhang/Zhiyuan Tang/Mengyuan Zhao/Zhiyong Zhang
| + | |
− | *Reverberation
| + | |
− | :*Multi-microphones
| + | |
− | :*(Lasso),Xuewei Zhang
| + | |
− | | + | |
− | ===Voice conversation===
| + | |
− | *hold
| + | |
− | | + | |
− | ===Sparse DNN===
| + | |
− | *Zhiyuan Tang
| + | |
− | | + | |
− | ===Correlation based SENONE cluster===
| + | |
− | | + | |
− | ===NN Multi-GPU parallel training===
| + | |
− | *Multi-GPU using data parallelization
| + | |
− | :*Sheng Su
| + | |
− | * nnet3 mpe
| + | |
− | :* Xuewei Zhang
| + | |
− | | + | |
− | ===Audio Embedding===
| + | |
− | *Ke Ning
| + | |
− | | + | |
− | ===RNN training accelerating===
| + | |
− | | + | |
− | ===Data selection===
| + | |
− | *Zhiyong Zhang
| + | |
− | *Sub-modular data selection
| + | |
− | *Objective-function loss training self-adaptation
| + | |
− | | + | |
− | ===Decoder===
| + | |
− | *Confidence output as required by the task
| + | |
− | | + | |
− | ==Speaker Verification==
| + | |
− | ===binary code===
| + | |
− | *Lantian Li
| + | |
− | | + | |
− | ===RNN-ivector===
| + | |
− | *Lantian Li
| + | |
− | | + | |
− | ===DNN clustering===
| + | |
− | *Lantian Li
| + | |
− | | + | |
− | =Task DONE=
| + | |
− | ==Multi-Mode features based VAD==
| + | |
− | * Shi Yin
| + | |
− | | + | |
− | ==DNN based Language identification and Speaker identification==
| + | |
− | * Xuewei Zhang/Zhiyuan Tang
| + | |
− | | + | |
− | ==Neural network visualization==
| + | |
− | * Mian Wang, DONE
| + | |
− | | + | |
− | ==Dark knowledge==
| + | |
− | * Mengyuan Zhao, Xiangyu Zeng, Zhiyong Zhang, Chao Liu
| + | |
− | | + | |
− | ==Normal RNN speech recognition==
| + | |
− | * Mengyuan Zhao
| + | |
− | | + | |
− | ==Momentum-like Hessian-Free acceleration==
| + | |
− | * Nesterov/Adagrad/AdaDelta/Adam
| + | |
− | * Zhiyong Zhang/Xiangyu Zeng
| + | |
− | | + | |
− | ==Activation value normalization through time --Batch Normalization==
| + | |
− | * Zhiyong Zhang
| + | |
− | | + | |
− | ==Mix-training Balance decision tree==
| + | |
− | * Zhiyong Zhang
| + | |
− | | + | |
− | ==20-h Chinese data-set release==
| + | |
− | * Xuewei Zhang
| + | |
− | | + | |
− | ==Unbounded activation function (Rectifier/Maxout/Pnorm) go-through searching method==
| + | |
− | * nnet3 test --Xuewei Zhang
| + | |
− | | + | |
− | =Technical Report To Write=
| + | |
− | 1, DNN-DAE based noise cancellation -- Xiangyu Zeng / Mengyuan Zhao / Zhiyong Zhang --DONE
| + | |
− | 2, Speech Rate DNN speech recognition --Shi Yin/Xiangyu Zeng --DONE
| + | |
− | 3, CNN+fbank feature combination --Mian Wang /Yiye Lin /Mengyuan Zhao /Shi Yin
| + | |
− | 4, Uyghur low-resource acoustic model enhancement -- Shi Yin / Mengyuan Zhao / Zhiyong Zhang --DONE
| + | |
− | 5, Uyghur 20h database release --Kaer /Shi Yin --DONE
| + | |
− | 6, Dark-Knowledge Transfer
| + | |
− | *: Xiangyu Zeng/ Mengyuan Zhao / Zhiyong Zhang
| + | |
− | | + | |
− | =Paper to Write=
| + | |
− | | + | |
− | =Patent done=
| + | |
− | * A method of new word enhancement for speech recognition --Yue Zhang
| + | |
− | | + | |
− | =Project=
| + | |
− | * Xiaomi TV
| + | |
− | :*Mengyuan Zhao/Zhiyong Zhang
| + | |
− | :*TAG-lm & Domain-specific general lm
| + | |
− | *Chinese-English mix-training
| + | |