ASR:2015-12-1
From cslt Wiki

Latest revision as of 07:21, 7 December 2015

Speech Processing

AM development

Environment

End-to-End

  • monophone ASR --Zhiyuan
  • MPE
  • CTC/nnet3/Kaldi
  • http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=446

conditioning learning

  • language vector into multiple layers --Zhiyuan
  • a Chinese paper
  • speech rate into multiple layers --Zhiyuan
  • verify the code for extra input(s) into DNN
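The "language vector / speech rate into multiple layers" items above amount to appending an auxiliary conditioning vector to the input of every hidden layer. A minimal pure-Python sketch of that forward pass, with all dimensions and weights invented for illustration (not the group's actual Kaldi/nnet recipe):

```python
import random

def affine(x, w, b):
    # y = W x + b, with W as a plain list of rows (rows = output units)
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, u) for u in v]

def forward_with_condition(x, cond, layers):
    """Forward pass where the conditioning vector (e.g. a language vector
    or a speech-rate value) is re-appended to the input of every layer."""
    h = x
    for w, b in layers:
        h = relu(affine(h + cond, w, b))
    return h

# toy dimensions: 4-dim acoustic input, 2-dim language vector, two 3-unit layers
random.seed(0)
def rand_layer(n_out, n_in):
    return ([[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

layers = [rand_layer(3, 4 + 2), rand_layer(3, 3 + 2)]
out = forward_with_condition([0.5, -0.2, 0.1, 0.3], [1.0, 0.0], layers)
print(len(out))  # 3
```

Verifying that extra inputs are wired into every layer (not just the first) is exactly the kind of check the last bullet refers to.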

Adaptive learning rate method

  • sequence training --Xiangyu
  • write a technical report
  • http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=458

Mic-Array

  • hold
  • compute EER with Kaldi

Data selection and unsupervised learning

  • hold
  • acoustic feature based submodular using Pingan dataset --zhiyong
  • write code to speed up --zhiyong
  • curriculum learning --zhiyong

RNN-DAE (RNN-based Deep Auto-Encoder)

  • hold
  • RNN-DAE performs worse than DNN-DAE because the training dataset is small
  • extract real room impulse responses to generate reverberated WSJ data, then train the RNN-DAE

Speaker recognition

  • DNN-ivector framework
  • SUSR
  • AutoEncoder + metric learning
  • binary ivector
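One plausible reading of the "binary ivector" item is thresholding each i-vector dimension by sign and scoring trials with Hamming distance. A toy sketch (the vectors below are made up, not real i-vectors):

```python
def binarize(ivec):
    # sign-threshold each dimension: 1 if positive, else 0
    return [1 if v > 0.0 else 0 for v in ivec]

def hamming(a, b):
    # number of differing bits; lower means more similar speakers
    return sum(x != y for x, y in zip(a, b))

enroll = binarize([0.3, -1.2, 0.7, 0.0, 2.1])      # [1, 0, 1, 0, 1]
test_same = binarize([0.4, -0.9, 0.5, -0.1, 1.8])  # [1, 0, 1, 0, 1]
test_diff = binarize([-0.5, 1.1, -0.2, 0.3, -1.0]) # [0, 1, 0, 1, 0]

print(hamming(enroll, test_same))  # 0
print(hamming(enroll, test_diff))  # 5
```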

language vector

  • write a paper --Zhiyuan
  • hold
  • language vector is added to multiple hidden layers --Zhiyuan
  • RNN language vector
  • hold


multi-GPU

  • multi-stream training --Sheng Su
  • write a technical report
  • http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=472
  • kaldi-nnet3 --Xuewei
  • train a 7*2048 TDNN using 4000h of data --Mengyuan
  • 1700h+776h 16k nnet3 6*2000 training done; outperforms the 6776H_mpe model --Mengyuan
  • wrote an nnet3 biglm decoder for Sinovoice
  • train MPE using WSJ and Aurora4 --Zhiyong, Xuewei
  • train nnet3 MPE using data from Jietong --Xuewei

multi-task

  • test self-information-based neural structure learning --Mengyuan
  • hold
  • code written
  • no significant performance improvement observed
  • http://192.168.0.51:5555/cgi-bin/cvss/cvss_request.pl?account=zxw&step=view_request&cvssid=483
  • speech rate learning --Xiangyu
  • get results with extra input of speech-rate info --Zhiyuan

Text Processing

Work

RNN Poem Process

  • Combine additional rhyme constraints.
  • Investigate new methods.

Document Representation

  • Code done; waiting for experiment results.

Seq to Seq

  • Work on some tasks.

Order representation

  • Code up some ideas.

Balance Representation

  • Investigate some papers.
  • Current solution: use knowledge or similar pairs from a large corpus.

Hold

Neural Based Document Classification

RNN Rank Task

Graph RNN

  • Entity paths embedded into entity representations.
  • (hold)

RNN Word Segment

  • Set boundaries for word segmentation.
  • (hold)

Recommendation

  • Reproduce the baseline.
  • LDA matrix decomposition.
  • LDA (Text classification & Recommendation System) --> AAAI

RNN based QA

  • Read the source code.
  • Attention-based QA.
  • Coding.

Text Group Intern Project

Buddhist Process

  • (hold)

RNN Poem Process

  • Done by Haichao Yu & Chaoyuan Zuo; mentor: Tianyi Luo.

RNN Document Vector

  • (hold)

Image Baseline

  • Demo released.
  • Paper report.
  • Read CNN papers.

Text Intuitive Idea

Trace Learning

  • (Hold)

Match RNN

  • (Hold)

financial group

model research

  • RNN
  • online model, updated every day
  • modify the cost function and learning method
  • add more features

rule combination

  • GA method to optimize the model
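The GA item above might look like the following toy sketch: a chromosome is one weight per trading rule, and fitness is a toy backtest score rewarding weights whose combined signal agrees with the next-day return. All data, parameters, and the fitness definition are invented for illustration:

```python
import random

random.seed(1)

# hypothetical data: 100 days x 5 rules, each rule fires a signal in [-1, 1]
SIGNALS = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(100)]
RETURNS = [random.uniform(-0.02, 0.02) for _ in range(100)]  # daily returns

def fitness(weights):
    # reward combined signals that agree in sign with the realized return
    combined = (sum(w * s for w, s in zip(weights, day)) for day in SIGNALS)
    return sum(c * r for c, r in zip(combined, RETURNS))

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(w, rate=0.1):
    return [x + random.gauss(0, 0.1) if random.random() < rate else x for x in w]

def ga(pop_size=20, generations=30, n_rules=5):
    pop = [[random.uniform(-1, 1) for _ in range(n_rules)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]                   # keep the better half
        pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=fitness)

best = ga()
print(len(best))  # 5 rule weights
```

Elitism (keeping the better half of each generation) guarantees the best fitness never decreases across generations.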

basic rule

  • classical tenth model

multiple-factor

  • add more factors
  • use a sparse model
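A sparse model over many factors is commonly fit with an L1 penalty. A hedged sketch using proximal gradient descent (ISTA) with soft-thresholding, on toy data where only factor 0 matters (all numbers invented):

```python
def soft_threshold(x, lam):
    # proximal operator of lam*|x|: shrink toward zero, clip small values to 0
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def ista(X, y, lam=0.1, step=0.01, iters=500):
    """Minimize 0.5*||Xw - y||^2 + lam*||w||_1 by proximal gradient descent."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # gradient of the squared-error term
        resid = [sum(wi * xi for wi, xi in zip(w, row)) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) for j in range(d)]
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad)]
    return w

# toy data: y depends only on factor 0, so the other weights should shrink to ~0
X = [[1, 0.1, -0.2], [2, -0.1, 0.3], [3, 0.2, 0.1], [4, -0.3, -0.1]]
y = [1.0, 2.0, 3.0, 4.0]
w = ista(X, y, lam=0.5)
```

The L1 penalty drives irrelevant factor weights to exactly zero, which is the usual motivation for a sparse multiple-factor model.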

display

  • bugs fixed
  • buy rule fixed

data

  • data API
  • download the futures data and factor data