Difference between revisions of "ASR:2015-09-09"

From cslt Wiki
Zxw (talk | contribs)

Latest revision as of 09:01, 14 September 2015 (Monday)

Speech Processing

AM development

Environment

  • grid-12 GPU is transferred to grid-18
  • buy a 970 GPU

RNN AM

  • train monophone RNN --zhiyuan
  • decode using 5-gram
  • the train method of batch
  • train using a large dataset --mengyuan
  • write code to tune learning rate --zhiyong
  • has completed Nesterov/Adagrad/Adagrad-max
  • training shows unstable behavior
  • completed Adam, Adadelta, Adam-max --Xiangyu, Zhiyong
  • reproduce PSO --Xiangyu
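
The learning-rate items above name Adagrad and Adam. A minimal sketch of their update rules on a toy scalar problem may help as a reference; it is not the group's code, and the function names, the toy loss f(w) = w**2, and all hyperparameter values are assumptions chosen for illustration.

```python
import math

def adagrad_step(w, grad, cache, lr=0.5, eps=1e-8):
    """One Adagrad update for a scalar parameter; returns (w, cache)."""
    cache += grad * grad                        # accumulate squared gradients
    w -= lr * grad / (math.sqrt(cache) + eps)   # step shrinks as the cache grows
    return w, cache

def adam_step(w, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (1-based)."""
    m = b1 * m + (1 - b1) * grad                # first-moment estimate
    v = b2 * v + (1 - b2) * grad * grad         # second-moment estimate
    m_hat = m / (1 - b1 ** t)                   # bias correction
    v_hat = v / (1 - b2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

def minimize(optimizer, steps=200, w0=5.0):
    """Minimize the toy loss f(w) = w**2, whose gradient is 2*w."""
    w = w0
    if optimizer == "adagrad":
        cache = 0.0
        for _ in range(steps):
            w, cache = adagrad_step(w, 2 * w, cache)
    elif optimizer == "adam":
        m = v = 0.0
        for t in range(1, steps + 1):
            w, m, v = adam_step(w, 2 * w, m, v, t)
    else:
        raise ValueError("unknown optimizer: " + optimizer)
    return w
```

Both rules divide the step by a running gradient statistic, so the effective learning rate adapts per parameter; instability of the kind noted above is often tied to the choice of lr and eps, though the notes do not say what caused it here.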

Mic-Array

  • hold
  • compute EER with kaldi
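
For reference, the EER mentioned above is the operating point where the false-accept and false-reject rates are equal. The sketch below only illustrates the metric on plain score lists; it is not the Kaldi tool, and the function name and the threshold sweep are assumptions.

```python
def compute_eer(target_scores, nontarget_scores):
    """Return (eer, threshold) at the threshold where FAR and FRR are closest."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for th in thresholds:
        # A trial is accepted when its score is >= the threshold.
        frr = sum(s < th for s in target_scores) / len(target_scores)
        far = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, th)
    return best[1], best[2]
```

With a finite trial list the two rates rarely cross exactly, so this sketch reports their average at the closest threshold.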

Data selection unsupervised learning

  • hold
  • acoustic feature based submodular using Pingan dataset --zhiyong
  • write code to speed up --zhiyong

RNN-DAE (Deep based Auto-Encode-RNN)

  • RNN-DAE performs worse than DNN-DAE because the training dataset is small
  • extract real room impulse responses to generate reverberant WSJ data, then train RNN-DAE
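
Reverberant data is commonly simulated by convolving clean speech with a room impulse response. A minimal sketch of that step, assuming both are plain lists of samples at the same sampling rate (the function name is hypothetical, and the actual pipeline used for WSJ is not described in these notes):

```python
def add_reverb(clean, rir):
    """Convolve a clean signal with a room impulse response (full convolution)."""
    out = [0.0] * (len(clean) + len(rir) - 1)
    for i, s in enumerate(clean):
        for j, h in enumerate(rir):
            out[i + j] += s * h   # each sample excites a scaled, delayed copy of the RIR
    return out
```

For real utterances an FFT-based convolution would be much faster; the direct loop is kept here only for clarity.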

Ivector&Dvector based ASR

  • Cluster the speakers into speaker clusters
  • hold
  • dark knowledge
  • performs much worse than the baseline (EER: baseline 29%, dark knowledge 48%)
  • RNN ivector
  • hold
  • binary ivector done
  • metric learning
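
Assuming "dark knowledge" above refers to training a student model on a teacher's temperature-softened outputs (distillation), the core computation can be sketched as follows. The function names and the temperature value are illustrative assumptions, not the setup used in the experiment.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; higher values flatten the output."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)     # soft targets from the teacher
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

The loss is smallest when the student's softened distribution matches the teacher's.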

language vector

  • hold
  • train with language vectors on the 1400h_CN + 100h_EN dataset --mengyuan
  • write a paper --zhiyuan
  • RNN language vector
  • hold

multi-GPU

  • multi-stream training --Sheng Su
  • two GPUs work well, but training diverges with four GPUs
  • solve the buffer problem --Mengyuan, Sheng Su

Neural picture style transfer

  • reproduced the result of the paper "A Neural Algorithm of Artistic Style" --Zhiyuan, Xuewei
  • limited by GPU memory to the Inception net with the SGD optimizer (the VGG network with the default L-BFGS optimizer gives better results but consumes much more memory)
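
The style-transfer paper cited above represents style through Gram matrices of feature maps. A minimal sketch of that computation, assuming each channel is already flattened to a list of activations; the normalization by the number of positions and the function names are assumptions:

```python
def gram_matrix(features):
    """features: list of channels, each a flat list of activations of equal length."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]

def style_loss(feat_generated, feat_style):
    """Sum of squared differences between the two Gram matrices."""
    g = gram_matrix(feat_generated)
    s = gram_matrix(feat_style)
    return sum((x - y) ** 2 for rg, rs in zip(g, s) for x, y in zip(rg, rs))
```

The Gram matrix has one entry per channel pair, so its size grows with the square of the channel count, independent of image size.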

Text Processing

RNN LM

  • character-lm rnn(hold)
  • lstm+rnn
  • check how the lstm-rnnlm code initializes and updates the learning rate. (hold)

Neural Based Document Classification

  • (hold)

RNN Rank Task

  • Test.
  • Paper: RNN Rank Net.
  • (hold)
  • Output rank information.

Graph RNN

  • Entity path embedded to entity.
  • (hold)

RNN Word Segment

  • Set bounds for word segmentation.
  • (hold)

Seq to Seq(09-15)

  • Review papers.
  • Reproduce baseline. (08-03 <--> 08-17)

Order representation

  • Nested Dropout
  • semi-linear --> neural based auto-encoder.
  • modify the objective function(hold)
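
Nested dropout, as usually described, samples an index and zeroes every code unit after it, so earlier units learn to carry more information. A minimal sketch under that reading; the truncated geometric sampling, the parameter p, and the function names are assumptions rather than the setup used here:

```python
import random

def sample_keep_count(dim, p=0.1, rng=random):
    """Sample how many leading units to keep (truncated geometric, between 1 and dim)."""
    b = 1
    while b < dim and rng.random() > p:
        b += 1
    return b

def nested_dropout(code, keep_count):
    """Keep the first keep_count units of the code and zero out the rest."""
    return [x if i < keep_count else 0.0 for i, x in enumerate(code)]
```

A training step would call sample_keep_count once per example and apply nested_dropout to the encoder output before decoding.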

Balance Representation

  • Find error signal

Recommendation

  • Reproduce baseline.
  • LDA matrix decomposition.
  • LDA (Text classification & Recommendation System) --> AAAI

RNN based QA

  • Read Source Code.
  • Attention based QA.
  • Coding.

RNN Poem Process

  • Seq based BP.
  • (hold)

Text Group Intern Project

Buddhist Process

  • (hold)

RNN Poem Process

  • Done by Haichao Yu & Chaoyuan Zuo. Mentor: Tianyi Luo.

RNN Document Vector

  • (hold)

Image Baseline

  • Demo Release.
  • Paper Report.
  • Read CNN Paper.

Text Intuitive Idea

Trace Learning

  • (Hold)

Match RNN

  • (Hold)

financial group

model research

  • RNN
  • online model, updated every day
  • modify cost function and learning method
  • add more features

rule combination

  • GA method to optimize the model
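
As a rough illustration of the GA item above, the sketch below evolves a weight vector against a user-supplied fitness function. The encoding (one real-valued weight per rule), the operators, and every parameter value are assumptions; the notes do not describe the actual model or fitness.

```python
import random

def genetic_optimize(fitness, dim, pop_size=20, generations=30, mutation=0.1, seed=0):
    """Maximize fitness over real-valued vectors of length dim with a simple GA."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]             # better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)          # pick two distinct parents
            cut = rng.randrange(1, dim) if dim > 1 else 0
            child = a[:cut] + b[cut:]            # one-point crossover
            child = [g + rng.gauss(0, mutation) for g in child]  # Gaussian mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```

Because the better half is carried over unchanged, the best fitness never decreases from one generation to the next.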

basic rule

  • classical tenth model

multiple-factor

  • add more factors
  • use sparse model

display

  • bug fixed
  • buy rule fixed

data

  • data api
  • download the futures data and factor data