2014-10-13
Revision as of 08:41, 13 October 2014 (Mon)

Speech Processing

AM development

Sparse DNN

  • Performance improvement found when pruned slightly (see the pruning sketch after this list)
  • Experiments show that
  • Suggest using TIMIT / AURORA 4 for training
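
As a rough illustration of the pruning step only (not the group's actual recipe), the sketch below zeroes out the smallest-magnitude weights of one layer; the fraction `prune_frac` is a hypothetical knob.

    import numpy as np

    def prune_by_magnitude(W, prune_frac=0.1):
        """Zero out the prune_frac smallest-magnitude weights of W.

        A slight pruning (small prune_frac) is where the performance
        improvement was observed; prune_frac is an illustrative knob.
        """
        thresh = np.quantile(np.abs(W), prune_frac)
        mask = np.abs(W) >= thresh
        return W * mask, mask

    # Example: prune 10% of a random 4x4 weight matrix.
    W = np.random.randn(4, 4)
    W_sparse, mask = prune_by_magnitude(W, prune_frac=0.1)
    print("kept fraction:", mask.mean())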

RNN AM

  • Initial test on WSJ led to out-of-memory errors.
  • Using AURORA 4 short sentences with a smaller number of targets instead.

Noise training

  • First draft of the noisy training journal paper
  • Check abnormal behavior with large sigma (Yinshi, Liuchao, Lin Yiye); ongoing. A noise-injection sketch follows this list.
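
A minimal sketch of the noisy-training idea, assuming simple Gaussian feature-domain corruption with standard deviation sigma; the actual recipe may sample real noise instead, and all names here are illustrative.

    import numpy as np

    def noisy_training_batch(features, sigma=0.5, rng=None):
        """Corrupt a batch of acoustic features with Gaussian noise.

        In noisy training, the DNN is trained on corrupted features so
        it generalizes to noisy test conditions. A large sigma is the
        regime where abnormal behavior was reported.
        """
        rng = rng or np.random.default_rng()
        noise = rng.normal(0.0, sigma, size=features.shape)
        return features + noise

    # Example: corrupt a batch of 8 frames of 40-dim filterbank features.
    batch = np.random.randn(8, 40)
    noisy = noisy_training_batch(batch, sigma=2.0)  # large-sigma case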

Drop out & Rectification & convolutive network

  • Drop out (a dropout sketch follows this list)
      • Dataset: WSJ; test set: eval93. WER (%):
             std  | dropout 0.2 | dropout 0.4 | dropout 0.6 | dropout 0.8
            --------------------------------------------------------------
             4.5  |    4.54     |    4.5      |    4.25     |    4.5
      • No performance improvement found yet.
      • Test on the noisy AURORA 4 dataset [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&step=view_request&cvssid=261]
      • Continue dropout on a normally trained XEnt NNET (e.g., WSJ).
      • Draft the dropout-DNN weight-distribution analysis.
  • Rectification
      • The dropout NA problem was caused by the large magnitude of weights.
      • Still NaN errors; need to debug.
  • MaxOut
  • Convolutive network
      • Test more configurations.
      • Yiye will work on CNN.
  • Recurrent neural network
      • Investigate CURRENNT for AM.
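
For reference, a minimal sketch of inverted dropout on hidden activations, assuming the rates in the table above are drop probabilities; this is an illustration, not the group's training code.

    import numpy as np

    def dropout_forward(h, p_drop=0.4, train=True, rng=None):
        """Inverted dropout on hidden activations h.

        At train time each unit is zeroed with probability p_drop and
        the survivors are rescaled by 1/(1-p_drop), so nothing changes
        at test time. p_drop matches the 0.2-0.8 rates tabled above.
        """
        if not train or p_drop == 0.0:
            return h
        rng = rng or np.random.default_rng()
        mask = rng.random(h.shape) >= p_drop
        return h * mask / (1.0 - p_drop)

    # Example: apply 40% dropout to a batch of hidden activations.
    h = np.random.randn(8, 1024)
    h_dropped = dropout_forward(h, p_drop=0.4)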

Denoising & Farfield ASR

  • ICASSP paper submitted.
  • Lasso-based de-reverberation is done with the REVERBERATION toolkit (a toy lasso sketch follows this list).
  • Start to compose the experiment section of the SL paper.
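
The notes don't detail the lasso formulation; as one speculative illustration of sparse filter estimation for de-reverberation, the sketch below fits a sparse FIR filter with scikit-learn's Lasso. The design matrix, order, and alpha are all assumptions.

    import numpy as np
    from sklearn.linear_model import Lasso

    def estimate_reverb_filter(clean, reverbed, order=10, alpha=0.01):
        """Fit a sparse FIR filter mapping clean speech to its
        reverberated version via Lasso regression (an illustration of
        the lasso-based idea; the toolkit's actual setup may differ).
        """
        # Delayed copies of the clean signal as regressors
        # (wrap-around edge effects ignored for brevity).
        X = np.stack([np.roll(clean, d) for d in range(order)], axis=1)
        model = Lasso(alpha=alpha).fit(X, reverbed)
        return model.coef_  # sparse impulse-response estimate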

VAD

  • Problems found at the beginning of speech (0-0.02 s?).
  • Add more silence tags "#" to the pure-silence utterance text (training).
  • Noise model training done; under testing.
      • xEntropy model is training.
  • Need to investigate the performance reduction in babble noise. Call Jia.
  • Sum all sil-pdfs as the silence posterior probability (see the sketch after this list).
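
A minimal sketch of that last item, assuming the DNN outputs per-pdf posteriors and the set of silence pdf indices is known from the tree; the indices below are illustrative only.

    import numpy as np

    def silence_posterior(frame_post, sil_pdf_ids):
        """Sum the posteriors of all silence pdfs for each frame.

        frame_post: (num_frames, num_pdfs) DNN output posteriors.
        sil_pdf_ids: indices of pdfs tied to silence phones.
        Returns a per-frame silence probability usable as a VAD score.
        """
        return frame_post[:, sil_pdf_ids].sum(axis=1)

    # Example: 3 frames, 6 pdfs, pdfs 0 and 1 model silence.
    post = np.array([[0.6, 0.2, 0.10, 0.05, 0.03, 0.02],
                     [0.1, 0.1, 0.40, 0.20, 0.10, 0.10],
                     [0.7, 0.2, 0.05, 0.03, 0.01, 0.01]])
    p_sil = silence_posterior(post, sil_pdf_ids=[0, 1])
    is_speech = p_sil < 0.5  # simple threshold VAD decision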

Speech rate training

  • Some interesting results with the simple speech-rate change algorithm were obtained on the WSJ db [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=wangd&step=view_request&cvssid=268]
  • The ROS model seems superior to the normal one on faster speech.
  • Need to check the distribution of ROS on WSJ.
  • Suggest extracting speech data of different ROS to construct a new test set.
  • Suggest using Tencent training data.
  • Suggest removing silence when computing ROS (see the ROS sketch after this list).
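
ROS (rate of speech) is not defined in the notes; a common definition is phones per second of non-silence audio. The sketch below computes it from a frame-level alignment, assuming known silence phone ids; all names are illustrative.

    def rate_of_speech(align, sil_ids, frame_shift=0.01):
        """Compute ROS as phones per second, excluding silence.

        align: per-frame phone ids from a forced alignment.
        sil_ids: phone ids counted as silence and removed, as suggested.
        frame_shift: seconds per frame (10 ms is typical).
        (Dropping silence frames can merge identical phones across a
        pause; ignored here for brevity.)
        """
        speech = [p for p in align if p not in sil_ids]
        if not speech:
            return 0.0
        # Count phone segments: positions where the phone id changes.
        n_phones = 1 + sum(1 for a, b in zip(speech, speech[1:]) if a != b)
        return n_phones / (len(speech) * frame_shift)

    # Example: 1 = silence; phones 5,7,9 over 0.08 s of speech.
    ros = rate_of_speech([5, 5, 5, 7, 7, 1, 1, 9, 9, 9], sil_ids={1})
    print(round(ros, 1))  # 37.5 phones/sec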

low resource language AM training

  • Results on CVSS [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&cvssid=274]
  • Use the Chinese NN as the initial NN and change the last layer (see the sketch after this list).
  • Vary the number of Chinese-trained DNN layers reused.
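
A minimal numpy sketch of the layer-reuse idea: keep the lower layers of the Chinese-trained DNN and re-initialize the output layer for the new language's targets. Shapes, names, and the fallback input dimension are illustrative.

    import numpy as np

    def transfer_init(chinese_layers, n_reuse, n_new_targets, rng=None):
        """Build an initial NN for a low-resource language.

        chinese_layers: list of (W, b) pairs, W shaped (in, out).
        n_reuse: how many lower layers to keep (the quantity varied).
        n_new_targets: output dimension for the new language's senones.
        Kept layers are copied; a fresh output layer replaces the top.
        """
        rng = rng or np.random.default_rng()
        layers = [(W.copy(), b.copy()) for W, b in chinese_layers[:n_reuse]]
        in_dim = layers[-1][0].shape[1] if layers else 440  # e.g. spliced feats
        W_out = rng.normal(0, 0.01, size=(in_dim, n_new_targets))
        b_out = np.zeros(n_new_targets)
        layers.append((W_out, b_out))
        return layers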

Scoring

  • Global scoring done.
  • Pitch & rhythm done; needs testing.
  • Harmonics on hold.


Confidence

  • Reproduce the experiments on the Fisher dataset.
  • Use the Fisher DNN model to decode the all-wsj dataset.
  • Basic confidence using lattice-based posterior + DNN posterior + ROS done; experiments done, need more data.
  • 23% detection error achieved by the balanced model (a feature-fusion sketch follows this list).
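
The notes do not say how the three scores are combined; one simple, illustrative choice is a logistic model over the three features. The weights below are made up ("balanced model" presumably refers to training on class-balanced data).

    import numpy as np

    def confidence(lattice_post, dnn_post, ros, w=(2.0, 1.5, -0.02), bias=-1.0):
        """Fuse word-level features into a single confidence score.

        Combines lattice posterior, DNN posterior, and ROS with a
        logistic model; the notes only say which features are used,
        not how they are fused, so the weights here are illustrative.
        """
        z = bias + w[0] * lattice_post + w[1] * dnn_post + w[2] * ros
        return 1.0 / (1.0 + np.exp(-z))

    print(confidence(lattice_post=0.9, dnn_post=0.8, ros=12.0))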


Speaker ID

  • Preparing the GMM-based server.
  • Add VAD to the system.
  • GMM-based test program delivered.
  • GMM registration program done.

Emotion detection

  • Zhang Weiwei is learning the code.
  • Sinovoice is implementing the server.


Text Processing

LM development

Domain specific LM

  • ngram generation is ongoing.
  • Memory check and baidu_hi done.

NUM tag LM:

  • Maxi's work is released.
  • Yuanbin continues the tag LM work.
  • Add NER to the tag LM.
  • Boost specific words like "wifi" if the TAG model does not work for a particular word (see the sketch after this list).
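
For context, a minimal sketch of the text side of a tag LM: numeric tokens are replaced by a class tag (here NUM) before n-gram training, with an in-class model scoring the actual values at decode time. The regex and tag name are illustrative.

    import re

    NUM_RE = re.compile(r"^\d+(\.\d+)?$")

    def tag_numbers(tokens, tag="NUM"):
        """Replace numeric tokens with a class tag for tag-LM training.

        The n-gram LM is trained over the tagged text, and a separate
        in-class model scores the numbers themselves. The same idea
        extends to NER-derived tags, as planned in the notes.
        """
        return [tag if NUM_RE.match(t) else t for t in tokens]

    print(tag_numbers("call me at 10086 after 9.30".split()))
    # ['call', 'me', 'at', 'NUM', 'after', 'NUM']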


Word2Vector

W2V based doc classification

  • Initial results with variational Bayesian GMM obtained. Performance is not as good as the conventional GMM (a doc-vector sketch follows this list).
  • Non-linear inter-language transform (English-Spanish-Czech): word-vector model training done; transform model under investigation.
  • SSA-based local linear mapping still running.
  • k-means classes changed to 2.
  • Knowledge vector started:
      • Format the data.
  • Character-to-word conversion:
      • Prepare the task: word similarity.
      • Prepare the dict.
  • Google word vector training:
      • Improve the sampling method.
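
The notes don't say how documents are represented before the GMM classifiers; one common, simple choice is averaging the word vectors, sketched below. The vocabulary and dimensions are toy examples.

    import numpy as np

    def doc_vector(tokens, w2v, dim=100):
        """Average word vectors to get a document representation.

        w2v: dict mapping word -> vector (e.g., from word2vec).
        OOV words are skipped. GMM-based classifiers (variational
        Bayesian vs. conventional) can then be compared on top of
        such features.
        """
        vecs = [w2v[t] for t in tokens if t in w2v]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    # Example with a toy 3-dim vocabulary.
    w2v = {"speech": np.array([1.0, 0.0, 0.0]),
           "noise":  np.array([0.0, 1.0, 0.0])}
    print(doc_vector("speech and noise".split(), w2v, dim=3))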

RNN LM

  • rnn
  • lstm+rnn
      • Install the tool and prepare the WSJ data.
      • Prepare the baseline.

Translation

  • v3.0 demo released.
      • Still slow.
  • Re-segment the words using the new dictionary (see the segmentation sketch after this list).
  • Check the new data.
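
The notes don't say which segmenter is used; a minimal forward maximum-match sketch over a dictionary illustrates the re-segmentation idea.

    def max_match(text, dictionary, max_len=4):
        """Forward maximum-match word segmentation.

        Greedily takes the longest dictionary word at each position,
        falling back to a single character. A stand-in for whatever
        segmenter the translation system actually uses.
        """
        words, i = [], 0
        while i < len(text):
            for j in range(min(len(text), i + max_len), i, -1):
                if text[i:j] in dictionary or j == i + 1:
                    words.append(text[i:j])
                    i = j
                    break
        return words

    print(max_match("清华大学在北京", {"清华大学", "北京", "大学"}))
    # ['清华大学', '在', '北京']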

QA

  • Search method:
      • Add VSM and BM25 to improve search, plus a strategy for selecting the answer (a BM25 sketch follows this list).
      • Segment words at minimum granularity for the Lucene index and the bag-of-words method.
  • The new intern will install SEMPRE.
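
A compact sketch of the standard BM25 score being added to the search stage, with the usual k1/b defaults; the tokenization and toy collection are illustrative stand-ins for the Lucene index.

    import math
    from collections import Counter

    def bm25_score(query, doc, docs, k1=1.5, b=0.75):
        """BM25 relevance of doc to query over a collection docs.

        Standard formulation: idf weighting with term-frequency
        saturation (k1) and document-length normalization (b).
        """
        N = len(docs)
        avgdl = sum(len(d) for d in docs) / N
        tf = Counter(doc)
        score = 0.0
        for term in query:
            df = sum(1 for d in docs if term in d)
            idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        return score

    docs = [["hello", "world"], ["speech", "recognition", "world"]]
    print(bm25_score(["world"], docs[0], docs))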