Difference between revisions of "Schedule"
From cslt Wiki
Revision as of 03:10, 22 April 2016 (Friday)
Contents
- 1 Text Processing Team Schedule
- 1.1 Members
- 1.2 Work Process
- 1.2.1 Similar questions sentence vector model training with RNN/LSTM and the attention RNN/LSTM chatting model training (Tianyi Luo)
- 1.2.2 Reproduce DSSM Baseline (Chao Xing)
- 1.2.3 Deep Poem Processing With Image (Ziwei Bai)
- 1.2.4 RNN Music Processing for lyric (Shiyao Li)
- 1.2.5 RNN Key word Poem Processing (Yi Xiong)
- 1.2.6 RNN Piano Processing (Jiyuan Zhang)
- 1.2.7 Recommendation System (Tong Liu)
- 1.2.8 Question & Answering (Aiting Liu)
Text Processing Team Schedule
Members
Former Members
- Rong Liu (刘荣) : Youku
- Xiaoxi Wang (王晓曦) : Turing Robot
- Xi Ma (马习) : graduate student at Tsinghua University
- DongXu Zhang (张东旭) : --
Current Members
- Tianyi Luo (骆天一)
- Chao Xing (邢超)
- Qixin Wang (王琪鑫)
- Yiqiao Pan (潘一桥)
Work Process
Similar questions sentence vector model training with RNN/LSTM and the attention RNN/LSTM chatting model training (Tianyi Luo)
2016-04-18
- Optimize the Theano version of generating the similar questions' vectors based on RNN.
- Finish implementing the Theano version of LSTM max-margin vector training.
2016-04-19
- Optimize the Theano version of generating the similar questions' vectors based on RNN.
2016-04-20
- Finish submitting the camera-ready version of the IJCAI 2016 paper.
- Update the Technical Report on Chinese Song Iambics generation.
2016-04-21
- Finish helping Teacher Wang prepare the text group's presentation (Tang poetry and Songci generation, and the intelligent QA system) for Tsinghua University's 105th anniversary.
- Submit our IJCAI paper to arXiv. (Solved a big problem with submitting a paper that includes Chinese characters. Solution)
- Optimize the Theano version of generating the similar questions' vectors based on RNN.
Reproduce DSSM Baseline (Chao Xing)
- 2016-04-21 : The true DSSM model doesn't work well. Analysis:
1. The DSSM model is not reproduced exactly: the original is for English, and I adapted it to Chinese after word segmentation, so the input is word tri-grams rather than letter tri-grams.
2. Our dataset is far from rich, and because we do not use pre-trained word vectors as initial vectors, we can hardly achieve good performance.
: Request :
1. As we have rich pre-trained word vectors, CDSSM or RDSSM may suit our task.
2. Sequences of different lengths need to be mapped to fixed-dimension vectors. Only a CNN or an RNN can do this; a DNN cannot do it with fixed-length word vectors.
: Coding of CDSSM done. Testing its performance.
One problem : If you install tensorflow 0.8.0 with pip and want to use the conv2d function on the GPU, make sure the installed cuDNN version is 4.0, not the latest 5.0.
- 2016-04-20 : Found a bug in the reproduced DSSM model and fixed it.
- 2016-04-19 : Finished coding the mixture data model with lower memory dependency. Testing its performance.
- 2016-04-18 : Code mixture data model.
- 2016-04-16 : Code mixture data model, but ran into a memory error. Dr. Wang helped me fix it.
- 2016-04-15 : Share papers. Investigate a series of DSSM papers for future work, and show our intern students how to do research.
: Original DSSM model : Learning Deep Structured Semantic Models for Web Search using Clickthrough Data (pdf)
: CNN based DSSM model : A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval (pdf)
: Use DSSM model for a new area : Modeling Interestingness with Deep Neural Networks (pdf)
: Latest approach for LSTM + RNN DSSM model : Semantic Modelling with Long-Short-Term Memory for Information Retrieval (pdf)
- 2016-04-14 : Test dssm-dnn model, code dssm-cnn model.
: Continue investigating deep neural question answering systems.
- 2016-04-13 : Test dssm model, investigate deep neural question answering systems.
: Share Theano PPT (theano)
: Share TensorFlow PPT (tensorflow)
- 2016-04-12 : Finish writing the TensorFlow version of dssm.
- 2016-04-11 : Write a TensorFlow toolkit PPT for intern students.
- 2016-04-10 : Learn tensorflow toolkit.
- 2016-04-09 : Learn tensorflow toolkit.
- 2016-04-08 : Finish theano version.
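The 2016-04-21 entry above contrasts letter tri-gram input (as in the original English DSSM) with the word tri-gram input used here after Chinese word segmentation, and notes that sequences of different lengths must be reduced to fixed-dimension vectors. The following is a minimal sketch of both ideas in plain Python. The function names, the `#` boundary marker, and the toy vectors are illustrative assumptions, not the code used in this work.

```python
def letter_trigrams(word):
    """Letter tri-grams of one word, with '#' marking the word boundaries."""
    padded = "#" + word + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def word_trigrams(tokens):
    """Word tri-grams over an already segmented sentence (a list of tokens)."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def max_pool_over_time(vectors):
    """Reduce a variable-length list of equal-width vectors to one
    fixed-width vector by taking the element-wise maximum."""
    if not vectors:
        raise ValueError("need at least one vector")
    return [max(column) for column in zip(*vectors)]


# English-style input: tri-grams of letters inside a single word.
print(letter_trigrams("good"))          # ['#go', 'goo', 'ood', 'od#']

# Chinese-style input after segmentation: tri-grams of whole words.
print(word_trigrams(["我", "想", "听", "音乐"]))

# Three position vectors of width 2 collapse to one vector of width 2,
# no matter how many positions the sequence has.
print(max_pool_over_time([[1, 5], [3, 2], [0, 4]]))   # [3, 5]
```

Max pooling over positions is one common way for a convolutional model to produce a fixed-size output from a variable-length sequence; a plain DNN has no such step, which is the limitation the entry describes.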
Deep Poem Processing With Image (Ziwei Bai)
- 2016-04-20 : combine my program with Qixin Wang's
- 2016-04-10 : web spider to fetch a thousand images.
- 2016-04-13 : 1. download theano for python2.7. 2. debug cnn.py
- 2016-04-15 : web spider to fetch 30 thousand images and store them in a matrix
- 2016-04-16 : modify the code of the CNN and the spider
- 2016-04-17 : train convolutional neural network
RNN Music Processing for lyric (Shiyao Li)
- 2016-04-20 : learn LSTM
- 2016-04-09 : web spider to fetch a thousand lyrics.
- 2016-04-10 : extract the keywords in the lyrics
- 2016-04-13 : read the paper Memory Network.
- 2016-04-15 : read the paper Memory Network and start to understand its code
- 2016-04-17 : read the paper End-to-End Memory Network
RNN Key word Poem Processing (Yi Xiong)
- 2016-04-20 : learn web spider
- 2016-04-09 : database for storing N-gram data
- 2016-04-10 : dictionary stored in the database, dictionary-based segmentation and a simple bigram segmentation
- 2016-04-13 : segmentation result analysis
- 2016-04-15 : improve the simple bigram segmentation
- 2016-04-16 : compare the result of bigram segmentation with dictionary segmentation
- 2016-04-17 : learn python (Head First, 50%)
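The entries above mention dictionary-based segmentation without naming the algorithm. As an illustration only, here is a minimal sketch of one common dictionary-based approach, forward maximum matching. The function name, the `max_word_len` parameter, and the toy dictionary are assumptions for this example, not taken from the project.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Segment text greedily from left to right: at each position take the
    longest dictionary word that matches, falling back to one character."""
    tokens = []
    i = 0
    while i < len(text):
        longest = min(max_word_len, len(text) - i)
        for size in range(longest, 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop advances.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


toy_dictionary = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", toy_dictionary))
# ['研究生', '命', '起源']
```

The greedy match picks "研究生" and strands "命", although "研究 / 生命 / 起源" is the intended reading. Ambiguities of this kind are what a bigram-based segmenter can score differently, which is why comparing the two outputs is informative.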
RNN Piano Processing (Jiyuan Zhang)
- 2016-04-12 : select appropriate midis and run rnnrbm model
- 2016-04-13 : view rnnrbm model's code
Recommendation System (Tong Liu)
- 2016-04-09 : 1. read a review: Machine learning: Trends, perspectives, and prospects. 2. learn python; can now work with dict and set.
- 2016-04-12 : 1. read the paper Collaborative Deep Learning for Recommender Systems and take notes. 2. learn the concepts of the stacked denoising autoencoder (SDAE).
- 2016-04-17 : 1. configure PuTTY and Xming. 2. learn python; can now work with slices and iterators. 3. study the release and datasets of the paper Collaborative Deep Learning for Recommender Systems.
Question & Answering (Aiting Liu)
- 2016-04-20 : read Fader's paper (2013)
- 2016-04-15 : learn dssm and sent2vec
- 2016-04-16 : try to figure out how the PARALEX dataset is constructed
- 2016-04-17 : download the PARALEX dataset and convert it into the format we need