“Xingchao work”: Difference between revisions
From cslt Wiki
Revision as of 06:28, 3 October 2014 (Friday)
Paper Recommendation
Pre-Trained Multi-View Word Embedding.[1]
Learning Word Representation Considering Proximity and Ambiguity.[2]
Continuous Distributed Representations of Words as Input of LSTM Network Language Model.[3]
WikiRelate! Computing Semantic Relatedness Using Wikipedia.[4]
Japanese-Spanish Thesaurus Construction Using English as a Pivot.[5]
Chaos Work
SSA Model
Build a 2-dimension SSA model.
Start at : 2014-09-30 <--> End at : 2014-10-02 <--> Result : 27.83% 46.53% (2 classify)
Test 25,50-dimension SSA models for transform.
Start at : 2014-10-02 <--> End at : 2014-10-03 <--> Result : 11.96% 27.43% (50 classify)
Test the All-Belong SSA model for transform.
Start at : 2014-10-02
SEMPRE Research
Work Schedule
Download SEMPRE toolkit. Start at : 2014-09-30
Semantic Parsing via Paraphrasing [6]
Knowledge Vector
Pre-process the corpus using the Wikipedia_Extractor toolkit [7].
Start at : 2014-09-30 <--> End at : 2014-10-03 <--> Result :
The original corpus is about 47 GB; after preprocessing it is almost 17.8 GB.
Analyze the corpus and train word2vec on Wikipedia.
Start at : 2014-10-03
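The corpus-analysis step can be sketched as follows, assuming WikiExtractor-style output in which each article is wrapped in a `<doc id=... url=... title=...>` line and a closing `</doc>` line. The function names are hypothetical and this is not the script actually used; the sketch only counts lines, tokens and vocabulary size, and produces the token lists a word2vec trainer consumes. It does not train word2vec itself.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Assumption: WikiExtractor-style output, where each article is wrapped in
# a '<doc id=... url=... title=...>' line and a closing '</doc>' line.
DOC_TAG = re.compile(r"^</?doc\b")


def iter_sentences(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield lowercased, whitespace-tokenized lines, skipping <doc> tags and blank lines."""
    for line in lines:
        line = line.strip()
        if not line or DOC_TAG.match(line):
            continue
        yield line.lower().split()


def corpus_stats(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (number of text lines, number of tokens, vocabulary size)."""
    counts: Counter = Counter()
    n_lines = 0
    for tokens in iter_sentences(lines):
        n_lines += 1
        counts.update(tokens)
    return n_lines, sum(counts.values()), len(counts)


if __name__ == "__main__":
    sample = [
        '<doc id="1" url="https://example.org/?curid=1" title="Anarchism">',
        "Anarchism",
        "",
        "Anarchism is a political philosophy .",
        "</doc>",
    ]
    print(corpus_stats(sample))  # (2, 7, 6)
```

A trainer such as gensim's `Word2Vec` makes several passes over the sentences, so it should be given a re-iterable object (for example, a class whose `__iter__` reopens the extracted files and calls `iter_sentences`) rather than a one-shot generator.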
Moses translation model
Pre-process the corpus: remove sentences that contain rarely seen words.
Start at : 2014-09-30 <--> End at : 2014-10-02 <--> Result :
The original corpus has 8973724 lines; the clean corpus (sentences containing words that occur fewer than 10 times removed) has 6033397 lines.
Train model.
Start at : 2014-10-02
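The cleaning step described above can be sketched as a two-pass filter, assuming whitespace-tokenized text and reading "words less than 10" as words whose corpus frequency is below 10. The helper names are hypothetical and this is not the script actually used. Because Moses trains on a sentence-aligned parallel corpus, `clean_parallel` drops a pair when either side contains a rare word, so the two files stay line-aligned.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def word_counts(lines: Iterable[str]) -> Counter:
    """Count how often each whitespace-separated word occurs in the corpus."""
    return Counter(word for line in lines for word in line.split())


def is_common(line: str, counts: Counter, min_count: int) -> bool:
    """True if every word in the line occurs at least min_count times."""
    return all(counts[word] >= min_count for word in line.split())


def clean_corpus(lines: List[str], min_count: int = 10) -> List[str]:
    """Monolingual version: drop every sentence that contains a rare word."""
    counts = word_counts(lines)
    return [line for line in lines if is_common(line, counts, min_count)]


def clean_parallel(
    src: List[str], tgt: List[str], min_count: int = 10
) -> Tuple[List[str], List[str]]:
    """Parallel version: keep a pair only if both sides are free of rare words."""
    src_counts, tgt_counts = word_counts(src), word_counts(tgt)
    kept = [
        (s, t)
        for s, t in zip(src, tgt)
        if is_common(s, src_counts, min_count) and is_common(t, tgt_counts, min_count)
    ]
    return [s for s, _ in kept], [t for _, t in kept]


if __name__ == "__main__":
    corpus = ["a b", "a b", "a c"]
    print(clean_corpus(corpus, min_count=2))  # ['a b', 'a b']
```

Counting over the whole corpus first and filtering in a second pass applies the threshold to global word frequencies, not to the counts seen so far.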
Non-Linear Transform Testing
Work Schedule
Re-train for the best MSE on the test data.
Start at : 2014-10-01 <--> End at : 2014-10-02 <--> Result :
Performance is inconsistent with expectations. The best result for the non-linear transform is 1e-2.
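As an illustration of comparing transforms by test-set MSE, here is a minimal sketch that assumes the non-linear transform has the form y = tanh(Wx) and that each candidate is a weight matrix W. The tanh form, the candidate names and all function names are hypothetical and not taken from the original experiment, and the sketch does not reproduce the 1e-2 figure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def nonlinear_transform(W: Matrix, x: Sequence[float]) -> Vector:
    """Hypothetical non-linear map: y = tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def mse(W: Matrix, xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]]) -> float:
    """Mean squared error over all test pairs and all output dimensions."""
    total, n = 0.0, 0
    for x, y in zip(xs, ys):
        for pred, target in zip(nonlinear_transform(W, x), y):
            total += (pred - target) ** 2
            n += 1
    return total / n


def best_transform(
    candidates: Dict[str, Matrix],
    xs: Sequence[Sequence[float]],
    ys: Sequence[Sequence[float]],
) -> Tuple[str, float]:
    """Return (name, MSE) of the candidate with the lowest test MSE."""
    scores = {name: mse(W, xs, ys) for name, W in candidates.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


if __name__ == "__main__":
    xs = [[0.0, 0.0], [1.0, 0.0]]
    ys = [[0.0, 0.0], [math.tanh(1.0), 0.0]]
    candidates = {
        "identity": [[1.0, 0.0], [0.0, 1.0]],
        "zero": [[0.0, 0.0], [0.0, 0.0]],
    }
    print(best_transform(candidates, xs, ys))  # ('identity', 0.0)
```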