==Paper Recommendation==
Pre-Trained Multi-View Word Embedding. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/3/3c/Pre-Trained_Multi-View_Word_Embedding.pdf]

Learning Word Representation Considering Proximity and Ambiguity. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b0/Learning_Word_Representation_Considering_Proximity_and_Ambiguity.pdf]

Continuous Distributed Representations of Words as Input of LSTM Network Language Model. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/5/5a/Continuous_Distributed_Representations_of_Words.pdf]

WikiRelate! Computing Semantic Relatedness Using Wikipedia. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/c/cb/WikiRelate%21_Computing_Semantic_Relatedness_Using_Wikipedia.pdf]

Japanese-Spanish Thesaurus Construction Using English as a Pivot. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/e/e8/Japanese-Spanish_Thesaurus_Construction.pdf]

==Chaos Work==

===SSA Model===

Build a 2-dimensional SSA model.
Start at: 2014-09-30 <--> End at: 2014-10-02 <--> Result:
27.83% 46.53% (2 classes)
Test 25- and 50-dimensional SSA models for the transform.
Start at: 2014-10-02 <--> End at: 2014-10-03 <--> Result:
27.9% 46.6% (1 class)
27.83% 46.53% (2 classes)
27.43% 46.53% (3 classes)
25.52% 45.83% (4 classes)
25.62% 45.83% (5 classes)
22.81% 42.51% (6 classes)
11.96% 27.43% (50 classes)
Explanation: some test points do not belong to the class of the training data, so the transform does not apply the correct transform matrix to them.
The planned fix is to cluster the training data first and then test the performance.
Simple clustering into 2 classes:
23.51% 43.21% (2 classes)
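The clustering fix described above can be sketched as a piecewise linear transform: cluster the training source vectors, fit one least-squares transform matrix per cluster, and map each test vector with the matrix of its nearest cluster centre. This is a minimal NumPy sketch under stated assumptions, not the SSA model itself (its internals are not described on this page): k-means with farthest-point initialisation, ordinary least squares, and the helper names kmeans, fit_clustered_transform and apply_clustered_transform are all illustrative choices.

```python
import numpy as np


def _sq_dists(x, centroids):
    """Squared Euclidean distance from every row of x to every centroid."""
    return ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def kmeans(x, k, n_iter=50, seed=0):
    """Lloyd's k-means with farthest-point initialisation; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # First centre: a random training point; each further centre: the point
    # farthest from all centres chosen so far.
    centres = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d = _sq_dists(x, np.array(centres)).min(axis=1)
        centres.append(x[d.argmax()])
    centroids = np.array(centres, dtype=float)
    for _ in range(n_iter):
        labels = _sq_dists(x, centroids).argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the previous centre if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, _sq_dists(x, centroids).argmin(axis=1)


def fit_clustered_transform(src, tgt, k):
    """Fit one matrix W_j per cluster so that src @ W_j ~= tgt (least squares)."""
    centroids, labels = kmeans(src, k)
    # A single global matrix, used only if some cluster ends up empty.
    w_global, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    mats = []
    for j in range(k):
        mask = labels == j
        if mask.any():
            w, *_ = np.linalg.lstsq(src[mask], tgt[mask], rcond=None)
        else:
            w = w_global
        mats.append(w)
    return centroids, mats


def apply_clustered_transform(x, centroids, mats):
    """Map each row of x with the matrix of its nearest centroid."""
    nearest = _sq_dists(x, centroids).argmin(axis=1)
    return np.stack([x[i] @ mats[c] for i, c in enumerate(nearest)])
```

With k = 1 the same code reduces to a single global linear transform, which gives a baseline for comparison. The nearest-centroid routing is also where the failure noted above would show up: a test point routed to the wrong cluster is mapped with the wrong matrix.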
Use the training set as the test set.
Start at: 2014-10-06 <--> End at: 2014-10-08 <--> Result:
63.98% 77.57% (simple clustering, 2 classes)
58.81% 73.91% (total, 3 classes)

Test the All-Belong SSA model for the transform.
Start at: 2014-10-02

===SEMPRE Research===
====Work Schedule====
Download the SEMPRE toolkit.
Start at: 2014-09-30

====Paper related====
Semantic Parsing via Paraphrasing. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/85/Semantic_Parsing_via_Paraphrasing.pdf]

===Knowledge Vector===

Pre-process the corpus.
Start at: 2014-09-30
Use the Wikipedia_Extractor toolkit [http://medialab.di.unipi.it/wiki/Wikipedia_Extractor] (waiting).
End at: 2014-10-03 <--> Result:
The original corpus is about 47 GB; after preprocessing it is nearly 17.8 GB.
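A small reader for the extracted output might look like the following. It assumes the extractor's default plain-text format, in which each article is wrapped in an opening doc line carrying id, url and title attributes and a closing doc line; the regular expression and the helper iter_docs are hypothetical and not part of the toolkit, so they should be checked against the actual output files.

```python
import re
from typing import Iterator, Tuple

# Assumed plain-text layout of the extractor output (verify on real files):
#   <doc id="..." url="..." title="...">
#   ...article text...
#   </doc>
DOC_RE = re.compile(
    r'<doc id="(?P<id>[^"]*)" url="(?P<url>[^"]*)" title="(?P<title>[^"]*)">\n'
    r"(?P<text>.*?)\n</doc>",
    re.DOTALL,
)


def iter_docs(raw: str) -> Iterator[Tuple[str, str]]:
    """Yield (title, text) for every <doc> block in one extracted file."""
    for m in DOC_RE.finditer(raw):
        yield m.group("title"), m.group("text").strip()
```

Each yielded text can then be tokenised and written one article or sentence per line, a common input layout for word2vec training.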
Analyze the corpus and train word2vec on Wikipedia.
Start at: 2014-10-03
Design the data structure:
<pre>
{ title : "", content : { Abs : [[details], [related link]], h2 : [] }, category : [] }
</pre>
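One way to hold the planned structure in code is a small dataclass that serialises back to the nested layout above. The Python field names abs_details and abs_links and the helper methods are hypothetical; only the keys title, content, Abs, h2 and category come from the design.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class WikiEntry:
    """One article in the planned layout; the Python field names are hypothetical."""

    title: str = ""
    abs_details: List[str] = field(default_factory=list)  # Abs[0]: "details"
    abs_links: List[str] = field(default_factory=list)    # Abs[1]: "related link"
    h2: List[str] = field(default_factory=list)           # level-2 sections
    category: List[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Rebuild the nested shape of the design."""
        return {
            "title": self.title,
            "content": {"Abs": [self.abs_details, self.abs_links], "h2": self.h2},
            "category": self.category,
        }

    @classmethod
    def from_dict(cls, d: dict) -> "WikiEntry":
        """Inverse of to_dict."""
        details, links = d["content"]["Abs"]
        return cls(d["title"], list(details), list(links),
                   list(d["content"]["h2"]), list(d["category"]))


entry = WikiEntry(title="Example", abs_details=["First abstract sentence."],
                  abs_links=["Related page"], category=["Example category"])
print(json.dumps(entry.to_dict(), ensure_ascii=False))
```

Keeping Abs as a two-element list mirrors the design as written; a keyed object for details and links would be more self-describing if the format is still open.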

===Moses translation model===

Pre-process the corpus: remove sentences that contain rarely seen words.
Start at: 2014-09-30 <--> End at: 2014-10-02 <--> Result:
The original corpus has 8,973,724 lines; after removing sentences that contain words occurring fewer than 10 times, the clean corpus has 6,033,397 lines.
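The cleaning step can be sketched as a two-pass filter over a parallel corpus: count token frequencies, then keep only sentence pairs whose tokens all occur at least 10 times. This sketch assumes whitespace tokenisation, frequencies counted separately for each language side, and the threshold applied to both sides; filter_rare is a hypothetical helper, not a Moses script.

```python
from collections import Counter
from typing import List, Tuple


def filter_rare(pairs: List[Tuple[str, str]], min_count: int = 10) -> List[Tuple[str, str]]:
    """Keep (source, target) pairs whose tokens all occur >= min_count times.

    Frequencies are counted separately for each side over the whole corpus,
    using whitespace tokenisation (both are assumptions).
    """
    # Pass 1: corpus-wide token frequencies per side.
    src_freq = Counter(tok for src, _ in pairs for tok in src.split())
    tgt_freq = Counter(tok for _, tgt in pairs for tok in tgt.split())
    # Pass 2: drop any pair containing a rare token on either side.
    kept = []
    for src, tgt in pairs:
        src_ok = all(src_freq[t] >= min_count for t in src.split())
        tgt_ok = all(tgt_freq[t] >= min_count for t in tgt.split())
        if src_ok and tgt_ok:
            kept.append((src, tgt))
    return kept
```

For a corpus of about 9 million lines, the same two passes would normally stream over the files instead of holding every pair in memory.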
Train the model.
Start at: 2014-10-02 <--> End at: 2014-10-05
Tune the model.
Start at: 2014-10-05

===Non Linear Transform Testing===
====Work Schedule====
Re-train with the best MSE for the test data.
Start at: 2014-10-01 <--> End at: 2014-10-02 <--> Result:
Performance is inconsistent with expectations. The best result for the non-linear transform is 1e-2.
{| class="wikitable"
! Hidden layer size !! Incorrect (top-1) !! Incorrect (top-5) !! Total
|-
| 400 || 840 || 705 || 995
|-
| 600 || 796 || 636 || 995
|-
| 800 || 763 || 601 || 995
|-
| 1200 || 804 || 646 || 995
|-
| 1400 || 825 || 676 || 995
|}
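Reading the two incorrect-count columns as top-1 and top-5 misses out of 995 test items (an assumption about what the columns mean), each count converts to an accuracy of 1 - incorrect / 995; for example, 400 hidden units give 1 - 840/995, about 15.6% top-1. The sketch below does that conversion and also shows one plausible way such counts could be produced, by cosine nearest-neighbour retrieval over candidate target vectors; the helper count_topk_incorrect and the cosine metric are assumptions, not the evaluation actually used.

```python
import numpy as np


def count_topk_incorrect(pred, candidates, gold, k):
    """Count rows of pred whose gold candidate is not among its k most cosine-similar candidates."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = p @ c.T                           # shape (n_test, n_candidates)
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k best candidates
    hit = (topk == np.asarray(gold)[:, None]).any(axis=1)
    return int((~hit).sum())


# Incorrect counts copied from the results above, read as (top-1, top-5).
REPORTED = {400: (840, 705), 600: (796, 636), 800: (763, 601),
            1200: (804, 646), 1400: (825, 676)}
TOTAL = 995

for hidden, (wrong1, wrong5) in REPORTED.items():
    print(f"{hidden:>5} hidden: top-1 acc {1 - wrong1 / TOTAL:.1%}, "
          f"top-5 acc {1 - wrong5 / TOTAL:.1%}")
```

Under this reading, 800 hidden units have the fewest errors in both columns (763 and 601).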
Result: based on these results, I will next test hidden layers of size 800, 1200, 1400, and 1600.
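The planned sweep over hidden-layer sizes could be driven by a loop like the one below. It is a minimal NumPy sketch of a one-hidden-layer network, y = tanh(x W1 + b1) W2 + b2, trained by full-batch gradient descent on mean squared error and run here on synthetic stand-in data. The tanh activation, the width-scaled learning rate, the epoch count and the initialisation are all assumptions; the configuration actually used in these tests is not recorded on this page.

```python
import numpy as np


def train_mlp(x, y, hidden, lr=None, epochs=2000, seed=0):
    """Fit y ~= tanh(x @ W1 + b1) @ W2 + b2 by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    d_in, d_out = x.shape[1], y.shape[1]
    if lr is None:
        # Because |tanh| <= 1, the MSE curvature w.r.t. (W2, b2) is at most
        # 2 * (hidden + 1) / d_out; this step keeps output-layer updates stable.
        lr = 0.5 / (hidden + 1)
    w1 = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, d_out))
    b2 = np.zeros(d_out)
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)                # hidden activations (n, hidden)
        pred = h @ w2 + b2                      # outputs (n, d_out)
        g_pred = 2.0 * (pred - y) / pred.size   # dMSE/dpred, MSE averaged over all entries
        g_h = (g_pred @ w2.T) * (1.0 - h ** 2)  # back-propagate through tanh (uses old w2)
        w2 -= lr * (h.T @ g_pred)
        b2 -= lr * g_pred.sum(axis=0)
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return w1, b1, w2, b2


def predict(params, x):
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2


def mse(params, x, y):
    return float(np.mean((predict(params, x) - y) ** 2))


if __name__ == "__main__":
    # Synthetic stand-in data; the real source/target vectors are not given here.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(300, 4))
    y = np.sin(x @ rng.normal(size=(4, 2)))
    x_tr, y_tr, x_te, y_te = x[:200], y[:200], x[200:], y[200:]
    for hidden in (800, 1200, 1400, 1600):  # sizes from the plan above
        params = train_mlp(x_tr, y_tr, hidden, epochs=300)
        print(hidden, "test MSE:", round(mse(params, x_te, y_te), 4))
```

On the real data, the same loop would compare held-out MSE and the top-1/top-5 incorrect counts across the candidate sizes.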