==Paper Recommendation==

Pre-Trained Multi-View Word Embedding. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/3/3c/Pre-Trained_Multi-View_Word_Embedding.pdf]
Learning Word Representation Considering Proximity and Ambiguity. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b0/Learning_Word_Representation_Considering_Proximity_and_Ambiguity.pdf]

Continuous Distributed Representations of Words as Input of LSTM Network Language Model. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/5/5a/Continuous_Distributed_Representations_of_Words.pdf]

WikiRelate! Computing Semantic Relatedness Using Wikipedia. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/c/cb/WikiRelate%21_Computing_Semantic_Relatedness_Using_Wikipedia.pdf]

Japanese-Spanish Thesaurus Construction Using English as a Pivot. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/e/e8/Japanese-Spanish_Thesaurus_Construction.pdf]
==Chaos Work==

===Temp Result Report===

Result Report:
I have trained two sphere models. In the first, the hierarchical-softmax parameters are normalized onto the unit sphere; in the second, only the word vectors are normalized. The model that normalizes the hierarchical-softmax parameters achieves an almost 0% correct rate, so it is omitted from the result report.
Test-set results ("1 correct" / "5 correct", i.e. top-1 / top-5 correct rates); the train-set rows were left blank. A sketch of the transform setup follows the table.

Use the normalized vectors:
    Linear Transform : 10.25%   24.82%
    Sphere Transform : 24.22%   41.01%

Use the original vectors:
    Linear Transform : 26.83%   44.42%
    Sphere Transform : 28.74%   46.73%
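The setup appears to be a learned mapping from one word-vector space to another, evaluated by top-k retrieval. A minimal sketch of the linear-transform case, assuming a least-squares fit; the exact training procedure, and the form of the "Sphere Transform", are not recorded here, and all names are illustrative:

<source lang="python">
import numpy as np

def normalize(V):
    # Project row vectors onto the unit sphere (the "normalized vectors" setting).
    return V / np.linalg.norm(V, axis=1, keepdims=True)

def fit_linear_transform(X, Y):
    # Least-squares W so that X @ W approximates Y.
    # X, Y: (n_pairs, dim) source / target word-vector matrices.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def top_k_accuracy(X, Y, W, k=5):
    # Fraction of queries whose true target is among the k nearest
    # mapped vectors by cosine similarity (k=1 and k=5 above).
    sims = normalize(X @ W) @ normalize(Y).T
    topk = np.argsort(-sims, axis=1)[:, :k]
    return float(np.mean([i in row for i, row in enumerate(topk)]))
</source>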
===SSA Model===

Build a 2-dimensional SSA model.
Start at : 2014-09-30 <--> End at : 2014-10-02 <--> Result is :
    27.83%  46.53%   2 classes
Test the 25- and 50-dimensional SSA models for the transform.
Start at : 2014-10-02 <--> End at : 2014-10-03 <--> Result is :
    27.90%  46.60%   1 class
    27.83%  46.53%   2 classes
    27.43%  46.53%   3 classes
    25.52%  45.83%   4 classes
    25.62%  45.83%   5 classes
    22.81%  42.51%   6 classes
    11.96%  27.43%  50 classes
Explanation: some test points do not belong to any class found in the training data, so those points do not share a correct transform matrix. The updated method is to cluster only the training data and then test the performance; a sketch of this per-cluster transform follows these results.
Simple clustering into 2 classes:
    23.51%  43.21%   2 classes
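A rough sketch of that per-cluster idea: cluster the training source vectors, fit one transform per cluster, and map each test vector with the transform of its nearest cluster. KMeans and the least-squares fit stand in for whatever the actual SSA training uses:

<source lang="python">
import numpy as np
from sklearn.cluster import KMeans

def fit_clustered_transforms(X, Y, n_clusters=2):
    # Cluster the training source vectors, then fit one least-squares
    # transform per cluster.
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    transforms = []
    for c in range(n_clusters):
        idx = km.labels_ == c
        W, *_ = np.linalg.lstsq(X[idx], Y[idx], rcond=None)
        transforms.append(W)
    return km, transforms

def apply_transform(km, transforms, x):
    # Map a test vector with the transform of its nearest cluster centre.
    c = km.predict(x.reshape(1, -1))[0]
    return x @ transforms[c]
</source>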
Train set used as the test set.
Start at : 2014-10-06 <--> End at : 2014-10-08 <--> Result is :
    56.91%  72.16%   simple, 1 class
    63.98%  77.57%   simple, 2 classes
    68.49%  81.25%   simple, 4 classes
    71.43%  83.21%   simple, 5 classes
    76.71%  87.07%   simple, 6 classes
Different compute state:
Start at : 2014-10-10 <--> End at : 2014-10-10 <--> Result is :
    23.51%  40.20%   7 classes
Test the all-belong SSA model for the transform.
Start at : 2014-10-02
===SEMPRE Research===

====Work Schedule====
Download the SEMPRE toolkit.
Start at : 2014-09-30

====Related Papers====
Semantic Parsing via Paraphrasing [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/85/Semantic_Parsing_via_Paraphrasing.pdf]
===Knowledge Vector===

Pre-process the corpus.
Start at : 2014-09-30
Using the Wikipedia_Extractor toolkit [http://medialab.di.unipi.it/wiki/Wikipedia_Extractor]; waiting for it to finish.
End at : 2014-10-03 <--> Result :
The original corpus is about 47 GB; after preprocessing it is roughly 17.8 GB.
Analyze the corpus and train word2vec on the Wikipedia text.
Start at : 2014-10-03
Designed data structure (one record per article):
    { title : "", content : { Abs : [[details], [related links]], h2 : [] }, category : [] }
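For concreteness, one record in this structure might look like the following; the field contents are invented for illustration:

<source lang="python">
# Hypothetical article record in the designed structure.
article = {
    "title": "Example Article",
    "content": {
        "Abs": [                                         # abstract section
            ["First paragraph of the abstract ..."],     # details
            ["Related Article A", "Related Article B"],  # related links
        ],
        "h2": [],  # one entry per second-level section, same layout as Abs
    },
    "category": ["Example category"],
}
</source>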
===Moses translation model===

Pre-process the corpus: remove sentences that contain rarely seen words.
Start at : 2014-09-30 <--> End at : 2014-10-02 <--> Result :
The original corpus has 8,973,724 lines; the cleaned corpus (after removing sentences containing words that occur fewer than 10 times) has 6,033,397 lines. A sketch of this filter follows.
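One plausible reading of the cleaning rule is a two-pass filter: count all words in the corpus, then drop any sentence containing a word seen fewer than 10 times. A minimal sketch under that assumption:

<source lang="python">
from collections import Counter

def clean_corpus(lines, min_count=10):
    # Pass 1: corpus-wide word counts.
    counts = Counter(w for line in lines for w in line.split())
    # Pass 2: keep only sentences whose words all meet the threshold.
    return [line for line in lines
            if all(counts[w] >= min_count for w in line.split())]
</source>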
Train the model.
Start at : 2014-10-02 <--> End at : 2014-10-05
Tune the model.
Start at : 2014-10-05 <--> End at : 2014-10-10
Result Report :
The old translation system holds 57 GB of phrase table; the new system holds 41 GB. Next, test the loading speed.
===Non-Linear Transform Testing===

====Work Schedule====
Re-train for the best MSE on the test data.
Start at : 2014-10-01 <--> End at : 2014-10-02 <--> Result :
Performance is inconsistent with expectations. The best result for the non-linear transform is obtained with the 1e-2 setting.
    Hidden Layer :   400   15.57%   29.14%   995
                     600   19.99%   36.08%   995
                     800   23.32%   39.60%   995
                    1200   19.19%   35.08%   995
                    1400   17.09%   32.06%   995
Conclusion: according to these results, I will test hidden layers of 800, 1200, 1400, and 1600 units. A sketch of the transform follows.
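The non-linear transform being swept is presumably a one-hidden-layer network mapping source vectors to target vectors, with the hidden widths above (400 to 1600) as its hidden-layer size. A minimal sketch, with the tanh activation and the initialization scale as assumptions:

<source lang="python">
import numpy as np

def init_params(d_in, d_hidden, d_out, seed=0):
    # Small random initialization; the scale is illustrative only.
    rng = np.random.default_rng(seed)
    return (rng.normal(0, 0.01, (d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(0, 0.01, (d_hidden, d_out)), np.zeros(d_out))

def mlp_transform(x, W1, b1, W2, b2):
    # One-hidden-layer non-linear transform: x -> tanh(x W1 + b1) W2 + b2,
    # trained against a mean-squared-error objective.
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2
</source>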
===New Approach===

====Date-3-26====
Note: run the Wiki vector training step.
Pre-process the 20-Newsgroups and Reuters-21578 corpora.
The tag-cleaning pre-processing step is done.

====Date-3-27====
Learn how to use the Reuters corpus.
Note: read papers:
1. Parallel Training of an Improved Neural Network for Text Categorization
2. A Discriminative and Semantic Feature Selection Method for Text Categorization
3. Effective Use of Word Order for Text Categorization with Convolutional Neural Networks

====Date-3-31====
Code a new edition of spherical word2vec.
Begin to code vMF-based clustering; a sketch of the basic loop follows.
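With a fixed concentration parameter, hard vMF clustering reduces to spherical k-means: unit-norm vectors, assignment by cosine similarity, and re-normalized mean directions. A minimal sketch under that simplification, not the actual code being written:

<source lang="python">
import numpy as np

def spherical_kmeans(V, k, iters=20, seed=0):
    # Hard vMF-style clustering of the row vectors of V.
    rng = np.random.default_rng(seed)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    mu = V[rng.choice(len(V), size=k, replace=False)]  # initial mean directions
    for _ in range(iters):
        labels = np.argmax(V @ mu.T, axis=1)           # assign by cosine similarity
        for c in range(k):
            members = V[labels == c]
            if len(members):
                m = members.sum(axis=0)
                mu[c] = m / np.linalg.norm(m)          # re-normalized mean direction
    return labels, mu
</source>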
====Date-4-26====

====Experience for orthogonal-weights CNN====
    dimension   alpha
    10          1e-4
    100         1e-2
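It is not recorded here whether alpha is the learning rate or the weight of the orthogonality constraint. For reference, one common way to encourage orthogonal weights is a soft penalty added to the loss; a sketch, applied to the rows of a flattened filter matrix W:

<source lang="python">
import numpy as np

def orthogonality_penalty(W, alpha):
    # Soft orthogonality regularizer: alpha * ||W W^T - I||_F^2.
    # Returns the penalty value and its gradient with respect to W,
    # which is 4 * alpha * (W W^T - I) W.
    d = W @ W.T - np.eye(W.shape[0])
    return alpha * np.sum(d ** 2), 4 * alpha * d @ W
</source>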
====Experience for the basic CNN====
    dimension   alpha
    100         1e-4
==Binary Word Vector==

===Date-5-11===

====Hamming distance====

=====Definition=====
In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols differ. Put another way, it is the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.

=====Examples=====
The distance between "karolin" and "kathrin" is 3.
The distance between "karolin" and "kerstin" is 3.
The distance between 1011101 and 1001001 is 2.
The distance between 2173896 and 2233796 is 3.
(From Wikipedia.)
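A direct implementation of the definition, checked against the examples above:

<source lang="python">
def hamming_distance(a, b):
    # Number of positions at which two equal-length sequences differ.
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

assert hamming_distance("karolin", "kathrin") == 3
assert hamming_distance("karolin", "kerstin") == 3
assert hamming_distance("1011101", "1001001") == 2
assert hamming_distance("2173896", "2233796") == 3
</source>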
| + | |