==Paper Recommendation==
Pre-Trained Multi-View Word Embedding. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/3/3c/Pre-Trained_Multi-View_Word_Embedding.pdf]

Learning Word Representation Considering Proximity and Ambiguity. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/b/b0/Learning_Word_Representation_Considering_Proximity_and_Ambiguity.pdf]

Continuous Distributed Representations of Words as Input of LSTM Network Language Model. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/5/5a/Continuous_Distributed_Representations_of_Words.pdf]

WikiRelate! Computing Semantic Relatedness Using Wikipedia. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/c/cb/WikiRelate%21_Computing_Semantic_Relatedness_Using_Wikipedia.pdf]

Japanese-Spanish Thesaurus Construction Using English as a Pivot. [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/e/e8/Japanese-Spanish_Thesaurus_Construction.pdf]

==Chaos Work==

===Temp Result Report===
Result Report :

I have trained two sphere models. For the first model I changed the hierarchical softmax parameters to the standard sphere; for the second model I only changed the word vectors to the standard sphere.
The model with spherical hierarchical parameters reaches an accuracy of almost 0%, so I will not include it in our result report.

Using the normalized vectors :

Linear transform :
 test (1-correct / 5-correct): 10.25% / 24.82%
 train :

Sphere transform :
 test (1-correct / 5-correct): 24.22% / 41.01%
 train :

Using the original vectors :

Linear transform :
 test (1-correct / 5-correct): 26.83% / 44.42%
 train :

Sphere transform :
 test (1-correct / 5-correct): 28.74% / 46.73%

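For reference, the sketch below shows one way such a cross-space linear transform can be fitted and scored with 1-correct / 5-correct accuracy. The least-squares fit, the cosine ranking, and all variable names (src_vecs, tgt_vecs, gold) are illustrative assumptions, not the exact code behind the numbers above.

<pre>
import numpy as np

def fit_linear_transform(src, tgt):
    """Least-squares W such that src @ W ~= tgt (one row per training word pair)."""
    W, _, _, _ = np.linalg.lstsq(src, tgt, rcond=None)
    return W

def topk_accuracy(src, tgt_vocab, gold_idx, W, k):
    """Fraction of test words whose gold target is among the k nearest
    target vectors (by cosine similarity) of the mapped source vector."""
    mapped = src @ W
    mapped /= np.linalg.norm(mapped, axis=1, keepdims=True)
    vocab = tgt_vocab / np.linalg.norm(tgt_vocab, axis=1, keepdims=True)
    sims = mapped @ vocab.T                      # cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]      # k best candidates per test word
    return float(np.mean([g in row for g, row in zip(gold_idx, topk)]))

# Hypothetical usage with random data standing in for the real embeddings.
rng = np.random.default_rng(0)
src_vecs, tgt_vecs = rng.normal(size=(1000, 100)), rng.normal(size=(1000, 100))
W = fit_linear_transform(src_vecs[:800], tgt_vecs[:800])
gold = np.arange(800, 1000)
print(topk_accuracy(src_vecs[800:], tgt_vecs, gold, W, k=1))   # "1-correct"
print(topk_accuracy(src_vecs[800:], tgt_vecs, gold, W, k=5))   # "5-correct"
</pre>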
===SSA Model===

Build a 2-dimension SSA model.
Start at : 2014-09-30 <--> End at : 2014-10-02 <--> Result (1-correct / 5-correct) :
 2 classes:  27.83% / 46.53%

Test the 25- and 50-dimension SSA models for the transform.
Start at : 2014-10-02 <--> End at : 2014-10-03 <--> Result (1-correct / 5-correct) :
 1 class:    27.90% / 46.60%
 2 classes:  27.83% / 46.53%
 3 classes:  27.43% / 46.53%
 4 classes:  25.52% / 45.83%
 5 classes:  25.62% / 45.83%
 6 classes:  22.81% / 42.51%
 50 classes: 11.96% / 27.43%
Explanation : some test points fall into classes that the training data does not cover, so those points are not mapped with the correct transform matrix.
The planned change is to cluster only the training data and then test the performance.
Simple clustering into 2 classes:
 2 classes:  23.51% / 43.21%

Using the training set as the test set.
Start at : 2014-10-06 <--> End at : 2014-10-08 <--> Result (1-correct / 5-correct) :
 Simple, 1 class:   56.91% / 72.16%
 Simple, 2 classes: 63.98% / 77.57%
 Simple, 4 classes: 68.49% / 81.25%
 Simple, 5 classes: 71.43% / 83.21%
 Simple, 6 classes: 76.71% / 87.07%

Different computation setting :
Start at : 2014-10-10 <--> End at : 2014-10-10 <--> Result (1-correct / 5-correct) :
 7 classes:  23.51% / 40.20%

Test the All-Belong SSA model for the transform.
Start at : 2014-10-02

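As a reference for the per-class transform idea above, here is a minimal sketch that clusters the training source vectors and fits one linear transform per cluster; at test time each point uses the transform of its nearest cluster. The use of plain k-means and all names are assumptions for illustration, not the actual SSA implementation.

<pre>
import numpy as np
from sklearn.cluster import KMeans

def fit_per_cluster_transforms(src, tgt, n_classes):
    """Cluster the training source vectors and fit one least-squares
    transform matrix per cluster."""
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit(src)
    transforms = []
    for c in range(n_classes):
        mask = km.labels_ == c
        W, _, _, _ = np.linalg.lstsq(src[mask], tgt[mask], rcond=None)
        transforms.append(W)
    return km, transforms

def apply_transform(src, km, transforms):
    """Map each test vector with the transform of its nearest training cluster."""
    labels = km.predict(src)
    return np.vstack([src[i] @ transforms[labels[i]] for i in range(len(src))])

# Hypothetical usage with random data in place of the real word vectors.
rng = np.random.default_rng(0)
src_train, tgt_train = rng.normal(size=(500, 50)), rng.normal(size=(500, 50))
km, Ws = fit_per_cluster_transforms(src_train, tgt_train, n_classes=2)
mapped = apply_transform(rng.normal(size=(10, 50)), km, Ws)
print(mapped.shape)  # (10, 50)
</pre>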
===SEMPRE Research===

====Work Schedule====
Download the SEMPRE toolkit.
Start at : 2014-09-30

====Paper related====
Semantic Parsing via Paraphrasing [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/85/Semantic_Parsing_via_Paraphrasing.pdf]

===Knowledge Vector===

Pre-process the corpus.
Start at : 2014-09-30
Use the Wikipedia_Extractor toolkit [http://medialab.di.unipi.it/wiki/Wikipedia_Extractor] (waiting for it to finish).
End at : 2014-10-03 <--> Result :
The original corpus is about 47 GB; after preprocessing it is almost 17.8 GB.

Analyse the corpus and train word2vec on the Wikipedia text.
Start at : 2014-10-03
Designed data structure :
 { title : "", content : {Abs : [[details],[related link]], h2 : []}, category : []}

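As an illustration, here is what a single record in that structure might look like in Python. The article text, links, and categories are hypothetical and only show the intended layout (abstract details and related links under Abs, one entry per h2 section, plus the category list).

<pre>
# Hypothetical record for one Wikipedia article, following the structure above.
article = {
    "title": "Word embedding",
    "content": {
        # "Abs": [[sentences of the abstract], [links found in the abstract]]
        "Abs": [
            ["Word embedding is a set of techniques for representing words as vectors ..."],
            ["Natural language processing", "Vector space model"],
        ],
        # One entry per h2 section of the article body (section title -> [details, links]).
        "h2": [
            {"Development": [["Early work represented words as one-hot vectors ..."],
                             ["word2vec"]]},
        ],
    },
    "category": ["Computational linguistics", "Artificial intelligence"],
}

print(article["title"], len(article["category"]))
</pre>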
===Moses translation model===

Pre-process the corpus: remove sentences that contain rarely seen words.
Start at : 2014-09-30 <--> End at : 2014-10-02 <--> Result :
The original corpus has 8,973,724 lines; the cleaned corpus (sentences containing words that occur fewer than 10 times are removed) has 6,033,397 lines.

Train the model.
Start at : 2014-10-02 <--> End at : 2014-10-05

Tune the model.
Start at : 2014-10-05 <--> End at : 2014-10-10

Result Report :
The phrase table of the old translation system is 57 GB; the phrase table of the new system is 41 GB. Next, test the loading speed.

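A minimal sketch of the cleaning step described above: count word frequencies over the corpus, then drop every sentence that contains a word seen fewer than 10 times. The file names are placeholders, and whitespace tokenization is an assumption.

<pre>
from collections import Counter

MIN_COUNT = 10  # words seen fewer than 10 times are treated as "rarely seen"

def clean_corpus(in_path, out_path):
    """Drop every sentence that contains at least one rare word."""
    # First pass: count word frequencies over the whole corpus.
    counts = Counter()
    with open(in_path, encoding="utf-8") as f:
        for line in f:
            counts.update(line.split())

    # Second pass: keep only sentences whose words are all frequent enough.
    kept = 0
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            if all(counts[w] >= MIN_COUNT for w in line.split()):
                fout.write(line)
                kept += 1
    return kept

# Hypothetical file names; the real input is the 8,973,724-line training corpus.
# print(clean_corpus("corpus.raw.txt", "corpus.clean.txt"))
</pre>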
===Non Linear Transform Testing===

====Work Schedule====
Re-train the best-MSE model for the test data.
Start at : 2014-10-01 <--> End at : 2014-10-02 <--> Result :
Performance is inconsistent with expectations. The best result for the non-linear transform is at 1e-2.
 Hidden layer : 400    15.57%  29.14%  995
                600    19.99%  36.08%  995
                800    23.32%  39.60%  995
                1200   19.19%  35.08%  995
                1400   17.09%  32.06%  995
Result : According to these results, I will test hidden layer sizes of 800, 1200, 1400, and 1600.

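For reference, a minimal sketch of the kind of non-linear transform tested above: a single hidden layer trained by minimizing the mean squared error to the target vectors. The use of scikit-learn's MLPRegressor, the tanh activation, and the 1e-2 learning rate are illustrative assumptions, not the exact training setup.

<pre>
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical source/target embedding pairs standing in for the real data.
rng = np.random.default_rng(0)
src_train, tgt_train = rng.normal(size=(2000, 100)), rng.normal(size=(2000, 100))
src_test = rng.normal(size=(200, 100))

# One hidden layer (here 800 units, one of the sizes tried above),
# trained with an MSE objective against the target vectors.
mlp = MLPRegressor(hidden_layer_sizes=(800,),
                   activation="tanh",
                   learning_rate_init=1e-2,
                   max_iter=200,
                   random_state=0)
mlp.fit(src_train, tgt_train)

mapped = mlp.predict(src_test)   # non-linearly transformed test vectors
print(mapped.shape)              # (200, 100)
</pre>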
===New Approach===

====Date-3-26====
Note: run the Wiki vector training step.
Pre-process the 20-Newsgroups and Reuters-21578 corpora.
The tag-cleaning pre-processing step is done.

====Date-3-27====
Learn how to use the Reuters corpus.
Note: read papers :
1. Parallel Training of An Improved Neural Network for Text Categorization
2. A discriminative and semantic feature selection method for text categorization
3. Effective Use of Word Order for Text Categorization with Convolutional Neural Networks

====Date-3-31====
Code the new edition of spherical word2vec.
Begin coding the vMF-based clustering (a reference sketch follows below).

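A minimal sketch of clustering unit-length word vectors on the sphere. It uses spherical k-means, which can be read as a hard-assignment vMF mixture with equal, fixed concentrations; it is a stand-in for, not the actual, vMF-based clusterer being coded, and all names and the random data are assumptions.

<pre>
import numpy as np

def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Hard-assignment clustering of unit vectors by cosine similarity
    (equivalent to a vMF mixture with shared, fixed concentration)."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)           # project to the sphere
    centers = X[rng.choice(len(X), size=k, replace=False)]     # init from data points
    for _ in range(n_iter):
        labels = np.argmax(X @ centers.T, axis=1)              # nearest mean direction
        for c in range(k):
            members = X[labels == c]
            if len(members):
                m = members.sum(axis=0)
                norm = np.linalg.norm(m)
                if norm > 0:
                    centers[c] = m / norm                      # renormalized mean
    return labels, centers

# Hypothetical usage with random vectors in place of the spherical word2vec output.
vecs = np.random.default_rng(1).normal(size=(300, 50))
labels, centers = spherical_kmeans(vecs, k=6)
print(np.bincount(labels))
</pre>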
====Date-4-26====

====Experiments with the orthogonal-weights CNN====
 dimension   alpha
 10          1e-4
 100         1e-2

====Experiments with the basic CNN====
 dimension   alpha
 100         1e-4

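For reference, a minimal sketch of one common way to encourage orthogonal CNN weights: add a soft penalty alpha * ||W W^T - I||_F^2 on each convolution kernel matrix (flattened to filters × fan-in) to the training loss, with alpha as in the table above. The exact mechanism used in the experiments is an assumption here.

<pre>
import numpy as np

def orthogonality_penalty(W, alpha):
    """Soft orthogonality regularizer alpha * ||W W^T - I||_F^2
    for a weight matrix W of shape (num_filters, fan_in)."""
    gram = W @ W.T
    eye = np.eye(W.shape[0])
    return alpha * np.sum((gram - eye) ** 2)

def orthogonality_penalty_grad(W, alpha):
    """Gradient of the penalty w.r.t. W: alpha * 4 * (W W^T - I) W,
    to be added to the weight gradient in the backward pass."""
    gram_err = W @ W.T - np.eye(W.shape[0])
    return alpha * 4.0 * gram_err @ W

# Hypothetical convolution kernels reshaped to (num_filters, k * k * channels).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(100, 75))
print(orthogonality_penalty(W, alpha=1e-2))        # value added to the loss
print(orthogonality_penalty_grad(W, alpha=1e-2).shape)
</pre>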
==Binary Word Vector==

===Date-5-11===
====Hamming distance====
=====Definition=====
In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. Put another way, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.

=====Examples=====
The Hamming distance between "karolin" and "kathrin" is 3.
The Hamming distance between "karolin" and "kerstin" is 3.
The Hamming distance between 1011101 and 1001001 is 2.
The Hamming distance between 2173896 and 2233796 is 3.
(From Wikipedia.)

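A minimal sketch of computing the Hamming distance, both for equal-length strings and for the bit patterns of binary word vectors packed into integers; the function names are illustrative only.

<pre>
def hamming_str(a, b):
    """Number of positions at which two equal-length strings differ."""
    assert len(a) == len(b), "Hamming distance needs equal-length inputs"
    return sum(x != y for x, y in zip(a, b))

def hamming_bits(a, b):
    """Number of differing bits between two integers (e.g. packed binary vectors)."""
    return bin(a ^ b).count("1")

print(hamming_str("karolin", "kathrin"))   # 3
print(hamming_str("karolin", "kerstin"))   # 3
print(hamming_str("1011101", "1001001"))   # 2
print(hamming_str("2173896", "2233796"))   # 3
print(hamming_bits(0b1011101, 0b1001001))  # 2
</pre>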
===Date-5-12===
====Frobenius matrix norm====
=====Definition=====
The Frobenius norm, sometimes also called the Euclidean norm (a name that may cause confusion with the vector L^2-norm, which is also sometimes known as the Euclidean norm), is the matrix norm of an m×n matrix A defined as the square root of the sum of the absolute squares of its elements:

<math>\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^{2}}</math>
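A minimal check of this definition in code; numpy's matrix norm defaults to the Frobenius norm for 2-D arrays, so the two computations below should agree. The example matrix is arbitrary.

<pre>
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

# Directly from the definition: square root of the sum of absolute squares of entries.
fro_manual = np.sqrt(np.sum(np.abs(A) ** 2))

# numpy's default matrix norm for a 2-D array is the Frobenius norm.
fro_numpy = np.linalg.norm(A)

print(fro_manual, fro_numpy)   # both equal sqrt(30) ≈ 5.477
</pre>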