Difference between revisions of "Hulan-2014-10-31"
From cslt Wiki
Latest revision as of 05:32, 31 October 2014 (Friday)
Dialog system
Algorithm
Spell mistake
- Use n-grams to get candidate sentences. (xingchao)
Improve Lucene search
- Lucene similarity methods
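
The note does not say how the n-grams are used, so the following is only a minimal sketch of one common approach: score each known sentence by its character-bigram overlap with the (possibly misspelled) query and keep the best matches. The function names, the choice of character bigrams, and the Dice coefficient are all assumptions made for illustration, not the method used in the project.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return the multiset of character n-grams in text (whitespace removed)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def dice(a, b):
    """Dice coefficient between two n-gram multisets."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    overlap = sum((a & b).values())  # multiset intersection
    return 2.0 * overlap / total


def candidate_sentences(query, sentences, n=2, top_k=3):
    """Hypothetical helper: rank known sentences by n-gram overlap with query."""
    q = char_ngrams(query, n)
    scored = [(dice(q, char_ngrams(s, n)), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop sentences that share no n-gram with the query.
    return [s for score, s in scored[:top_k] if score > 0]
```

Because the n-grams are taken over characters rather than words, the same sketch works on unsegmented Chinese text as well as on the English strings used for testing.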
method | Default | BM25 | LMDirichlet | DFR | LMJelinekMercer | IB |
---|---|---|---|---|---|---|
Accuracy | 0.66228 | 0.66228 | 0.4091 | 0.65476 | 0.65476 | 0.6666 |
- Our VSM method
- Re-ranking with our VSM method: 54%; Lucene: 66.28%
- Lucene top 50 (caoli)
- top10 (82.95%), top20 (86.34%), top50 (90.22%)
- Need to check the remaining ~10% of errors.
- Lucene optimization (liurong)
- Rewrite the method so that the 50 selected standard questions do not come from the same template.
- Test boosting keyword weights and extracting synonyms.
- Check the word segmentation for templates.
- The min-segment method improves accuracy (0.61 -> 0.66).
- Check the query method for retrieving Lucene information, and rewrite the scoring method (e.g. the IDF value).
- IDF (caoli)
- Test different IDF values from Baidu and Sougou in fuzzymatch.
- IDF computed from the training data performs worse than the default IDF (0.63 vs. 0.69).
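
To make the VSM re-ranking, IDF, and top-k figures above more concrete, here is a minimal sketch under stated assumptions: candidates are already tokenized, IDF uses a smoothed formula `log((N + 1) / (df + 1)) + 1`, and unseen terms fall back to a default IDF of 1.0. None of these choices, nor the function names, come from the project; they only illustrate how an IDF table from a different source could be plugged into a cosine re-ranker and how top-k accuracy could be measured.

```python
import math
from collections import Counter


def compute_idf(tokenized_docs):
    """Smoothed IDF per term (assumed formula): log((N + 1) / (df + 1)) + 1."""
    n_docs = len(tokenized_docs)
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((n_docs + 1) / (d + 1)) + 1.0 for t, d in df.items()}


def tfidf_vector(tokens, idf, default_idf=1.0):
    """Sparse TF-IDF vector; terms missing from idf use default_idf."""
    tf = Counter(tokens)
    return {t: c * idf.get(t, default_idf) for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rerank(query_tokens, candidates, idf):
    """Return candidate indices sorted by descending cosine similarity."""
    q = tfidf_vector(query_tokens, idf)
    scores = [cosine(q, tfidf_vector(c, idf)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def top_k_accuracy(rankings, gold, k):
    """Fraction of queries whose gold index appears in the first k results."""
    hits = sum(1 for r, g in zip(rankings, gold) if g in r[:k])
    return hits / len(gold) if gold else 0.0
```

Swapping the `idf` dictionary (for example, one built from training data versus one derived from an external corpus) leaves the rest of the pipeline unchanged, which is what makes this kind of comparison cheap to run.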
Knowledge structure
- Structure the default answer using the attributes of the entity.
Knowledge management and labeling system
- Prepare the interface and functions.
Plan to discuss
- Add triple search to the QA engine.