Hulan-2014-11-06
From cslt Wiki
Latest revision as of 09:07, Thursday, 6 November 2014
Dialog system
Algorithm
Spelling mistakes
- retrain the n-gram model (caoli)
- prepare the test and development sets (caoli)
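The following is a minimal sketch of what an n-gram model for spell checking does. It uses a character bigram model with add-one smoothing and ranks candidate corrections by log-probability. The names `CharBigramLM` and `best_correction` are hypothetical, and the report does not say which n-gram order, smoothing method or candidate generation the retrained model actually uses.

```python
import math
from collections import Counter


class CharBigramLM:
    """Character bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, corpus):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in corpus:
            # list() splits a string into characters, which also works for Chinese text
            chars = ["<s>"] + list(sentence) + ["</s>"]
            self.unigrams.update(chars)
            self.bigrams.update(zip(chars, chars[1:]))
        self.vocab_size = len(self.unigrams)

    def log_prob(self, sentence):
        """Smoothed log-probability of a whole sentence."""
        chars = ["<s>"] + list(sentence) + ["</s>"]
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def best_correction(lm, candidates):
    """Pick the candidate spelling the model finds most probable."""
    return max(candidates, key=lm.log_prob)
```

In a real system the candidates would come from an edit-distance or confusion-set generator; here they are simply passed in.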
Improve fuzzy matching
- add synonym similarity, using the MERT-4 method
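One possible shape for a synonym-similarity feature is sketched below. Tokens are mapped to a canonical synonym before computing Jaccard overlap, and the result is combined linearly with plain surface overlap. The synonym table, the choice of Jaccard, and the weights `w_surface` and `w_synonym` are all assumptions for illustration. In the plan above, weights like these are what the MERT-4 tuning would set.

```python
def jaccard(a, b):
    """Jaccard overlap of two token collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def normalize(tokens, synonyms):
    """Replace each token by its canonical synonym, if the table has one."""
    return [synonyms.get(t, t) for t in tokens]


def fuzzy_score(query_tokens, candidate_tokens, synonyms,
                w_surface=0.5, w_synonym=0.5):
    """Weighted sum of surface overlap and synonym-aware overlap."""
    surface = jaccard(query_tokens, candidate_tokens)
    synonym = jaccard(normalize(query_tokens, synonyms),
                      normalize(candidate_tokens, synonyms))
    return w_surface * surface + w_synonym * synonym
```

With `{"cost": "price", "fee": "price"}` as the synonym table, "what is the fee" and "what is the cost" share only 3 of 5 surface tokens but match fully after normalization, so the combined score rises from 0.6 to 0.8.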
Improve Lucene search
- our VSM method
Method | lucene | vsm_idf(haiguan) | vsm_idf(baidu) | vsm_idf(train) | vsm_idf(calculate) |
---|---|---|---|---|---|
Accuracy | 0.6628 | 0.6228 | 0.6197 | 0.5827 | 0.5426 |
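The VSM baseline in the table can be pictured as TF-IDF cosine ranking of the standard questions. The minimal sketch below takes the IDF table as a parameter, which is where the different IDF sources compared above (haiguan, baidu, train, calculate) would plug in. `compute_idf` uses the common log(N/df) definition, which may differ from the one behind the reported numbers. All function names are hypothetical.

```python
import math
from collections import Counter


def compute_idf(documents):
    """IDF from a tokenized corpus, using idf(t) = log(N / df(t))."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # count each term once per document
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf, default_idf=0.0):
    """Sparse TF-IDF vector; terms missing from the IDF table get default_idf."""
    tf = Counter(tokens)
    return {t: c * idf.get(t, default_idf) for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_tokens, questions, idf):
    """Indices of the standard questions, best match first."""
    q = tfidf_vector(query_tokens, idf)
    scores = [cosine(q, tfidf_vector(doc, idf)) for doc in questions]
    return sorted(range(len(questions)), key=lambda i: scores[i], reverse=True)
```

Note that with this IDF definition a term occurring in every question gets weight 0 and stops contributing to the ranking.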
- Lucene top-N results
- top10: 82.95%, top20: 86.34%, top50: 90.23%, top100: 94.11%, top200: 96.18%, top1000: 97.31%, top2000: 97.87%, top5000: 98.75%, top10000: 99.06%
- test the results of top-100, top-200 and top-1000 in the full QA pipeline (Lucene + fuzzy match) (caoli)
- Lucene optimization (liurong)
- rewrite the method that selects the 50 standard questions so that they do not share the same template (liurong)
- check the word segmentation of the templates (liurong)
- boost query keywords using IDF
Method | Default | idf_train | idf_train_norm | idf_baidu | idf_baidu_norm |
---|---|---|---|---|---|
Accuracy | 0.66228 | 0.651629 | 0.57644 | 0.647869 | 0.65288 |
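Boosting query keywords by IDF maps naturally onto the Lucene classic query-parser syntax `term^boost`. The sketch below builds such a query string from an IDF table. Two points are assumptions. First, the `_norm` columns are read as "IDF divided by the largest IDF in the query", which is only a guess. Second, tokens are assumed to contain no characters that the query parser treats as special, since those would need escaping. The function name `boosted_query` is hypothetical.

```python
def boosted_query(tokens, idf, normalize=False, default_idf=1.0):
    """Build a Lucene query string that boosts each keyword by its IDF.

    tokens: already-segmented query keywords; duplicates are kept once.
    idf: mapping term -> IDF; unseen terms get default_idf (an assumption).
    normalize: divide every weight by the largest one (a guessed reading
    of the *_norm columns).
    """
    # dict.fromkeys removes duplicates while keeping the original order
    weights = {t: idf.get(t, default_idf) for t in dict.fromkeys(tokens)}
    if normalize:
        top = max(weights.values(), default=0.0)
        if top > 0:
            weights = {t: w / top for t, w in weights.items()}
    # Lucene rejects negative boosts and a zero boost adds nothing to the
    # score, so terms without a positive weight are left out
    return " ".join(f"{t}^{w:.3f}" for t, w in weights.items() if w > 0)
```

For example, `boosted_query(["open", "account"], {"open": 2.0, "account": 0.5})` yields the string `open^2.000 account^0.500`.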
- use the MERT-4 method to find good weights for multiple features, such as IDF, NER, baidu_weight and keyword (liurong, this month)
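Tuning the weights of features such as IDF, NER, baidu_weight and keyword amounts to choosing the weight vector that maximises QA accuracy. The sketch below uses plain coordinate ascent over a fixed grid of values, a simplified stand-in for MERT. Och's MERT performs an exact line search along each direction, and what "MERT-4" denotes here is not specified. The function names and the data layout are assumptions: each example is a list of candidate feature vectors plus the index of the correct candidate.

```python
def accuracy(weights, examples):
    """Fraction of examples whose top-scoring candidate is the correct one."""
    correct = 0
    for candidates, gold in examples:
        scores = [sum(w * f for w, f in zip(weights, feats)) for feats in candidates]
        # max() returns the first maximum, so ties go to the earliest candidate
        best = max(range(len(scores)), key=scores.__getitem__)
        correct += best == gold
    return correct / len(examples)


def tune_weights(examples, num_features, grid, rounds=3):
    """Coordinate ascent: try every grid value for one weight at a time."""
    weights = [1.0] * num_features
    best_acc = accuracy(weights, examples)
    for _ in range(rounds):
        improved = False
        for k in range(num_features):
            for value in grid:
                trial = weights.copy()
                trial[k] = value
                acc = accuracy(trial, examples)
                if acc > best_acc:  # keep a change only if it strictly helps
                    weights, best_acc, improved = trial, acc, True
        if not improved:
            break  # no single-coordinate change helps any more
    return weights, best_acc
```

In practice the weights should be tuned on the development set and reported on a separate test set. A grid search like this one only finds a local optimum.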
Multi-Scene Recognition
- add triple search to the QA engine
- discuss the details and write a report (liurong)
- demo (liurong, two weeks)
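The sketch below gives a rough picture of what triple search in the QA engine could mean. It stores (subject, predicate, object) triples and answers pattern queries in which `None` acts as a wildcard. The `TripleStore` class and the example triples are hypothetical, since the report leaves the storage and query design to the planned discussion. A real store would also index each position rather than scanning linearly.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Tiny in-memory (subject, predicate, object) store with pattern search."""

    def __init__(self, triples: Iterable[Triple] = ()):
        self.triples: List[Triple] = list(triples)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.append((subject, predicate, obj))

    def search(self, subject: Optional[str] = None,
               predicate: Optional[str] = None,
               obj: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None matches anything."""
        pattern = (subject, predicate, obj)
        return [t for t in self.triples
                if all(p is None or p == v for p, v in zip(pattern, t))]
```

A question such as "what is the opening fee of a card?" would then become `store.search("card", "opening_fee")`, once entity and relation recognition have produced the subject and predicate.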
Knowledge structure
Knowledge Management and labeling system
- continue coding.
Patent
- the GA method to improve QA (liurong, this month)
Plans to discuss
- how to add the spell-check method to the QA engine