Differences between revisions of "Hulan-2014-11-06"
From cslt Wiki
Revision as of 08:40, 6 November 2014 (Thursday)
Dialog system
Algorithm
Spelling mistakes
- retrain the ngram model
improve lucene search
- our vsm method
method | lucene | vsm_idf(haiguan) | vsm_idf(baidu) | vsm_idf(train) | vsm_idf(calculate) |
---|---|---|---|---|---|
Accuracy | 0.6628 | 0.6228 | 0.6197 | 0.5827 | 0.5426 |
- lucene top-N
- top10 (82.95%), top20 (86.34%), top50 (90.23%), top100 (94.11%), top200 (96.18%), top1000 (97.31%), top2000 (97.87%), top5000 (98.75%), top10000 (99.06%)
- lucene optimization (liurong)
- rewrite the selection method so that the 50 standard questions do not share the same template.
- check the word segmentation of the templates.
- boost the query keywords using IDF
method | Default | idf_train | idf_train_norm | idf_baidu | idf_baidu_norm |
---|---|---|---|---|---|
Accuracy | 0.66228 | 0.651629 | 0.57644 | 0.647869 | 0.65288 |
- TFIDF Formula
- coord(q,d)*query_boost*query_norm*sum(idf^2 * tf * term_boost * norm(t,d)) [1]
- add the new keyword value from proMe method
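The TFIDF formula above, together with the idea of boosting query keywords, can be sketched in a few lines of Python. This is only an illustration, not the code used in the experiments: the definitions of tf (square root of term frequency), idf (1 + ln(N / (df + 1))), norm(t,d) (1 / sqrt(document length)) and query_norm are assumptions modeled on Lucene's classic TF-IDF similarity, and the function names, parameters and toy documents are hypothetical.

```python
import math
from collections import Counter


def idf(doc_freq, num_docs):
    # Assumed idf definition: 1 + ln(N / (df + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def tfidf_score(query_terms, doc_terms, doc_freqs, num_docs,
                term_boosts=None, query_boost=1.0):
    """coord * query_boost * query_norm * sum(idf^2 * tf * term_boost * norm)."""
    term_boosts = term_boosts or {}
    unique_query = list(dict.fromkeys(query_terms))  # de-duplicate, keep order
    if not unique_query or not doc_terms:
        return 0.0

    counts = Counter(doc_terms)
    matched = [t for t in unique_query if counts[t] > 0]

    # coord(q,d): fraction of query terms that occur in the document.
    coord = len(matched) / len(unique_query)

    # query_norm: 1 / sqrt(sum of squared query term weights) (assumption).
    sum_sq = sum(
        (idf(doc_freqs.get(t, 0), num_docs) * term_boosts.get(t, 1.0)) ** 2
        for t in unique_query
    )
    query_norm = 1.0 / math.sqrt(sum_sq)

    # norm(t,d): length normalization only; index-time boosts are ignored here.
    norm = 1.0 / math.sqrt(len(doc_terms))

    total = 0.0
    for t in matched:
        term_idf = idf(doc_freqs.get(t, 0), num_docs)
        tf = math.sqrt(counts[t])
        total += term_idf ** 2 * tf * term_boosts.get(t, 1.0) * norm

    return coord * query_boost * query_norm * total


# Toy example with hypothetical data: two tiny "documents".
d1 = ["customs", "tax", "rate"]
d2 = ["weather", "today"]
doc_freqs = {"customs": 1, "tax": 1, "rate": 1, "weather": 1, "today": 1}

query = ["customs", "weather"]
plain = [tfidf_score(query, d, doc_freqs, 2) for d in (d1, d2)]
boosted = [tfidf_score(query, d, doc_freqs, 2, term_boosts={"customs": 2.0})
           for d in (d1, d2)]
print(plain, boosted)
```

In this toy setup the shorter document d2 ranks first without boosts, while giving the keyword "customs" a boost of 2.0 moves d1 to the top. Using each keyword's IDF value (or a value from another method) as its entry in `term_boosts` would be one way to plug keyword weights into the same formula.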
Multi-Scene Recognition
knowledge structure
- structure the default answer using attributes of the entity.
Knowledge Management and labeling system
- prepare the interface and functions.
plan to do
plan to discuss
- add the triples search to QA engine