Hulan-2014-10-31

From cslt Wiki

Revision as of 05:32, 31 October 2014 (Fri) by Lr

Dialog system

Algorithm

Spelling mistakes

Use n-grams to generate candidate sentences. (xingchao)
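
The notes do not say how the n-gram step works, so the following is only a minimal sketch of one common approach: expand each token into possible corrections, then keep the candidate sentence that a smoothed bigram model scores highest. The confusion set, the toy corpus, and all function names are hypothetical and not taken from the project.

```python
import itertools
import math
from collections import Counter

def train_bigrams(sentences):
    """Count unigrams and bigrams over tokenized sentences, with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(padded, padded[1:])
    )

def best_candidate(tokens, confusions, unigrams, bigrams):
    """Expand each token into its confusion set and return the best-scoring sentence."""
    options = [[t] + confusions.get(t, []) for t in tokens]
    candidates = [list(c) for c in itertools.product(*options)]
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))

# Toy data, purely illustrative.
corpus = [["how", "to", "reset", "password"], ["reset", "my", "password"]]
uni, bi = train_bigrams(corpus)
confusions = {"pasword": ["password"]}  # hypothetical confusion set
fixed = best_candidate(["reset", "my", "pasword"], confusions, uni, bi)
```

The number of candidates grows multiplicatively with the size of each confusion set, so a real system would probably need beam search or pruning rather than the full product used here.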

Improve Lucene search

Lucene similarity methods

Results of the different Lucene similarity methods
Method	Default	BM25	LMDirichlet	DFR	LMJelinekMercer	IB
Accuracy	0.66228	0.66228	0.4091	0.65476	0.65476	0.6666

Our VSM method

Re-ranking with our VSM method: 54%; Lucene: 66.28%.
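
How the VSM re-ranker is built is not described here. As a rough illustration only, a vector-space re-rank can be sketched as TF-IDF weighting plus cosine similarity between the query and each retrieved candidate. The IDF table, the tokens, and the function names below are assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_vector(tokens, idf):
    """Build a sparse TF-IDF vector; terms missing from idf get weight 0."""
    tf = Counter(tokens)
    return {t: tf[t] * idf.get(t, 0.0) for t in tf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rerank(query, candidates, idf):
    """Sort candidate token lists by cosine similarity to the query, best first."""
    q = tfidf_vector(query, idf)
    return sorted(candidates, key=lambda c: cosine(q, tfidf_vector(c, idf)), reverse=True)

# Hypothetical IDF values and candidates.
idf = {"reset": 1.0, "password": 1.5, "change": 1.2, "email": 2.0, "my": 0.1}
candidates = [["change", "email"], ["reset", "my", "password"]]
ranked = rerank(["reset", "password"], candidates, idf)
```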

Lucene top 50 (caoli)

Accuracy: top 10 (82.95%), top 20 (86.34%), top 50 (90.22%).
Need to check the remaining 10% of errors.
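
For reference, top-N figures like these are usually computed as the fraction of test queries whose correct standard question appears among the first N hits. A small sketch under that assumption follows; the question ids and the function name are made up for illustration.

```python
def top_k_accuracy(ranked_results, gold, k):
    """Fraction of queries whose gold answer appears in the first k ranked hits."""
    hits = sum(1 for ranked, g in zip(ranked_results, gold) if g in ranked[:k])
    return hits / len(gold)

# Hypothetical ranked hit lists (one per query) and the expected answers.
ranked_results = [["q3", "q1", "q7"], ["q2", "q9", "q4"], ["q5", "q6", "q8"]]
gold = ["q1", "q4", "q0"]

acc_at_1 = top_k_accuracy(ranked_results, gold, 1)
acc_at_2 = top_k_accuracy(ranked_results, gold, 2)
acc_at_3 = top_k_accuracy(ranked_results, gold, 3)
```

Queries whose gold answer never shows up at the largest cutoff (the third query here) are the ones worth inspecting by hand.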

Lucene optimization (liurong)

Rewrite the method that selects the 50 standard questions so that they do not come from the same template.
Test boosting keyword weights and extracting synonyms.
Check the word segmentation for templates.
The min-segment method improves accuracy (0.61 -> 0.66).
Check the query method used to get Lucene information, and rewrite the scoring method, e.g. the IDF value.

IDF (caoli)

Test different IDF values from Baidu and Sogou in fuzzymatch.
IDF estimated from the training data performs worse than the default IDF (0.63 vs. 0.69).
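
As a reminder of what "IDF from training data" involves, here is a minimal sketch that estimates IDF from a tokenized collection using one common smoothed form, log(N / (df + 1)) + 1. Whether the experiments above used this exact formula is not stated, and the sample documents are invented.

```python
import math
from collections import Counter

def compute_idf(documents):
    """Estimate IDF per term as log(N / (df + 1)) + 1 over tokenized documents."""
    n = len(documents)
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))  # count each term at most once per document
    return {t: math.log(n / (c + 1)) + 1.0 for t, c in df.items()}

# Hypothetical training questions.
documents = [["reset", "password"], ["change", "password"], ["reset", "email"]]
idf = compute_idf(documents)
```

On a small training set, document frequencies are noisy, which is one plausible reason an in-domain IDF table could lose to values estimated from a much larger corpus.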

Knowledge structure

Structure the default answer using the attributes of the entity.

Knowledge Management and labeling system

Prepare the interface and functions.

Plan to discuss

Add triple search to the QA engine.
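
Since the design is still to be discussed, the following is only a sketch of what a triple lookup might look like: (subject, predicate, object) tuples held in memory and matched against a pattern in which None acts as a wildcard. The data and names are hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

def search_triples(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching every field that is given; None matches anything."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

# Hypothetical knowledge base.
triples = [
    ("card_x", "annual_fee", "100"),
    ("card_x", "issuer", "bank_y"),
    ("card_z", "issuer", "bank_y"),
]
fee = search_triples(triples, subject="card_x", predicate="annual_fee")
issued = search_triples(triples, predicate="issuer", obj="bank_y")
```

A linear scan is enough for a sketch; a larger knowledge base would likely need an index per field or a dedicated triple store.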

Retrieved from "http://cslt.org/mediawiki/index.php?title=Hulan-2014-10-31&oldid=12181"