Hierarchical scoring in Lucene: OR term treatment
I am trying to convert interest profiles into Lucene queries.
Given a title term and some expansion terms in JSON format, for example
{"title":"Donald Trump", "Expansion":[["republic","republican"],["democratic","democrat"],["campaign"]]}
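the corresponding Lucene query could be a BooleanQuery like the following (the title terms are given a boost factor of 3.0 and the expansion terms a boost factor of 1.0).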
+(text:donald^3.0 text:trump^3.0 (text:democrat text:democratic) (text:republic text:republican) text:campaign)
According to IndexSearcher's explain() method, a matching document such as
I know people just want to find a way to be famous without taking any risks, republic republican Donald Trump Campaign.
scores 9.0:
3.0 = weight(text:donald^3.0 in 0) [TitleExpansionSimilarity], result of:
3.0 = score(doc=0,freq=1.0), product of:
3.0 = queryWeight, product of:
3.0 = boost
1.0 = idf(docFreq=201, maxDocs=201)
1.0 = queryNorm
1.0 = fieldWeight in 0, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
1.0 = idf(docFreq=201, maxDocs=201)
1.0 = fieldNorm(doc=0)
3.0 = weight(text:trump^3.0 in 0) [TitleExpansionSimilarity], result of:
3.0 = score(doc=0,freq=1.0), product of:
3.0 = queryWeight, product of:
3.0 = boost
1.0 = idf(docFreq=201, maxDocs=201)
1.0 = queryNorm
1.0 = fieldWeight in 0, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
1.0 = idf(docFreq=201, maxDocs=201)
1.0 = fieldNorm(doc=0)
2.0 = sum of:
1.0 = weight(text:republic in 0) [TitleExpansionSimilarity], result of:
1.0 = fieldWeight in 0, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
1.0 = idf(docFreq=201, maxDocs=201)
1.0 = fieldNorm(doc=0)
1.0 = weight(text:republican in 0) [TitleExpansionSimilarity], result of:
1.0 = fieldWeight in 0, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
1.0 = idf(docFreq=201, maxDocs=201)
1.0 = fieldNorm(doc=0)
1.0 = weight(text:campaign in 0) [TitleExpansionSimilarity], result of:
1.0 = fieldWeight in 0, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
1.0 = idf(docFreq=201, maxDocs=201)
1.0 = fieldNorm(doc=0)
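Is there a way to override Lucene's scoring so that the BooleanQuery (text:republic text:republican), i.e. the cluster ["republic","republican"], is scored as the maximum of the match weights of "republic" and "republican" instead of their sum? The explanation I would like to get for that group is: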
1.0 = MAX(instead of sum) of:
1.0 = weight(text:republic in 0) [TitleExpansionSimilarity], result of:
1.0 = fieldWeight in 0, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
1.0 = idf(docFreq=201, maxDocs=201)
1.0 = fieldNorm(doc=0)
1.0 = weight(text:republican in 0) [TitleExpansionSimilarity], result of:
1.0 = fieldWeight in 0, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
1.0 = idf(docFreq=201, maxDocs=201)
1.0 = fieldNorm(doc=0)
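Not through Lucene's QueryParser syntax, but you can use a DisjunctionMaxQuery instead of a BooleanQuery to combine the queries. It scores a match with the maximum of its subqueries' scores rather than the sum of the subquery scores.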
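
To make the difference concrete, here is a minimal plain-Java sketch of the arithmetic only. It does not call Lucene; the class and method names (DisMaxScoreSketch, sumScore, disMaxScore) are made up for illustration, and the clause scores are taken from the explain() output in the question. It assumes the disjunction-max rule of "best clause score plus a tie-breaker multiple of the other clause scores", with the tie-breaker set to 0.0 so that only the maximum counts.

```java
import java.util.Arrays;
import java.util.List;

public class DisMaxScoreSketch {

    // Sum of clause scores: how a group of optional (OR) clauses
    // is combined in the question's BooleanQuery.
    public static double sumScore(List<Double> clauseScores) {
        double sum = 0.0;
        for (double s : clauseScores) {
            sum += s;
        }
        return sum;
    }

    // Disjunction-max combination: the best clause score, plus
    // tieBreaker times the scores of the remaining clauses.
    // With tieBreaker = 0.0 this is a pure maximum.
    // Assumes clause scores are non-negative.
    public static double disMaxScore(List<Double> clauseScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : clauseScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Scores copied from the explain() output in the question.
        double title = 3.0 + 3.0;                              // donald^3.0 + trump^3.0
        List<Double> republicGroup = Arrays.asList(1.0, 1.0);  // republic, republican
        double campaign = 1.0;

        double withSum = title + sumScore(republicGroup) + campaign;
        double withMax = title + disMaxScore(republicGroup, 0.0) + campaign;

        System.out.println("group combined by sum: " + withSum); // 9.0
        System.out.println("group combined by max: " + withMax); // 8.0
    }
}
```

In Lucene itself, the equivalent would presumably be to build each expansion group as a DisjunctionMaxQuery over its term queries with a tie-breaker of 0 and add that group as an optional clause of the outer BooleanQuery, so the title terms and the groups are still summed while each group contributes only its best match.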