

Proximity Search using phrases in Solr

I use Solr's proximity search quite often to search for words within a specified range of each other, like so:

"Government Spending"~2

I was wondering whether there is a way to perform a proximity search using a phrase and a word, or two phrases. Is this possible? If so, what is the syntax?

This appears to be "somewhat" doable. Consider this text:

This is more about traffic between Solr servers themselves 

Both of these queries match it:

"more traffic between solr"~2

"more about between solr"~2

It works even if you change the word order:

"more about solr between"~2

But if the words are too far apart, it stops matching:

"more about servers themselves"~2
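Why the first three queries match and the last one does not can be seen with a small model of sloppy phrase matching. This is a simplified sketch, not Lucene's actual scorer: it assumes lowercase whitespace tokenization and that every query term occurs only once in the document. Each term's document position is shifted by its offset in the query, and the "match length" is the spread (max minus min) of those shifted positions; the phrase matches when that spread is at most the slop.

```python
def sloppy_match_length(doc_tokens, query_terms):
    """Simplified sloppy-phrase model (assumes each query term occurs once):
    shift each term's document position by its offset in the query and
    return the spread of the shifted positions, or None if a term is missing."""
    first_pos = {}
    for pos, tok in enumerate(doc_tokens):
        first_pos.setdefault(tok, pos)  # keep the first occurrence only
    if any(t not in first_pos for t in query_terms):
        return None  # a missing term means no match at all
    shifted = [first_pos[t] - off for off, t in enumerate(query_terms)]
    return max(shifted) - min(shifted)


def phrase_matches(doc, query, slop):
    """Lowercase + whitespace tokenization is an assumption of this sketch."""
    length = sloppy_match_length(doc.lower().split(), query.lower().split())
    return length is not None and length <= slop


doc = "This is more about traffic between Solr servers themselves"
tokens = doc.lower().split()
queries = ["more traffic between solr", "more about between solr",
           "more about solr between", "more about servers themselves"]
for q in queries:
    print(q, sloppy_match_length(tokens, q.split()), phrase_matches(doc, q, 2))
# more traffic between solr 1 True
# more about between solr 1 True
# more about solr between 2 True
# more about servers themselves 3 False
```

Under this model, reordering "solr between" costs 2 positions, which still fits within ~2, while "servers themselves" sits 3 positions too far to the right.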

If that doesn't work, I think it would probably not be too hard to write a custom request handler that does this. You might need to define a new syntax, perhaps something like ("phrase one" "phrase two")~2. My guess is that if you are shingling, and you build a Lucene query in which one token is just "phrase one" and another is just "phrase two", with a certain proximity between them, it will work. (Of course, you will need to actually make the Lucene Java calls yourself; you can't just hand the query string over. See http://lucene.apache.org/java/2_2_0/api/index.html)
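The shingling idea can be sketched in a few lines: once bigram shingles are indexed, "phrase one" and "phrase two" each become a single token, so a two-token proximity check is enough. This is a hypothetical pure-Python model, not Lucene code; it assumes shingles behave roughly like a ShingleFilter with a maximum shingle size of 2, unigrams also emitted, a single space as separator, and each shingle stacked at the position of its first word. The helper names and the `max_gap` meaning are illustrative only.

```python
def bigram_shingles(tokens, sep=" "):
    """Hypothetical model of bigram shingling: emit every unigram plus a
    two-word shingle stacked at the position of its first word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append((i, tok))
        if i + 1 < len(tokens):
            out.append((i, tok + sep + tokens[i + 1]))
    return out


def positions(shingled, token):
    """All positions at which a (possibly multi-word) token was emitted."""
    return [p for p, t in shingled if t == token]


def shingles_within(shingled, first, second, max_gap):
    """True if token `second` starts 1..max_gap positions after `first`."""
    return any(0 < q - p <= max_gap
               for p in positions(shingled, first)
               for q in positions(shingled, second))


tokens = "phrase one and then phrase two".split()
sh = bigram_shingles(tokens)
print(positions(sh, "phrase one"), positions(sh, "phrase two"))  # [0] [4]
print(shingles_within(sh, "phrase one", "phrase two", 4))  # True
print(shingles_within(sh, "phrase one", "phrase two", 3))  # False
```

The point of the design is that no phrase-inside-phrase logic is needed at query time: the index already holds each two-word phrase as one token.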

Out of the box, I have found a way to perform a Solr proximity search using more than one word, or using phrases:

E.g., with 3 words:

"(word1) (word2) (word3)"~10

E.g., with 2 phrases (note that the inner double quotes need to be escaped):

"(\\"phrase1\\") (\\"phrase2\\")"~10
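Assuming the doubled backslashes above come from a string literal in source code, the query Solr actually receives has a single backslash before each inner quote. A minimal sketch of building and URL-encoding that query in Python; the helper name is hypothetical, and sending it (e.g. as the `q` parameter of a core's `/select` handler) is left out.

```python
from urllib.parse import urlencode, parse_qs


def build_escaped_proximity_query(phrases, slop):
    """Hypothetical helper: wrap each phrase as (\"...\") inside one outer
    quoted phrase and append the ~slop proximity operator."""
    # '(\\"' in Python source is the three characters ( \ "
    inner = " ".join('(\\"' + p + '\\")' for p in phrases)
    return '"' + inner + '"~' + str(slop)


q = build_escaped_proximity_query(["phrase1", "phrase2"], 10)
print(q)  # "(\"phrase1\") (\"phrase2\")"~10

params = urlencode({"q": q})          # percent-encodes quotes and backslashes
decoded = parse_qs(params)["q"][0]    # what the server sees after URL-decoding
print(decoded == q)  # True
```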

Since Solr 4, this is possible with the SurroundQueryParser (the surround query parser, selected with {!surround} or defType=surround).

E.g., to query where "phrase two" follows "phrase one" by no more than 3 words:

3W(phrase W one, phrase W two)

To query for "phrase two" within 5 words of "phrase one" (in either order):

5N(phrase W one, phrase W two)
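The difference between W (ordered) and N (unordered) can be illustrated with a small pure-Python model of two span-style phrase matches. This is a simplified sketch, not the parser's implementation: it assumes whitespace tokenization, that each inner `phrase W one` means the words are adjacent, and that a distance of d allows at most d - 1 positions between the end of one phrase and the start of the other (my reading of how the surround parser maps distances to span slop).

```python
def phrase_starts(tokens, phrase):
    """Start positions where the words of `phrase` occur consecutively."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def near(tokens, first, second, distance, ordered):
    """Simplified model of nW (ordered=True) / nN (ordered=False) between two
    phrases; assumption: distance d permits at most d - 1 positions between."""
    slop = distance - 1
    for a in phrase_starts(tokens, first):
        a_end = a + len(first)  # exclusive end of the first phrase
        for b in phrase_starts(tokens, second):
            b_end = b + len(second)
            if b >= a_end and b - a_end <= slop:
                return True  # `second` follows `first` closely enough
            if not ordered and a >= b_end and a - b_end <= slop:
                return True  # reversed order is allowed for N
    return False


one, two = ["phrase", "one"], ["phrase", "two"]
text = "phrase one is then followed by phrase two".split()
rev = "phrase two comes just before phrase one".split()
print(near(text, one, two, 3, ordered=True))   # False: 4 words in between
print(near(text, one, two, 5, ordered=False))  # True
print(near(rev, one, two, 5, ordered=True))    # False: wrong order for W
print(near(rev, one, two, 5, ordered=False))   # True: N ignores order
```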
