
How do you find a phrase with Lucene?

I hope I've worded my question correctly. Basically, I have an index with term vectors, positions, and offsets, and I want to be able to do the following: when I see the word "do", check whether the next word is "you". If so, treat those two words as one phrase for the purposes of scoring. I'm doing this to avoid splitting up words that are commonly used together anyway. Instead of my list of words sorted by score looking like this,

do 
want
you
come
to

I'd like to see something more like this:

do you
want
come
to
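To make the desired behavior concrete, here is a minimal sketch in plain Java of the merging step described above, applied to a token list after it has been read out of the index. It does not use any Lucene API; the class `PhraseMerger`, the method `merge`, and the set of known phrases are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: merges known two-word
// phrases into a single token so they can be scored as one unit.
class PhraseMerger {

    static List<String> merge(List<String> tokens, Set<String> phrases) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            // Look ahead one token and check whether the pair is a known phrase.
            if (i + 1 < tokens.size()) {
                String pair = tokens.get(i) + " " + tokens.get(i + 1);
                if (phrases.contains(pair)) {
                    out.add(pair);
                    i += 2; // consume both words of the phrase
                    continue;
                }
            }
            out.add(tokens.get(i));
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("do", "you", "want", "to", "come");
        // prints [do you, want, to, come]
        System.out.println(merge(tokens, Set.of("do you")));
    }
}
```

The sketch assumes the list of phrases is known in advance; discovering which word pairs are "commonly used together" is a separate problem.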

One workaround would be to index both by word and by phrase, so your scoring list would be:

do you
want
come
to
do
you

If you then applied a boost to your phrases during indexing, you would be closer to your goal. But that depends on whether matching phrases should always rank higher than their individual words.
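As a rough illustration of the idea, the following plain-Java sketch counts both single words and known phrases and weights each phrase occurrence by a boost factor. It is not Lucene code: `WordAndPhraseScorer`, `score`, `ranked`, and the `phraseBoost` parameter are hypothetical, and the scoring is a simple weighted count rather than Lucene's actual scoring. Inside Lucene itself, the `ShingleFilter` in the analysis module emits word n-grams as tokens and may be a starting point for indexing phrases alongside words.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: scores words and known phrases side by side.
class WordAndPhraseScorer {

    // Each word occurrence adds 1.0; each phrase occurrence adds phraseBoost.
    static Map<String, Double> score(List<String> tokens,
                                     Set<String> phrases,
                                     double phraseBoost) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            scores.merge(tokens.get(i), 1.0, Double::sum);
            if (i + 1 < tokens.size()) {
                String pair = tokens.get(i) + " " + tokens.get(i + 1);
                if (phrases.contains(pair)) {
                    scores.merge(pair, phraseBoost, Double::sum);
                }
            }
        }
        return scores;
    }

    // Returns the terms ordered by descending score.
    static List<String> ranked(Map<String, Double> scores) {
        List<String> terms = new ArrayList<>(scores.keySet());
        terms.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return terms;
    }

    public static void main(String[] args) {
        List<String> tokens =
            List.of("do", "you", "want", "to", "come", "do", "you");
        Map<String, Double> scores = score(tokens, Set.of("do you"), 2.0);
        System.out.println(ranked(scores));
    }
}
```

With a boost greater than 1.0 a phrase outranks its component words whenever they only occur together; with a boost of 1.0 or less the individual words can tie or rank higher, which is the trade-off mentioned above.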

It might also be worth looking at "Boosting Lucene Terms When Building the Index".
