
"OR" query in Lucene

I am trying to design an information retrieval system for a film database. I want to search by title, so when I search for "Cobra Kai" my analyzer decomposes the string into "cobra kai", "cobra" and "kai" to get better matches. My problem is that I need to run a query like this: "cobra kai" OR "cobra" OR "kai", but it is not working for me. Here is the code:

ArrayList<String> busqueda_separada = muestraTexto(analyzer_titulo(), busquedaTitulo.getText());

query1 = new TermQuery(new Term("titulo", busqueda_separada.get(0)));
query2 = new TermQuery(new Term("titulo", busqueda_separada.get(1)));
query3 = new TermQuery(new Term("titulo", busqueda_separada.get(2)));

nested.add(query1, BooleanClause.Occur.SHOULD);
nested.add(query2, BooleanClause.Occur.SHOULD);
nested.add(query3, BooleanClause.Occur.SHOULD);

bqbuilder.add(nested, BooleanClause.Occur.MUST);

And this is my error: (image: error)

I have tried different boolean clauses, but the error stays the same.

From the error message we can see that you have defined nested as a variable of type BooleanQuery.

As the error message says, the class BooleanQuery does not have a method add(Query, Occur). This means the following line will not compile:

nested.add(query1, BooleanClause.Occur.SHOULD);

The code should use a BooleanClause here rather than a BooleanQuery.

A BooleanQuery is made up of one or more clauses, each represented by a BooleanClause.

So, you can do the following:

BooleanQuery.Builder bqBuilder = new BooleanQuery.Builder();

Query query1 = new TermQuery(new Term("titulo", "cobra kai"));
Query query2 = new TermQuery(new Term("titulo", "cobra"));
Query query3 = new TermQuery(new Term("titulo", "kai"));

BooleanClause nested1 = new BooleanClause(query1, BooleanClause.Occur.SHOULD);
BooleanClause nested2 = new BooleanClause(query2, BooleanClause.Occur.SHOULD);
BooleanClause nested3 = new BooleanClause(query3, BooleanClause.Occur.SHOULD);

bqBuilder.add(nested1);
bqBuilder.add(nested2);
bqBuilder.add(nested3);

BooleanQuery bq = bqBuilder.build();

That builds a boolean query containing 3 clauses:

Find titles containing "cobra kai" OR "cobra" OR "kai".

I am not sure what this is for:

bqbuilder.add(nested, BooleanClause.Occur.MUST);

The BooleanClause.Occur.MUST does not appear to be needed, so I have dropped it from my code.


You can simplify the above code by using a loop.

Assuming you already have a list containing your search terms (your busqueda_separada list):

List<String> terms = Arrays.asList("cobra kai", "cobra", "kai");

You can use that list as follows:

for (String term : terms) {
    Query query = new TermQuery(new Term("titulo", term));
    BooleanClause nested = new BooleanClause(query, BooleanClause.Occur.SHOULD);
    bqBuilder.add(nested);
}
BooleanQuery bq2 = bqBuilder.build();

Update

One point I forgot to mention:

In your data, you have a search phrase: cobra kai. You may not need to search for this at all, depending on how your data was indexed and how you expect your search to work.

But assuming you do need it, you need to wrap the phrase in double quotes, so that it is treated as a single search phrase by Lucene:

List<String> terms = Arrays.asList("\"cobra kai\"", "cobra", "kai");

This ensures the generated search is:

titulo:"cobra kai" titulo:cobra titulo:kai

And, by default, there is an implied "OR" between the clauses in the search.
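One caveat, offered tentatively: double quotes are classic query parser syntax, while a TermQuery takes its text verbatim, so the quoting matters mainly when the string is handed to a query parser (a PhraseQuery is the usual programmatic route for phrases). As a minimal sketch of the string form shown above, the following plain-Java helper renders a list of terms as classic query syntax. The class and method names are hypothetical and not part of Lucene, and the sketch assumes the terms contain no special characters that would need escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper (not part of Lucene): renders terms as a classic query string. */
final class ClassicQueryStrings {

    /** Wraps the term in double quotes when it contains whitespace, so a parser reads it as one phrase. */
    static String clause(String field, String term) {
        boolean isPhrase = term.chars().anyMatch(Character::isWhitespace);
        return field + ":" + (isPhrase ? "\"" + term + "\"" : term);
    }

    /** Joins the clauses with spaces; no explicit operator is written between them. */
    static String anyOf(String field, List<String> terms) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String term : terms) {
            joiner.add(clause(field, term));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("cobra kai", "cobra", "kai");
        // Prints: titulo:"cobra kai" titulo:cobra titulo:kai
        System.out.println(anyOf("titulo", terms));
    }
}
```

The resulting string could then be passed to a query parser, which is where the quotes and the implied "OR" take effect.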


Your "extra" question:

query should be like (titulo="cobra kai" OR titulo="cobra" OR titulo="kai") AND anio="2018"

This is really a completely new question, and you can find approaches in existing answers.

But one more approach (if I have understood correctly) is to nest two queries inside another boolean query and use Occur.MUST for each clause of that outer query.

So, you already have your first boolean query.

Now create another one. Actually, if you only have one term, you don't even need a boolean query, just a term query:

Query query2 = new TermQuery(new Term("year", "2018"));

Now place these two queries into a brand new boolean query (this new query contains the first two queries):

BooleanQuery.Builder bqBuilder = new BooleanQuery.Builder();
bqBuilder.add(bq1, BooleanClause.Occur.MUST);
bqBuilder.add(query2, BooleanClause.Occur.MUST);
BooleanQuery bq = bqBuilder.build();

The above is equivalent to the following Lucene classic query:

+(body:"cobra kai" body:cobra body:kai) +year:2018

And that, in turn, is equivalent to:

(body:"cobra kai" OR body:cobra OR body:kai) AND year:2018

Note that this uses the plus operator.

So the results MUST contain matches for both clauses: the clause for my body field and the clause for my year field.


This can all get quite confusing if you think about Lucene boolean operators in the same way that you think about Boolean algebra. But they are not the same and serve different purposes. Lucene is not (only) about including and excluding records, but about scoring those records for relevance.
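To make the filtering-versus-scoring distinction concrete, here is a toy model in plain Java. It uses no Lucene classes, and its "score" is a crude count of matched optional terms found by substring search, which is an assumption made purely for illustration and far simpler than Lucene's real scoring. The titles and years in main are made-up sample data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Toy model (not Lucene): a MUST clause filters, SHOULD clauses contribute to ranking. */
final class ToyBooleanScoring {

    /** Counts how many optional (SHOULD) terms occur in the title, ignoring case. */
    static int matchedShould(String title, List<String> shouldTerms) {
        String lower = title.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String term : shouldTerms) {
            if (lower.contains(term)) {
                count++;
            }
        }
        return count;
    }

    /** A record matches when the required year is equal and at least one SHOULD term occurs. */
    static boolean matches(String title, int year, List<String> shouldTerms, int requiredYear) {
        return year == requiredYear && matchedShould(title, shouldTerms) > 0;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("cobra kai", "cobra", "kai");
        // Both titles pass the year filter, but the first matches more optional terms.
        System.out.println(matches("Cobra Kai", 2018, terms, 2018) + " " + matchedShould("Cobra Kai", terms));
        System.out.println(matches("Cobra", 2018, terms, 2018) + " " + matchedShould("Cobra", terms));
        // Fails the required year, so the optional terms do not matter.
        System.out.println(matches("Cobra Kai", 1986, terms, 2018));
    }
}
```

In this model both 2018 titles are included, yet the one matching all three optional terms would rank first, which is the kind of ordering a purely Boolean reading of OR does not express.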
