Multiple words query in Lucene
For example: there is a column "description" in a Lucene document. Let's say the content of "description" is [hello foo bar]. I want the query [hello f] to hit the document, while [hello ff] or [hello b] should not hit it.
I create the Query programmatically, for example by adding PrefixQuery and TermQuery instances to a BooleanQuery, but they don't work as expected. StandardAnalyzer is used.
Test cases:
a) new PrefixQuery(new Term("description", "hello f")) -> 0 hits
b) PhraseQuery query = new PhraseQuery(); query.add(new Term("description", "hello f*")); -> 0 hits
c) PhraseQuery query = new PhraseQuery(); query.add(new Term("description", "hello f")); -> 0 hits
Any recommendations? Thanks!
It doesn't work because you are passing multiple words to a single Term object. StandardAnalyzer splits the field content into separate terms at index time, so no single indexed term contains "hello f". If you want every search word to be matched as a prefix, you need to:
Tokenize the input string with your analyzer; it will split your search text "hello f" into "hello" and "f":
TokenStream tokenStream = analyzer.tokenStream(null, new StringReader(searchText));
CharTermAttribute termAttribute = tokenStream.getAttribute(CharTermAttribute.class);
List<String> tokens = new ArrayList<String>();
tokenStream.reset(); // newer Lucene versions require reset() before the first incrementToken()
while (tokenStream.incrementToken()) {
    tokens.add(termAttribute.toString());
}
tokenStream.end();
tokenStream.close();
Put each token into its own Term object, wrap each Term in a PrefixQuery, and add all the PrefixQuery instances to a BooleanQuery.
EDIT: For example, like this:
BooleanQuery booleanQuery = new BooleanQuery();
for (String token : tokens) {
    // Occur.MUST: every token has to match some term of the field as a prefix
    booleanQuery.add(new PrefixQuery(new Term(fieldName, token)), Occur.MUST);
}
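To get a feel for what a BooleanQuery made of MUST prefix clauses will and will not match, here is a minimal plain-Java sketch that mimics the logic without using Lucene at all. It assumes a very rough stand-in for StandardAnalyzer (lowercase, then split on whitespace), and the class and method names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PrefixAllTokensSketch {

    // Rough stand-in for an analyzer (an assumption): lowercase, split on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Mimics a BooleanQuery of MUST prefix clauses: every query token
    // must be a prefix of at least one term of the field.
    static boolean matches(String fieldContent, String searchText) {
        List<String> indexedTerms = tokenize(fieldContent);
        for (String queryToken : tokenize(searchText)) {
            boolean found = false;
            for (String term : indexedTerms) {
                if (term.startsWith(queryToken)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String description = "hello foo bar";
        System.out.println(matches(description, "hello f"));  // true
        System.out.println(matches(description, "hello ff")); // false
        System.out.println(matches(description, "hello b"));  // true
    }
}
```

Note that under this logic [hello b] also matches, because "bar" starts with "b" and neither word order nor adjacency is checked. If the second word has to follow "hello" directly, a position-aware query such as MultiPhraseQuery may be worth a look.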
Have you tried NGram or EdgeNGram tokenization while indexing?
http://lucene.apache.org/core/old_versioned_docs/versions/2_9_0/api/all/org/apache/lucene/analysis/ngram/NGramTokenizer.html
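The idea behind edge n-grams, roughly, is that every leading prefix of a token is stored as its own term at index time, so a prefix search turns into an exact term lookup. The sketch below shows that expansion in plain Java; it is not Lucene's tokenizer, and the helper name and its minGram/maxGram parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class EdgeNGramSketch {

    // Returns the leading prefixes of a token whose length lies between
    // minGram and maxGram (inclusive), capped at the token's own length.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int lower = Math.max(1, minGram);
        int upper = Math.min(maxGram, token.length());
        for (int len = lower; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("foo", 1, 10));   // [f, fo, foo]
        System.out.println(edgeNGrams("hello", 2, 3));  // [he, hel]
    }
}
```

With terms expanded like this, a query word such as "f" would match the indexed gram "f" of "foo" directly, at the cost of a larger index.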