Multiple-word query in Lucene

Question

For example, a Lucene document has a field "description" whose content is [hello foo bar]. I want the query [hello f] to hit the document, while [hello ff] or [hello b] should not hit it.

I build the Query programmatically, for example by adding PrefixQuery and TermQuery instances to a BooleanQuery, but they don't work as expected. StandardAnalyzer is used.

Test cases:

a) new PrefixQuery(new Term("description", "hello f")) -> 0 hits

b) PhraseQuery query = new PhraseQuery(); query.add(new Term("description", "hello f*")); -> 0 hits

c) PhraseQuery query = new PhraseQuery(); query.add(new Term("description", "hello f")); -> 0 hits

Any recommendations? Thanks!

Answer 1

It doesn't work because you are passing multiple terms to a single Term object. The analyzer indexes "hello", "foo" and "bar" as separate terms, so no indexed term starts with "hello f". If you want every search word to be matched as a prefix, you need to:

Tokenize the input string with your analyzer; it will split the search text "hello f" into "hello" and "f":

TokenStream tokenStream = analyzer.tokenStream(null, new StringReader(searchText));
CharTermAttribute termAttribute = tokenStream.addAttribute(CharTermAttribute.class);
List<String> tokens = new ArrayList<String>();
tokenStream.reset(); // the TokenStream contract requires reset() before the first incrementToken()
while (tokenStream.incrementToken()) {
    tokens.add(termAttribute.toString());
}
tokenStream.end();
tokenStream.close();

Put each token into a Term object, wrap each Term in a PrefixQuery, and add all the PrefixQuery instances to a BooleanQuery.

EDIT: For example, like this:

BooleanQuery booleanQuery = new BooleanQuery();

// One prefix clause per analyzed token; Occur.MUST means every clause has to match.
for (String token : tokens) {
    booleanQuery.add(new PrefixQuery(new Term(fieldName, token)), Occur.MUST);
}
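
To make the matching behaviour concrete, here is a minimal plain-Java sketch that imitates what these queries do against the terms of one field. It does not use Lucene: `PrefixMatchDemo`, `tokenize`, `anyTermStartsWith` and `matchesAllPrefixes` are hypothetical names introduced for illustration, and `tokenize` is only a rough stand-in for StandardAnalyzer (lowercasing and splitting on non-alphanumeric characters).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; no Lucene classes are used here.
final class PrefixMatchDemo {

    // Hypothetical stand-in for analysis: lowercase, then split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Imitates a single prefix lookup: does any one indexed term start with the prefix?
    static boolean anyTermStartsWith(List<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Imitates MUST-combined prefix clauses: every query token must prefix some indexed term.
    static boolean matchesAllPrefixes(String fieldText, String searchText) {
        List<String> indexedTerms = tokenize(fieldText);
        for (String token : tokenize(searchText)) {
            if (!anyTermStartsWith(indexedTerms, token)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> indexed = tokenize("hello foo bar");
        System.out.println(anyTermStartsWith(indexed, "hello f"));           // false
        System.out.println(matchesAllPrefixes("hello foo bar", "hello f"));  // true
        System.out.println(matchesAllPrefixes("hello foo bar", "hello ff")); // false
        System.out.println(matchesAllPrefixes("hello foo bar", "hello b"));  // true
    }
}
```

Note the last case: because the clauses are independent and carry no position information, [hello b] also matches, which differs from what the question asks for. If word order matters, a position-aware query such as Lucene's MultiPhraseQuery is probably the direction to look in.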

Answer 2

Have you tried NGram or EdgeNGram tokenization at index time?

http://lucene.apache.org/core/old_versioned_docs/versions/2_9_0/api/all/org/apache/lucene/analysis/ngram/NGramTokenizer.html
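
As a rough illustration of the idea, the sketch below generates the edge n-grams of a token in plain Java. It is not Lucene's implementation: `EdgeNGramDemo` and `edgeNGrams` are hypothetical names, and the gram sizes are arbitrary. The point is that if the leading substrings of each word are indexed as terms, a search word like "f" can be matched with an ordinary term lookup instead of a prefix query.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; Lucene's own n-gram analysis classes are not used here.
final class EdgeNGramDemo {

    // Returns the leading substrings of token with lengths minGram..maxGram (capped at token length).
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int start = Math.max(1, minGram);
        int limit = Math.min(maxGram, token.length());
        for (int len = start; len <= limit; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("foo", 1, 3));   // [f, fo, foo]
        System.out.println(edgeNGrams("hello", 1, 3)); // [h, he, hel]
    }
}
```

With grams like these in the index, "f" is an indexed term of the document while "ff" is not. The trade-off is a larger index, since every word contributes several terms.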

2 answers

Answer 1: 1 2012-12-17 10:09:48
Answer 2: 0 2012-12-17 09:32:28
