
Wildcards and WordDelimiterFilter in Lucene

I'm using Lucene.Net (2.3.2) and a ported version of the compatible WordDelimiterFilter, but wildcard searches on words that contain hyphens don't work.

An example is the word "CL-276-0001". When I search for "cl", "cl-276", or "cl-276-0001", I find the record with no problem (which is what I was initially trying to solve), but a search for "cl-276*" or "cl-276-0*" no longer works. "cl*" is unaffected, which leads me to believe the query parser might be processing the term and not adding the wildcard back onto the result.

Any help solving or understanding this would be appreciated.

Edit: I looked at the query produced by the query parser. It is exactly what was typed. I'm guessing this means the search doesn't work because it looks for exactly what the user typed as a prefix. Now I'm thinking I shouldn't alter this behaviour.

Second Edit: Someone asked what the analyzer looks like:

public override TokenStream TokenStream(string fieldName, TextReader reader)
{
    TokenStream result = new WhitespaceTokenizer(reader);
    result = new WordDelimiterFilter(result, 1, 1, 1, 1, 1);
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, LoadStopWords());
    return result;
}

Your analyzer splits CL-276-0001 into the tokens [cl], [276], and [0001], and those tokens are what is stored in the index.
On the other hand, wildcard searches do not go through the analyzer; the query parser only lowercases the search criteria. Since your search criterion cl-276 (or cl-276-0) does not exist in the index as a term, you don't get any results.
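
To make the mismatch concrete, here is a small dependency-free sketch. It does not use Lucene.Net at all: `Analyze` is a hypothetical, simplified stand-in for the analyzer chain (it splits on non-alphanumeric characters and lowercases, and it ignores the catenate options of the real WordDelimiterFilter), and `PrefixMatches` imitates a trailing-wildcard query by comparing the untokenized, lowercased prefix against each indexed term.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Illustration only: these names are made up and are not part of Lucene.Net.
public static class WildcardDemo
{
    // Simplified analyzer: split on anything that is not a letter or digit,
    // then lowercase each piece.
    public static List<string> Analyze(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }

    // A trailing-wildcard query is lowercased but not tokenized, so the whole
    // prefix (hyphens included) is compared against each indexed term.
    public static bool PrefixMatches(IEnumerable<string> indexedTerms, string wildcardQuery)
    {
        string prefix = wildcardQuery.TrimEnd('*').ToLowerInvariant();
        return indexedTerms.Any(t => t.StartsWith(prefix, StringComparison.Ordinal));
    }
}
```

With the terms from `WildcardDemo.Analyze("CL-276-0001")`, `PrefixMatches(terms, "cl*")` succeeds because the term cl starts with "cl", while `PrefixMatches(terms, "cl-276*")` fails because no single indexed term starts with "cl-276".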

One solution is to build a TermQuery yourself (casing is important) instead of using QueryParser.
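
As a rough sketch of what building the query yourself could involve, the code below splits the user's input the same way the index was split and records which piece carries the trailing wildcard. `QueryPart` and `QueryPlanner` are hypothetical names, and splitting only on hyphens is an assumption made to keep the example short. Each exact part would then become a TermQuery and the prefix part a PrefixQuery, combined in a BooleanQuery. That combination is my assumption, not something the answer above spells out, and an AND of terms does not enforce their order or adjacency.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: a plan of query parts, not actual Lucene.Net queries.
public sealed class QueryPart
{
    public string Text { get; }
    public bool IsPrefix { get; }

    public QueryPart(string text, bool isPrefix)
    {
        Text = text;
        IsPrefix = isPrefix;
    }
}

public static class QueryPlanner
{
    // Lowercase the input (the indexed terms are lowercase), split on hyphens,
    // and mark the last piece as a prefix if the input ended with '*'.
    public static List<QueryPart> Plan(string userInput)
    {
        bool trailingWildcard = userInput.EndsWith("*", StringComparison.Ordinal);
        string body = userInput.TrimEnd('*').ToLowerInvariant();

        // "cl-276-*" has nothing after the last hyphen, so no piece is a prefix.
        bool prefixOnLast = trailingWildcard && !body.EndsWith("-", StringComparison.Ordinal);

        string[] pieces = body.Split(new[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
        var parts = new List<QueryPart>();
        for (int i = 0; i < pieces.Length; i++)
        {
            bool isLast = i == pieces.Length - 1;
            parts.Add(new QueryPart(pieces[i], isLast && prefixOnLast));
        }
        return parts;
    }
}
```

For the input "CL-276-0*", this yields the exact parts cl and 276 plus the prefix part 0, each of which corresponds to something that actually exists in the index.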
