Solr query analyzer underscore

I have underscore-separated and camel-case values (e.g. "SimplyShopping_Rediff") in a document field whose fieldType is text_ws.

<fieldType name="text_ws" class="solr.TextField"
    positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>

Is there any way I can change the query analyzer without reindexing, so that I can search on Shopping or Rediff?

No, not in your case.

The fieldType you have defined uses the pattern ; to tokenize your text. Tokenizing text means splitting the input stream into the tokens (also called terms) that are written to the index.

Staying with your example, SimplyShopping_Rediff contains no ; so the whole text is recognized as one token and goes into your index as exactly that token (lowercased by the LowerCaseFilterFactory). You may uppercase, lowercase, stem or filter it, but you cannot split it any further.

Even if you changed your fieldType so that it tokenizes the way you want at query time, the tokens in your index would still be wrong, because they were already tokenized the wrong way at index time. As a result, the new, correctly tokenized terms of your searches will match nothing in the index. No hit in the index, no search result.
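For illustration, here is a rough sketch of a fieldType that would split such values on underscores and case changes. It is only one possible setup, not something taken from your schema: the name text_split is hypothetical, it assumes a Solr version that ships WordDelimiterGraphFilterFactory (6.5 or later), and it only helps after the field has been reindexed with it.

```xml
<!-- Hypothetical fieldType name; not part of the original schema. -->
<fieldType name="text_split" class="solr.TextField"
    positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
        <!-- Split on underscores and case changes BEFORE lowercasing;
             once lowercased, the camel-case boundaries are gone. -->
        <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" splitOnCaseChange="1"
            preserveOriginal="1" />
        <!-- Graph filters need flattening on the index side. -->
        <filter class="solr.FlattenGraphFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
        <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" splitOnCaseChange="1"
            preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>
```

With this chain, SimplyShopping_Rediff should be indexed as simply, shopping and rediff, plus the original simplyshopping_rediff because of preserveOriginal. It is worth checking the actual output on the Analysis screen of the Solr admin UI before committing to a full reindex.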

The Solr reference guide has a good section on Analyzers, Tokenizers and Filters. As rebuilding a whole index may be very expensive, I would recommend reading it first.
