
solr query analyzer underscore

I have underscore-separated, camel-case values (e.g. "SimplyShopping_Rediff") in a document field whose fieldType is text_ws:

<fieldType name="text_ws" class="solr.TextField"
    positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>

Is there any way I can change the query analyzer, without reindexing, so that I can search on Shopping or Rediff?

No, not in your case.

The fieldType you have defined uses the pattern ; to tokenize your text. Tokenizing means splitting the input text into tokens (also called terms), and those tokens are what gets stored in the index.

Take your example SimplyShopping_Rediff: it contains no ;, so the whole value is recognized as one token and goes into your index as exactly that token. You may lowercase, stem or otherwise filter it at query time, but a query-time change cannot split what is already stored in the index as a single term.

Even if you changed your fieldType so that it tokenizes the way you want at query time, the tokens in your index would still be wrong, because they were produced by the old analyzer at index time. The correctly tokenized terms of your searches (for example shopping or rediff) would therefore match nothing in the index. No match in the index, no search result.
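For illustration only, a fieldType that splits such values at both index and query time might look roughly like the sketch below. This is an assumption-laden example, not a tested configuration: the name text_ws_split is hypothetical, it assumes a Solr version that ships solr.WordDelimiterGraphFilterFactory and solr.FlattenGraphFilterFactory, and it only has an effect after the documents have been reindexed.

```xml
<!-- Sketch only: "text_ws_split" is a hypothetical name. Requires a reindex. -->
<fieldType name="text_ws_split" class="solr.TextField"
    positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
        <!-- Split on case changes and on delimiters such as "_";
             must run before lowercasing, or the case changes are lost -->
        <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" splitOnCaseChange="1"
            preserveOriginal="1" />
        <!-- Graph filters need to be flattened on the index side -->
        <filter class="solr.FlattenGraphFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
        <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" splitOnCaseChange="1"
            preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>
```

With a chain like this, SimplyShopping_Rediff should be indexed as the parts simply, shopping and rediff in addition to the original value, so a search for Shopping or Rediff would have something to match. It is worth checking the actual token output on the Analysis screen of the Solr admin UI before committing to a reindex.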

The Solr Reference Guide has a good section on Analyzers, Tokenizers and Filters. As rebuilding a whole index may be very expensive, I would recommend reading it first.
