How to create a case insensitive copy of a string field in SOLR?

How can I create a copy of a string field in case insensitive form? I want to use the typical "string" type and a case insensitive type. The types are defined like so:

    <fieldType name="string" class="solr.StrField"
        sortMissingLast="true" omitNorms="true" />

    <!-- A Case insensitive version of string type  -->
    <fieldType name="string_ci" class="solr.StrField"
        sortMissingLast="true" omitNorms="true">
        <analyzer type="index">
            <tokenizer class="solr.KeywordTokenizerFactory"/>           
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType> 

And an example of the field like so:

<field name="destANYStr" type="string" indexed="true" stored="true"
    multiValued="true" />
<!-- Case insensitive version -->
<field name="destANYStrCI" type="string_ci" indexed="true" stored="false" 
    multiValued="true" />

I tried using CopyField like so:

<copyField source="destANYStr" dest="destANYStrCI" />

But apparently copyField copies the value from source to dest before any analyzers are invoked, so even though I've specified through analyzers that dest is case-insensitive, the case of the values copied from the source field is preserved.

I'm hoping to avoid re-transmitting the value in the field from the client, at record creation time.

With no answers from SO, I followed up on the SOLR users list. I found that my string_ci field was not working as expected before even considering the effects of copyField. Ahmet Arslan explains why the "string_ci" field should be using solr.TextField and not solr.StrField:

From apache-solr-1.4.0\example\solr\conf\schema.xml:

"The StrField type is not analyzed, but indexed/stored verbatim."

"solr.TextField allows the specification of custom text analyzers specified as a tokenizer and a list of token filters."

With an example he provided and a slight tweak of my own, the following field type definition seems to do the trick, and now the copyField works as expected as well.

    <fieldType name="string_ci" class="solr.TextField"
        sortMissingLast="true" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>           
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType> 
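To illustrate what this means for the client: only the source field has to be sent, and the copyField rule fills destANYStrCI on the server. A minimal sketch of such an update message follows. The `id` field and both values are hypothetical, since the question does not show the rest of the schema.

    <add>
      <doc>
        <!-- Hypothetical unique key; not part of the schema shown above -->
        <field name="id">example-1</field>
        <!-- Only the case-sensitive source field is transmitted -->
        <field name="destANYStr">Hello World</field>
        <!-- destANYStrCI is not sent; copyField supplies the same raw value,
             and the string_ci analyzer lowercases it when it is indexed -->
      </doc>
    </add>

With this document indexed, a phrase query such as destANYStrCI:"hello world" or destANYStrCI:"HELLO WORLD" should match, because the query analyzer lowercases the query text in the same way.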

Any value stored for the destANYStrCI field keeps its original case, but the field is searched case-insensitively, because only the indexed terms are lowercased. CAVEAT: case-insensitive wildcard searching cannot be done, since wildcard terms bypass the query analyzer and are not lowercased before being matched against the index. The characters in a wildcard term must therefore be lowercase in order to match.
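A note on the wildcard caveat: as far as I know, Solr 3.6 and later run wildcard and prefix terms through a separate "multiterm" analysis chain, so on those versions the limitation may not apply. That chain can also be declared explicitly on the field type. The following is a sketch under that assumption. It is not from the mailing-list thread and does not apply to the 1.4.0 release quoted above.

    <fieldType name="string_ci" class="solr.TextField"
        sortMissingLast="true" omitNorms="true">
        <analyzer type="index">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
        <!-- Assumed to be supported only on Solr 3.6+: applied to
             wildcard, prefix and range terms instead of the query analyzer -->
        <analyzer type="multiterm">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType>

On a version without this support, lowercasing the wildcard term on the client before sending the query remains the workaround.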

Yes, true. LowerCaseFilterFactory does not apply to the String data type. It can only be applied to Text fields.

If you try it this way:

<!-- Assigning customised data type -->
<field name="language" type="text_lower" indexed="true" stored="true"  multiValued="false" default="en"/>  

<!-- Defining customised data type for lower casing. -->
<fieldType name="text_lower" class="solr.String" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

It will not work. We have to use TextField.

Try it this way instead; it should work. Just change the fieldType class from String to TextField:

<!-- Assigning customised data type -->
<field name="language" type="text_lower" indexed="true" stored="true" multiValued="false" default="en"/>

<!-- Defining customised data type for lower casing. -->
<fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
