How to create a case insensitive copy of a string field in SOLR?

Question

How can I create a copy of a string field in case insensitive form? I want to use the typical "string" type alongside a case insensitive type. The types are defined like so:

    <fieldType name="string" class="solr.StrField"
        sortMissingLast="true" omitNorms="true" />

    <!-- A Case insensitive version of string type  -->
    <fieldType name="string_ci" class="solr.StrField"
        sortMissingLast="true" omitNorms="true">
        <analyzer type="index">
            <tokenizer class="solr.KeywordTokenizerFactory"/>           
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType>

And an example of the fields, like so:

    <field name="destANYStr" type="string" indexed="true" stored="true"
        multiValued="true" />
    <!-- Case insensitive version -->
    <field name="destANYStrCI" type="string_ci" indexed="true" stored="false"
        multiValued="true" />

I tried using copyField like so:

    <copyField source="destANYStr" dest="destANYStrCI" />

But apparently copyField is applied to source and dest before any analyzers are invoked, so even though I've specified through analyzers that dest is case insensitive, the case of the values copied from the source field is preserved.

I'm hoping to avoid re-transmitting the field's value from the client at record creation time.

Answer 1

With no answers from SO, I followed up on the SOLR users list. I found that my string_ci field was not working as expected even before considering the effects of copyField. Ahmet Arslan explains why the "string_ci" field should be using solr.TextField and not solr.StrField:

From apache-solr-1.4.0\example\solr\conf\schema.xml:

"The StrField type is not analyzed, but indexed/stored verbatim."

"solr.TextField allows the specification of custom text analyzers specified as a tokenizer and a list of token filters."

With an example he provided and a slight tweak of my own, the following field type definition seems to do the trick, and now the copyField works as expected as well.

    <fieldType name="string_ci" class="solr.TextField"
        sortMissingLast="true" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>           
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType>

The destANYStrCI field preserves the case of any value it stores, but provides a case insensitive field to search on. CAVEAT: case insensitive wildcard searching cannot be done, since wildcard phrases bypass the query analyzer and will not be lowercased before matching against the index. This means that the characters in wildcard phrases must be lowercase in order to match.
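
Putting the pieces from the question and this answer together, the relevant parts of schema.xml might look like the sketch below. It only combines the definitions already shown; each element still belongs in its usual place in the schema. The sample value "New York" and the queries in the trailing comment are illustrative assumptions, not taken from the original post.

    <!-- Verbatim string type: indexed and stored exactly as sent -->
    <fieldType name="string" class="solr.StrField"
        sortMissingLast="true" omitNorms="true" />

    <!-- Case insensitive type: whole value kept as one token, then lowercased -->
    <fieldType name="string_ci" class="solr.TextField"
        sortMissingLast="true" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType>

    <field name="destANYStr" type="string" indexed="true" stored="true"
        multiValued="true" />
    <field name="destANYStrCI" type="string_ci" indexed="true" stored="false"
        multiValued="true" />

    <!-- The raw value is copied; analysis then happens on the destination field -->
    <copyField source="destANYStr" dest="destANYStrCI" />

    <!-- Assumed example: a document sent with destANYStr = "New York"
           destANYStr   is indexed as the term "New York"
           destANYStrCI is indexed as the single term "new york"
         Expected behaviour under the caveat above:
           destANYStrCI:"NEW YORK"  should match (the query analyzer lowercases it)
           destANYStrCI:new*        should match
           destANYStrCI:New*        should not match (wildcards are not analyzed) -->

The client only sends destANYStr; the lowercased copy is produced at index time, which avoids re-transmitting the value.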

Answer 2

Yes, that's true. LowerCaseFilterFactory does not apply to the String data type. We can only apply LowerCaseFilterFactory to Text fields.

If you try to do it this way:

    <!-- Assigning customised data type -->
    <field name="language" type="text_lower" indexed="true" stored="true" multiValued="false" default="en"/>

    <!-- Defining customised data type for lower casing. -->
    <fieldType name="text_lower" class="solr.String" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

It will not work; we have to use TextField.

Try it this way instead; it should work. Just change the fieldType class from String to TextField:

    <!-- Assigning customised data type -->
    <field name="language" type="text_lower" indexed="true" stored="true" multiValued="false" default="en"/>

    <!-- Defining customised data type for lower casing. -->
    <fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
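
One thing to keep in mind: StandardTokenizerFactory splits a value into separate word tokens, so text_lower matches on individual words rather than on the whole value. For a single-word field such as language that makes little difference. If the field should behave like a lowercased string and match only on the whole value, a variant along these lines, borrowing KeywordTokenizerFactory from Answer 1, may be closer to what is wanted. The type name text_lower_exact is hypothetical.

    <!-- Sketch: lowercased, whole value kept as a single token (name is made up) -->
    <fieldType name="text_lower_exact" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

A single analyzer element without a type attribute is used for both indexing and querying, as in the definition from Answer 1.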

Answer 1: score 53, accepted, 2010-01-13 23:00:14

Answer 2: score 7, 2013-10-21 02:22:01
