How to create a case insensitive copy of a string field in SOLR?

How can I create a case insensitive copy of a string field? I want to keep the typical "string" type and add a case insensitive type alongside it. The types are defined like so:

    <fieldType name="string" class="solr.StrField"
        sortMissingLast="true" omitNorms="true" />

    <!-- A Case insensitive version of string type  -->
    <fieldType name="string_ci" class="solr.StrField"
        sortMissingLast="true" omitNorms="true">
        <analyzer type="index">
            <tokenizer class="solr.KeywordTokenizerFactory"/>           
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType> 

And an example of the fields, like so:

<field name="destANYStr" type="string" indexed="true" stored="true"
    multiValued="true" />
<!-- Case insensitive version -->
<field name="destANYStrCI" type="string_ci" indexed="true" stored="false" 
    multiValued="true" />

I tried using copyField like so:

<copyField source="destANYStr" dest="destANYStrCI" />

But apparently copyField is applied to source and dest before any analyzers are invoked, so even though I've specified through the analyzers that dest is case insensitive, the case of the values copied from the source field is preserved.

I'm hoping to avoid re-transmitting the field's value from the client at record creation time.
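To make the goal concrete, here is a minimal sketch of the kind of update message I would like to keep sending. Only destANYStr is transmitted, and destANYStrCI is meant to be filled in on the server by copyField. The "id" field and the sample values are assumptions for illustration and are not part of the schema excerpt above.

```xml
<add>
  <doc>
    <!-- "id" is a hypothetical uniqueKey field, assumed for this sketch -->
    <field name="id">doc-1</field>
    <!-- The client sends only the case-preserving source field -->
    <field name="destANYStr">New York</field>
    <field name="destANYStr">Los Angeles</field>
    <!-- destANYStrCI is deliberately not sent; copyField should populate it -->
  </doc>
</add>
```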

With no answers from SO, I followed up on the SOLR users list. I found that my string_ci field was not working as expected even before considering the effects of copyField. Ahmet Arslan explains why the "string_ci" field should be using solr.TextField and not solr.StrField:

From apache-solr-1.4.0\example\solr\conf\schema.xml:

"The StrField type is not analyzed, but indexed/stored verbatim."

"solr.TextField allows the specification of custom text analyzers specified as a tokenizer and a list of token filters."

With an example he provided and a slight tweak of my own, the following field type definition seems to do the trick, and copyField now works as expected as well.

    <fieldType name="string_ci" class="solr.TextField"
        sortMissingLast="true" omitNorms="true">
        <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>           
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType> 
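As a sketch of how the pieces fit together with this type, the field and copyField declarations from the question can stay unchanged. The comments describe what I understand to happen at index time; the sample value "New York" is only an illustration.

```xml
<!-- Verbatim field: "New York" is indexed as the single term "New York" -->
<field name="destANYStr" type="string" indexed="true" stored="true"
    multiValued="true" />

<!-- Case insensitive field: KeywordTokenizerFactory keeps the whole value as
     one token and LowerCaseFilterFactory turns it into "new york" -->
<field name="destANYStrCI" type="string_ci" indexed="true" stored="false"
    multiValued="true" />

<!-- copyField hands over the raw value; the analyzer of string_ci then runs
     on the destination field -->
<copyField source="destANYStr" dest="destANYStrCI" />
```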

The destANYStrCI field keeps the original, case-preserved value wherever it is stored, but its indexed terms are lowercased, so it provides a case insensitive field to search on. CAVEAT: case insensitive wildcard searching cannot be done, since wildcard phrases bypass the query analyzer and will not be lowercased before matching against the index. This means that the characters in wildcard phrases must be lowercase in order to match.
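The caveat above describes Solr 1.4. Later releases (3.6 onward, as far as I know) added an analyzer of type "multiterm" that is applied to wildcard, prefix and range terms. The following is a minimal sketch under that assumption and will not help on 1.4, where lowercasing the wildcard phrase on the client remains the workaround. As I understand it, those newer versions may already derive an equivalent analyzer automatically when the query analyzer uses LowerCaseFilterFactory, so the explicit form is shown only to make the behaviour visible.

```xml
<fieldType name="string_ci" class="solr.TextField"
    sortMissingLast="true" omitNorms="true">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
    <!-- Assumed to require Solr 3.6+: lowercases wildcard/prefix terms too -->
    <analyzer type="multiterm">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>
```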

Yes, that is true. LowerCaseFilterFactory does not apply to the String data type; it can only be applied to Text fields.

If you try it this way:

<!-- Assigning customised data type -->
<field name="language" type="text_lower" indexed="true" stored="true"  multiValued="false" default="en"/>  

<!-- Defining customised data type for lower casing. -->
<fieldType name="text_lower" class="solr.String" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

It will not work; we have to use TextField.

Try it this way instead, and it should work. Just change the fieldType class from String to TextField:

<!-- Assigning customised data type -->
<field name="language" type="text_lower" indexed="true" stored="true"  multiValued="false" default="en"/>

<!-- Defining customised data type for lower casing. -->
<fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
