Apache Solr Facet Search with Space

I am new to Solr facet search. I am searching some data with Apache Solr and use faceting on a few columns to get counts, but if a field value contains a space or a special character, each part is counted separately. I tried the solution from the linked question "Apache Solr facet search exclude space" to avoid splitting on spaces, but my problem persists.

After reading the above link, my altered Schema.XML file is:

<schema name="solr_quickstart" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_not_tokenized" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="int" class="solr.TrieIntField"/>
    <fieldType name="UUIDField" class="solr.UUIDField"/>
  </types>
  <fields>
    <field name="id" type="UUIDField" indexed="true" stored="true"/>
    <field name="caseid" type="int" indexed="true" stored="true"/>
    <field name="casenumber" type="text" indexed="true" stored="true"/>
    <field name="casestatus" type="text" indexed="true" stored="true"/>
    <field name="casetype" type="text" indexed="true" stored="true"/>
    <field name="closeddate" type="text" indexed="true" stored="true"/>
    <field name="courtname" type="text" indexed="true" stored="true"/>
    <field name="courtabbr" type="text" indexed="true" stored="true"/>
    <field name="fileddate" type="text" indexed="true" stored="true"/>
    <field name="judgename" type="text" indexed="true" stored="true"/>
    <field name="lastupdated" type="text" indexed="true" stored="true"/>
    <field name="maindefendant" type="text" indexed="true" stored="true"/>
    <field name="mainplaintiff" type="string" indexed="true" stored="true"/>
    <field name="all" type="string" docValues="true" indexed="true" stored="false" multiValued="true"/>
  </fields>

  <defaultSearchField>casenumber</defaultSearchField>
  <uniqueKey>id</uniqueKey>
  <copyField source="casenumber" dest="all"/>
  <copyField source="casestatus" dest="all"/>
  <copyField source="casetype" dest="all"/>
  <copyField source="courtname" dest="all"/>
  <copyField source="courtabbr" dest="all"/>
  <copyField source="judgename" dest="all"/>
  <copyField source="maindefendant" dest="all"/>
  <copyField source="mainplaintiff" dest="all"/>
</schema>

Could anyone kindly guide me toward the right way to configure my Schema.XML file?

Your problem is the tokenizer.

The tokenizer splits the field value into separate terms, and every term gets its own count in facet queries. To avoid this, you could remove the tokenizer (or use another one, such as the solr.KeywordTokenizerFactory already declared in your text_not_tokenized type). The result is that the whole field value becomes one term. This is a problem if you have more than one "subject" in your text field.
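
As a minimal sketch of that idea, assuming the facet is wanted on courtname: the posted schema declares text_not_tokenized, but every faceted field still uses the tokenized text type. One option is to leave courtname searchable as it is and copy it into an untokenized companion field used only for faceting. The field name courtname_facet and the sample value in the comment are hypothetical and not part of the original schema.

```xml
<!-- Hypothetical companion field: the name courtname_facet is an assumption.
     KeywordTokenizerFactory keeps the whole value as a single term, so an
     example value such as "Supreme Court" would get one facet count instead
     of separate counts for "supreme" and "court". -->
<field name="courtname_facet" type="text_not_tokenized" indexed="true" stored="false"/>

<!-- Fill the facet field from the existing tokenized search field. -->
<copyField source="courtname" dest="courtname_facet"/>
```

The facet request would then target the new field, for example with facet=true&facet.field=courtname_facet. Existing documents need to be reindexed before the counts change.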

I had a similar problem and tried using protected words, but they are not applied by the tokenizer. They are more (only?) for stemming; see "solr not tokenizing protected words".
