Apache Solr search issue

I've got a search issue with Apache Solr.

For example, the content I've indexed is:

  • Tiramisu d'hiver
  • Velouté d'hiver
  • Minestrone d'hiver crémeux
  • Smoothie version hiver

When I search "hiver", I only get "Smoothie version hiver" as a result.

When I search "dhiver", I get these results:

  • Tiramisu d'hiver
  • Velouté d'hiver
  • Minestrone d'hiver crémeux

I need to get all of these results whether I search "hiver", "dhiver" or "d'hiver".

Does anyone have an idea what the problem is? Do I have to change something in my schema.xml?

My schema for the text field type is:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            />
    <filter class="solr.WordDelimiterFilterFactory" 
          generateWordParts="1" 
          generateNumberParts="1"
          catenateWords="1"
          catenateNumbers="1"
          catenateAll="0"
          splitOnCaseChange="1"
          splitOnNumerics="1"
          preserveOriginal="1"
    />
    <filter class="solr.LengthFilterFactory" min="3" max="100" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="5"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            />
    <filter class="solr.WordDelimiterFilterFactory" 
          generateWordParts="1" 
          generateNumberParts="1"
          catenateWords="1"
          catenateNumbers="0"
          catenateAll="0"
          splitOnCaseChange="1"
          splitOnNumerics="1"
    />
    <filter class="solr.LengthFilterFactory" min="3" max="100" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

  </analyzer>

  <analyzer type="multiterm">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Hmm, tasty.

First, for all problems of this kind, the Solr Analysis tool is your friend. Second, remember that Solr only matches when the analyzed query term and the indexed term are identical, character for character.
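Because a match needs identical terms on both sides, one way to keep the index side and the query side in step is to declare a single analyzer element with no type attribute; Solr then uses that one chain both when indexing and when querying. The following is a minimal sketch, not a drop-in replacement for the schema above: the field type name text_fr_symmetric is hypothetical, and the filters are ones the question's schema already uses.

```xml
<!-- Hypothetical field type name. One analyzer with no type attribute,
     so the same chain runs at index time and at query time. -->
<fieldType name="text_fr_symmetric" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- d'hiver -> d'hiver (original), d, hiver (parts), dhiver (catenated) -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            catenateWords="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain, a query for hiver, dhiver or d'hiver should each produce at least one term that was also indexed for "Tiramisu d'hiver"; the Analysis tool is the place to confirm that on your Solr version and query parser.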

For the following filter:

<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        splitOnCaseChange="1"
        preserveOriginal="1" />

Velouté d'hiver will be analyzed as:

veloute | d'hiver | d'hiver | d | d | dhiver | hiver

So it will match your query for "hiver". You may want to remove the "d" token that my filter generated.
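The "d" fragment only exists once WordDelimiterFilterFactory has split d'hiver, so whatever removes it has to run after that filter. A minimal sketch of one way to do it uses LengthFilterFactory with min="2", which drops one-character tokens. This assumes that losing every other one-character token is acceptable for your data; the fragment is meant to sit inside a fieldType element.

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- d'hiver -> d'hiver, d, hiver, dhiver -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1"
          catenateWords="1"
          preserveOriginal="1"/>
  <!-- Runs after the split: the one-character part "d" is dropped,
       while d'hiver, hiver and dhiver all survive -->
  <filter class="solr.LengthFilterFactory" min="2" max="100"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

If you apply the same idea on the query side, keep the min value identical in both analyzers so the two sides cannot drift apart.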

Remember to fold accented characters somewhere, too.
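Folding only helps if it happens on both sides. In the question's schema the MappingCharFilterFactory appears in the index analyzer but not in the query analyzer, which is the kind of asymmetry that can stop velouté and veloute from meeting. A minimal sketch that puts the same ASCIIFoldingFilterFactory step in both analyzers, assuming a hypothetical field type name text_folded:

```xml
<!-- Hypothetical field type name. The point is that the same folding
     step appears in BOTH analyzers. -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- velouté is indexed as veloute, and the original velouté is kept too -->
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- A query for velouté or for veloute both yield veloute -->
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
  </analyzer>
</fieldType>
```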
