Searching in Solr

Question

I am building an e-commerce project that uses the Solr search engine, and I want to search by a specific keyword. If I enter "c1234", it should display all documents containing the keyword "c1234". That works fine. But if I enter "c12#34", it should also be treated as "c1234". So the problem is that I want to ignore the hash sign here: Solr should not take the # into account and should display the same results in both cases.

The other problem is that I want to ignore whitespace. If I search "HP 940", Solr should remove the whitespace and display the same results as for "HP940", so I want similar results with or without the whitespace. For example, if I enter "Hp 940", Solr should treat it as "HP940".

Thanks in advance.

Answer 1

Try using solr.WordDelimiterFilterFactory.

Test case:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1"
            generateNumberParts="1" catenateNumbers="0" splitOnNumerics="1"
            catenateAll="0" splitOnCaseChange="1"
            stemEnglishPossessive="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

To remove the #, use a char filter: https://cwiki.apache.org/confluence/display/solr/CharFilterFactories
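A minimal sketch of the char-filter approach, assuming a hypothetical field type named text_sku: solr.PatternReplaceCharFilterFactory strips every # from the raw text before the tokenizer runs, so "c12#34" is tokenized as "c1234". A single `<analyzer>` without a `type` attribute applies the same chain at index and query time, so documents and queries are normalized identically.

```xml
<!-- Hypothetical field type: removes '#' before tokenizing, at both index and query time -->
<fieldType name="text_sku" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Char filters see the raw text first: "c12#34" becomes "c1234" -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="#" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Case-insensitive matching, so "C12#34" also ends up as "c1234" -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The field holding the product code has to use this type, and existing documents must be reindexed after the analyzer changes. A character class such as pattern="[#/-]" would drop other separators the same way.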

Answer 2

For the hash sign and other characters, take a look at solr.WordDelimiterFilterFactory with the catenateWords parameter, or alternatively at solr.PatternReplaceCharFilterFactory.
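A sketch of the WordDelimiterFilterFactory route, with a hypothetical type name. catenateWords joins only runs of letter parts and catenateNumbers only runs of digit parts, so for a mixed code like c12#34 (parts c, 12, 34) it is catenateAll that emits c1234. Using one analyzer for both index and query time means the query c12#34 also yields the term c1234, which matches a document containing c1234.

```xml
<!-- Hypothetical field type: splits on delimiters and also emits the fully joined token -->
<fieldType name="text_sku_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- "c12#34" -> parts c, 12, 34 plus the catenated c1234 -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The part tokens (c, 12, 34) can also match unrelated documents under the default OR operator, so it is worth checking relevance on real data. On Solr 6.5 and later, solr.WordDelimiterFilterFactory is deprecated in favor of solr.WordDelimiterGraphFilterFactory, which at index time should be followed by solr.FlattenGraphFilterFactory.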

For terms like "HP 940", also consider phrase fields (pf) on the dismax handler with no phrase slop (ps=0).
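A sketch of the phrase-field idea as request-handler defaults in solrconfig.xml, assuming a hypothetical field called name. pf boosts documents in which the query terms occur together as a phrase in that field, and ps=0 requires them to be adjacent and in order, so for q=HP 940 a document containing "HP 940" ranks above one that merely contains both words somewhere. This only affects ranking: a document containing just HP940 still needs an index-time analyzer that splits it into hp and 940, as the WordDelimiterFilterFactory config in Answer 1 does with splitOnNumerics.

```xml
<!-- Hypothetical /select defaults; the field "name" is an assumption -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the DisMax query parser -->
    <str name="defType">dismax</str>
    <!-- Field(s) searched for the individual query terms -->
    <str name="qf">name</str>
    <!-- Boost documents where the whole query appears as a phrase in "name" -->
    <str name="pf">name^10</str>
    <!-- Phrase slop 0: terms must be adjacent and in order to get the pf boost -->
    <str name="ps">0</str>
  </lst>
</requestHandler>
```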

Answer 1: score 1, accepted, 2016-06-14 14:52:06

Answer 2: score 0, 2016-06-13 22:03:30