
Searching in Solr

I am building an e-commerce project that uses the Solr search engine, and I want to search by a specific keyword. If I enter "c1234", it should display all documents containing the keyword "c1234". This works fine. But if I enter "c12#34", Solr should also treat it as "c1234". In other words, I want Solr to ignore the hash character (#) and return the same results in both cases.

The other problem is whitespace. If I search for "HP 940", Solr should strip the whitespace and return the same results as for "HP940". For example, if I enter "Hp 940", Solr should treat it as "HP940". So I want the same results to be displayed with or without the whitespace.

Thanks in advance.

Try using solr.WordDelimiterFilterFactory.

Test case: [screenshot not preserved]

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <!-- Index time: split on whitespace, then split tokens on delimiters, case changes and letter/digit boundaries -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1"
            generateNumberParts="1" catenateNumbers="0" splitOnNumerics="1"
            catenateAll="0" splitOnCaseChange="1"
            stemEnglishPossessive="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query time: standard tokenization, stop-word removal and lowercasing -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

To replace # you should use a char filter; see https://cwiki.apache.org/confluence/display/solr/CharFilterFactories
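
As a rough sketch of the char filter approach, assuming a hypothetical field type named text_code (not from the original question): a solr.PatternReplaceCharFilterFactory can strip # before tokenization. Because the single analyzer applies at both index and query time, "c12#34" and "c1234" should be reduced to the same token.

```xml
<!-- Hypothetical field type; the name "text_code" is an assumption for illustration -->
<fieldType name="text_code" class="solr.TextField" positionIncrementGap="100">
  <!-- One analyzer without a "type" attribute is used for both indexing and querying -->
  <analyzer>
    <!-- Remove every '#' from the raw input before it reaches the tokenizer -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="#" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The pattern is a regular expression, so it can be widened to a character class if other symbols should be ignored too. Existing documents need to be reindexed after the schema change.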

For the hash character and other special characters, take a look at solr.WordDelimiterFilterFactory with its catenateWords parameter, or alternatively at solr.PatternReplaceCharFilterFactory.
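
A minimal sketch of the solr.WordDelimiterFilterFactory route, assuming a hypothetical field type named text_catenate. It sets catenateAll rather than catenateWords, on the assumption that the product codes mix letters and digits (catenateWords only joins runs of alphabetic parts). Newer Solr releases deprecate this filter in favour of solr.WordDelimiterGraphFilterFactory, so adjust the class name to your version.

```xml
<!-- Hypothetical field type; the name "text_catenate" is an assumption for illustration -->
<fieldType name="text_catenate" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Split each token on delimiters such as '#', then emit only the
         concatenation of all its parts, so "c12#34" should become "c1234" -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0"
            catenateWords="0" catenateNumbers="0"
            catenateAll="1" preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

This does not join "HP 940" into "HP940", because the whitespace tokenizer has already split the input into two tokens before the filter runs.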

For queries like "HP 940", also consider using phrase fields (pf) on the dismax handler with no slop.
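
The phrase-field idea could look roughly like this in solrconfig.xml, assuming a hypothetical handler path /products and a hypothetical field called name. Here pf lists the fields in which the whole query is also tried as a phrase, and ps is the slop allowed for that phrase.

```xml
<!-- Hypothetical handler; the path "/products" and the field "name" are assumptions -->
<requestHandler name="/products" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- Fields searched for the individual query terms -->
    <str name="qf">name</str>
    <!-- Fields in which the full query is additionally matched as a phrase -->
    <str name="pf">name</str>
    <!-- Phrase slop of 0: the terms must be adjacent and in order -->
    <str name="ps">0</str>
  </lst>
</requestHandler>
```

Note that pf boosts documents where the phrase matches rather than filtering out the others. It helps most when the indexed text is split into separate tokens such as "hp" and "940", so that "HP 940" lines up with them as a phrase.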
