
Apache Solr search against different combinations of text entered

I am new to Apache Solr. I want to search against different combinations of the text entered. For example, if the text is 'hello', the search should return records containing hello, llo, hel, ollhe, he, and so on. Is this possible with Solr? If so, how can we do this? Please help me.

This is possible in Solr. You can use the EdgeNGramFilterFactory in your fieldType. Here is an example.

Here the word hello is indexed as the tokens he, hel, hell and hello.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
</fieldType>
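To see which queries this field type would match, here is a rough Python sketch of the analysis chain above. It is not Solr code: the function names `edge_ngrams` and `matches` are made up for illustration, and it only imitates a KeywordTokenizer followed by front-anchored edge n-grams with minGramSize=2 and maxGramSize=15.

```python
def edge_ngrams(value, min_gram=2, max_gram=15):
    """Emulate the index analyzer: the whole value is one token
    (KeywordTokenizer), then every prefix of length min_gram..max_gram
    is emitted (edge n-grams anchored at the front)."""
    longest = min(len(value), max_gram)
    return [value[:size] for size in range(min_gram, longest + 1)]


def matches(query, indexed_value):
    """Emulate the query analyzer: the query stays a single token and
    must equal one of the indexed grams exactly."""
    return query in edge_ngrams(indexed_value)


print(edge_ngrams("hello"))      # ['he', 'hel', 'hell', 'hello']
print(matches("hel", "hello"))   # True
print(matches("llo", "hello"))   # False, "llo" is not a prefix
```

So with edge n-grams only prefixes such as he or hel match. A substring from the middle or end of the word, such as llo, does not.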

Or you can try NGramTokenizerFactory instead of EdgeNGramFilterFactory.

<tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="10"/>

This produces output like the following. For hello, it generates tokens such as:

he, hel, hell, hello, el, ell, and so on.
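For the full list of grams, here is a small Python sketch that imitates an n-gram tokenizer with minGramSize=2 and maxGramSize=10. The function name `ngrams` is hypothetical and the ordering is only illustrative. The real tokenizer also tracks positions and offsets, which this ignores.

```python
def ngrams(value, min_gram=2, max_gram=10):
    """Emit every substring of length min_gram..max_gram,
    grouped by start position."""
    grams = []
    for start in range(len(value)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(value):
                break  # gram would run past the end of the value
            grams.append(value[start:start + size])
    return grams


tokens = ngrams("hello")
print(tokens)
# ['he', 'hel', 'hell', 'hello', 'el', 'ell', 'ello', 'll', 'llo', 'lo']
print("llo" in tokens)    # True
print("ollhe" in tokens)  # False
```

This covers hel, llo and he from the question. A reordered string such as ollhe is not a substring of hello, so n-grams alone would not match it.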

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
    <types>
        <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
        <fieldType name="StrTokenizer" class="solr.TextField">
            <analyzer>
                <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="5"/>
            </analyzer>
        </fieldType>
        <fieldType class="org.apache.solr.schema.TrieFloatField" name="TrieFloatField"/>
        <fieldType class="org.apache.solr.schema.TextField" name="TextField">
            <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
        </fieldType>
        <fieldType name="user_id" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
            </analyzer>
        </fieldType>
    </types>
    <fields>
        <field indexed="true" multiValued="false" name="user_id" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="company" stored="true" type="StrField"/>
        <field indexed="true" multiValued="false" name="tins" stored="true" type="TrieFloatField"/>
        <field indexed="true" multiValued="false" name="user_standard" stored="true" type="StrTokenizer"/>
        <field indexed="true" multiValued="false" name="requests" stored="true" type="TrieFloatField"/>
        <field indexed="true" multiValued="false" name="include" stored="true" type="TextField"/>
    </fields>
    <uniqueKey>(user_id,company,user_standard)</uniqueKey>
</schema>
