简体   繁体   中英

Tune solr phrase query search

We are trying to tune our phrase queries in DSE search. For example, if we have column name X with the value "DATASTAX" we are searching for exact match for X:"TAST"

Words are tokenized with with whitespacetokenizer.

We have couple hundred Million records in database and all the indexes are memory (We tested using pcstat). However still the queries are taking 5-15 sec. Why it is taking so time to pull the results if all the indexes are in memory? How can I tune this?

Any help is appreciated.

在此输入图像描述

Try this fieldType:

<fieldType name="custom_edge_ngram" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="([^A-Za-z0-9])" replacement="" replace="all"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="([^A-Za-z0-9])" replacement="" replace="all"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

Here the KeywordTokenizerFactory tokenizeer will pass the text stream exactly to the filters. The PatternReplaceFilterFactory will remove all except characters and numbers. You can config this however you want. Then we lowercase the stream and generate the NGram. This is for the index phase. For the query phase we don't do the NGram because we want to match the exact sub string.

We will be use the NGram instead of EdgeNGram, Because that will provides substring. The EdgeNGram always contain either from start or end. So EdgeNGram is not helpful in this case.

Hope this helps.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM