简体   繁体   中英

Solr query/field analyzer

I am total beginner with Solr and have a problem with unwanted characters getting into query results. For example when I search for "foo bar" I got content with "'foo' bar" etc. I just want to have exact matches. As far as I know this can be set up in schema.xml file. My content field type:

<fieldtype name="textNoStem" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <filter class="solr.LowerCaseFilterFactory" />
        <tokenizer class="solr.KeywordTokenizerFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldtype>

Please let me know if you know the solution. Kind Regards.

For both analyzers, the first line should be the tokenizer. The tokenizer is used to split the text into smaller units (words, most of the time). For your need, the WhitespaceTokenizerFactory is probably the right choice.

If you want absolute exact match, you do not need any filter after the tokenizer. But if you do no want searches to be case sensitive, you need to add a LowerCaseFilterFactory .

Notice that you have two analyzers: one of type 'index' and the other of type 'query'. As the names implied, the first one is used when indexing content while the other is used when you do queries. A rule that is almost always good is to have the same set of tokenizers/filters for both analyzers.

如果您只想精确匹配,请在查询时使用 KeywordTokenizerFactory 而不是 StandardTokenizerFactory。

I guess you dont get any results because the tokening is done differently on the data that is already indexed. As Pascal said, whitespaceTokenizer is the right choice in your case. Use it at both index and query time and check the results after indexing some data, not on the previously indexed data.

I suggest using analysis page to see the results with out actually indexing.Its quite useful.Make changes in schema, refresh the core, go to analysis page and look at verbose output to get the step by step analysis.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM