Substring search with ONGR Elastic Bundle for Symfony

I'm using the ONGR ElasticsearchBundle (https://github.com/ongr-io/ElasticsearchBundle) in my Symfony 3 project. I chose this bundle because my project uses Propel.

So far everything has worked well. Now I want to support searching for a substring of a word. For example, there are items named Test01, Test02, Test03, ... and when I search for Test I don't get any results. I only get results when I type the whole word, such as Test01.

I've read that wildcard searches are one option, but several answers said that using ngram or edge_ngram would be a better solution.

I tried to specify it in the configuration as follows:

ongr_elasticsearch:
    analysis:
      filter:
        incremental_filter:
          type: edge_ngram
          min_gram: 3
          max_gram: 10
      analyzer:
        incrementalAnalyzer:
          type: custom
          tokenizer: standard
          filter:
              - lowercase
              - incremental_filter
    managers:
      default:
          index:
            hosts:
                - "%elastic_host%:%elastic_port%"
            index_name: index
            analysis:
              analyzer:
                  - incrementalAnalyzer
              filter:
                  - incremental_filter
          mappings:
              - AppBundle

But I didn't get the results I wanted. Can anyone help me with that? What is the difference between filters and analyzers? I'm using a MultiMatchQuery because I want to search across several fields of different types:

// Each entry is a field name; the "^N" suffix boosts matches in that field.
$multiMatchQuery = new MultiMatchQuery(
    [
        'name^12',
        'product_name^8',
        'itemno^18',
        'number^7',
        'category^6',
        'company^4',
        'motor^3',
        'chassis^13',
        'engine^14',
        'description'
    ],
    $term
);
$search->addQuery($multiMatchQuery);

I also tried defining the fields as "not_analyzed".

Hope for your help!

Thanks.

Okay, I found the solution. Here is an article that describes the problem (specifically for the German language): https://www.elastic.co/guide/en/elasticsearch/guide/current/ngrams-compound-words.html

So the analyzer needs an ngram filter (it didn't work for me with an ngram tokenizer). I had also forgotten to assign the analyzer to the property. Now it works.

ongr_elasticsearch:
    analysis:
      analyzer:
        my_ngram_analyzer:
          type: custom
          tokenizer: standard
          filter:
            - lowercase
            - my_ngram_filter
      filter:
        my_ngram_filter:
          type: ngram
          min_gram: 2
          max_gram: 8
    managers:
      default:
          index:
            hosts:
                - "%elastic_host%:%elastic_port%"
            index_name: index
            analysis:
              analyzer:
                  - my_ngram_analyzer
              filter:
                  - my_ngram_filter
          mappings:
              - AppBundle

Each property in the Document that should support substring search also needs to reference the analyzer:

    /**
     * @var string
     *
     * @ES\Property(name="itemno", type="string", options={"analyzer":"my_ngram_analyzer"})
     */
    public $itemno;
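
To see why this configuration matches partial terms, here is a minimal PHP sketch of what an ngram token filter with min_gram 2 and max_gram 8 emits for an already lowercased token. The ngrams function is a hypothetical helper written for illustration only. It is not part of ONGR or Elasticsearch, it assumes ASCII input, and the order in which Elasticsearch emits grams may differ.

```php
<?php
// Hypothetical helper (not an ONGR or Elasticsearch API): returns every
// substring of $token whose length is between $min and $max.
// Assumes ASCII input; multibyte text would need the mb_* functions.
function ngrams(string $token, int $min, int $max): array
{
    $grams = [];
    $length = strlen($token);
    for ($start = 0; $start < $length; $start++) {
        for ($size = $min; $size <= $max && $start + $size <= $length; $size++) {
            $grams[] = substr($token, $start, $size);
        }
    }
    return $grams;
}

$indexed = ngrams('test01', 2, 8);
var_dump(in_array('test', $indexed, true)); // bool(true): prefix of the word
var_dump(in_array('st0', $indexed, true));  // bool(true): middle of the word
var_dump(in_array('t', $indexed, true));    // bool(false): shorter than min_gram
```

Because "test" is one of the grams stored for "test01", a search for Test can match the item. Single characters are not stored because min_gram is 2.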

The bundle configuration represents the Elasticsearch mapping. Analyzers defined in the analysis section can be reused across separate managers.

The difference between filters and analyzers is that a filter is one step inside an analyzer's chain. An analyzer is a chain of steps: character filters, a tokenizer, and token filters. Here is a very good article about analyzers: https://www.elastic.co/blog/found-text-analysis-part-1
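
The relationship can be sketched in plain PHP. The functions below are hypothetical stand-ins written for illustration, not ONGR or Elasticsearch APIs. tokenize only roughly approximates the standard tokenizer, and the gram sizes 3 and 10 are taken from the incremental_filter in the question.

```php
<?php
// Tokenizer stand-in: splits the text into word tokens on anything that is
// not a letter or digit (only a rough approximation of "standard").
function tokenize(string $text): array
{
    return preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Token filter stand-in: lowercases every token.
function lowercaseFilter(array $tokens): array
{
    return array_map('strtolower', $tokens);
}

// Token filter stand-in: replaces each token with its prefixes of
// length $min..$max (the idea behind edge_ngram). Assumes ASCII tokens.
function edgeNgramFilter(array $tokens, int $min, int $max): array
{
    $out = [];
    foreach ($tokens as $token) {
        $limit = min($max, strlen($token));
        for ($size = $min; $size <= $limit; $size++) {
            $out[] = substr($token, 0, $size);
        }
    }
    return $out;
}

// Analyzer stand-in: one tokenizer followed by an ordered chain of filters.
function analyze(string $text): array
{
    $tokens = tokenize($text);
    $tokens = lowercaseFilter($tokens);
    return edgeNgramFilter($tokens, 3, 10);
}

echo implode(', ', analyze('Test01 Motor')), PHP_EOL;
// tes, test, test0, test01, mot, moto, motor
```

A filter on its own does nothing until an analyzer includes it, and the analyzer in turn has no effect until a field's mapping refers to it.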

To make the search work the way you want, I think you should use the ngram tokenizer rather than the ngram filter: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
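
One practical difference is that the tokenizer cuts grams from the raw text rather than from words that were already split. With its default token_chars setting (keep all characters), grams can span word boundaries. The sketch below illustrates this with a hypothetical ngramTokenizer function. It is not the Elasticsearch implementation, it assumes ASCII input, and the gram size of 3 is chosen only to keep the output short.

```php
<?php
// Hypothetical helper (not an Elasticsearch API): slides over the raw text
// and emits every substring of length $min..$max. It does not split on
// whitespace first and does not lowercase anything.
function ngramTokenizer(string $text, int $min, int $max): array
{
    $grams = [];
    $length = strlen($text);
    for ($start = 0; $start < $length; $start++) {
        for ($size = $min; $size <= $max && $start + $size <= $length; $size++) {
            $grams[] = substr($text, $start, $size);
        }
    }
    return $grams;
}

$grams = ngramTokenizer('Test01 Motor', 3, 3);
var_dump(in_array('1 M', $grams, true)); // bool(true): this gram spans two words
var_dump(in_array('tes', $grams, true)); // bool(false): case is preserved ("Tes")
```

The reference page linked above describes the token_chars setting, which restricts grams to character classes such as letters and digits. A lowercase filter would still be needed after the tokenizer for case-insensitive matching.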
