
ElasticSearch/Tire: How to properly set up partial word searches

Even though I've seen many accounts describing this as relatively straightforward, I haven't managed to get it working properly. Let's say I have this:

class Car < ActiveRecord::Base
  settings analysis: {
    filter: {
      ngram_filter: { type: "nGram", min_gram: 3, max_gram: 12 }
    },
    analyzer: {
      partial_analyzer: {
        type: "snowball",
        tokenizer: "standard",
        filter: ["standard", "lowercase", "ngram_filter"]
      }
    }
  } do
    mapping do
      indexes :name, index_analyzer: "partial_analyzer"
    end
  end
end

And let's say I have a car named "Ford" and I update my index. Now, if I search for "Ford":

Car.tire.search { query { string "Ford" } }

My car is in my results. Now, if I search for "For":

Car.tire.search { query { string "For" } }

My car isn't found anymore. I thought the nGram filter would take care of this automatically, but apparently it doesn't. As a temporary workaround I'm using the wildcard (*) for such searches, but this is definitely not the best approach, since the min_gram and max_gram definitions are supposed to be key elements of my search. Can anyone tell me how they solved this?

I'm using Rails 3.2.12 with Ruby 1.9.3. The ElasticSearch version is 0.20.5.

You want to use the custom analyzer instead of the snowball one: Elasticsearch custom analyzer

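The other analyzer types, snowball included, come with a predefined tokenizer and set of filters. With type "snowball", the tokenizer and filter entries in your partial_analyzer are ignored, so ngram_filter is never applied.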
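As a minimal sketch of that change, here are the analysis settings from the question with only the analyzer type switched to "custom". The constant name PARTIAL_ANALYSIS is hypothetical and only there to keep the snippet self-contained; the filter and analyzer names come from the question.

```ruby
# Hypothetical constant; in the model this hash would be passed as
#   settings analysis: PARTIAL_ANALYSIS do ... end
PARTIAL_ANALYSIS = {
  filter: {
    ngram_filter: { type: "nGram", min_gram: 3, max_gram: 12 }
  },
  analyzer: {
    partial_analyzer: {
      type: "custom",       # only "custom" honors the tokenizer/filter keys below
      tokenizer: "standard",
      filter: ["standard", "lowercase", "ngram_filter"]
    }
  }
}
```

Analysis settings are applied when the index is created, so the index most likely needs to be deleted, recreated, and re-imported before the new analyzer has any effect.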

You probably also want to use the Edge-Ngram filter: Edge-Ngram filter

The difference between Edge-NGram and NGram is that Edge-NGram only sticks to the "edges" of a term: it starts at the front or at the back. For 3-character grams, Ford -> [For] instead of -> [For, ord].
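To make the difference concrete, here is a rough plain-Ruby imitation of the two filters. It is only an illustration of which substrings each one emits, not Elasticsearch's implementation; the method names ngrams and edge_ngrams are made up for this sketch, and the 3 and 12 are the min_gram and max_gram from the question.

```ruby
# All substrings of `term` whose length is between min and max (nGram-style).
def ngrams(term, min, max)
  upper = [max, term.length].min
  (min..upper).flat_map do |size|
    term.chars.each_cons(size).map(&:join)
  end
end

# Only the substrings anchored at the start of `term`
# (edgeNGram-style with side "front").
def edge_ngrams(term, min, max)
  upper = [max, term.length].min
  (min..upper).map { |size| term[0, size] }
end

p ngrams("ford", 3, 12)      # => ["for", "ord", "ford"]
p edge_ngrams("ford", 3, 12) # => ["for", "ford"]
```

For autocompletion-style searches the edge variant is usually enough, and it keeps the index smaller because far fewer tokens are stored per term.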

Some more advanced links on the topic of autocompletion:

Autocompletion with fuzziness (pure elasticsearch, no tire, but very good read)

Another useful question with links provided

Edit

I have a setup very similar to yours, but with a separate analyzer for the title and a multi-field that covers both. Because of multi-language support, there is an array of names here instead of just a single name.

I also specify the search_analyzer, and I use string keys instead of symbols. This is what I actually have:

settings "analysis" => {
    "filter" => {
        "name_ngrams"  => {
            "side"     => "front",
            "max_gram" => 20,
            "min_gram" => 2,
            "type"     => "edgeNGram"
        }
    },
    "analyzer" => {
        "full_name"     => {
            "filter"    => %w(standard lowercase asciifolding),
            "type"      => "custom",
            "tokenizer" => "letter"
        },
        "partial_name"        => {
            "filter"    => %w(standard lowercase asciifolding name_ngrams),
            "type"      => "custom",
            "tokenizer" => "standard"
        }
    }
} do
  mapping do
    indexes :names do
      mapping do
        indexes :name, :type => 'multi_field',
                :fields => {
                    "partial"           => {
                        "search_analyzer" => "full_name",
                        "index_analyzer"  => "partial_name",
                        "type"            => "string"
                    },
                    "title"      => {
                        "type"     => "string",
                        "analyzer" => "full_name"
                    }
                }
      end
    end
  end
end
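Why use a different analyzer at search time? The following plain-Ruby sketch imitates the two analyzers above very roughly (ASCII letters only, no asciifolding, and hypothetical method names) and assumes a query matches when any of its tokens equals an indexed token. It is meant only to show the effect, not how Elasticsearch works internally.

```ruby
# Rough stand-in for "full_name": split on non-letters and lowercase.
def full_name_tokens(text)
  text.downcase.scan(/[a-z]+/)
end

# Rough stand-in for "partial_name": the same tokens, each expanded into
# front edge n-grams (min_gram 2, max_gram 20 as in the settings above).
def partial_name_tokens(text, min = 2, max = 20)
  full_name_tokens(text).flat_map do |token|
    (min..[max, token.length].min).map { |size| token[0, size] }
  end
end

# Assumption: a document matches if any query token was indexed for it.
def match?(query_tokens, indexed_tokens)
  query_tokens.any? { |token| indexed_tokens.include?(token) }
end

indexed = partial_name_tokens("Ford")        # what index_analyzer stores
p indexed                                    # => ["fo", "for", "ford"]

p match?(full_name_tokens("For"), indexed)    # => true,  partial word is found
p match?(full_name_tokens("Fox"), indexed)    # => false, unrelated word is not
p match?(partial_name_tokens("Fox"), indexed) # => true,  "fo" matches by accident
```

The last line is the reason for the separate search_analyzer: if the query were also broken into n-grams, "Fox" would match "Ford" through the shared gram "fo".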
