Score by closest match in Elasticsearch

I have an Elasticsearch::Model mixed into an ActiveRecord::Base model that looks like this:

class ArtistGroup < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  FT_REDIS_KEY = "agft"
  has_many :artists

  settings index: { number_of_shards: 5 } do
    mappings dynamic: 'false' do
      indexes :normalized_name, analyzer: 'english'
      indexes :name, analyzer: 'english'
    end
  end

  def as_indexed_json(options={})
    as_json(only: ['normalized_name', 'id', 'name'])
  end
....

When I call .search('haim'), I want the document with name: "Haim" to be returned first, ahead of others such as "Danielle Haim of Haim". How can I control the Elasticsearch query so that it scores by closest match?

By default, Elasticsearch returns results sorted by relevance, i.e. by the score of each document.

This score is calculated from a set of basic rules combined with some query-specific rules.

The standard similarity algorithm used in Elasticsearch is known as term frequency/inverse document frequency, or TF/IDF (Elasticsearch 5.0 and later default to BM25, which builds on the same factors). It takes the following factors into account:

  • Term frequency: How often does the term appear in the field? The more often, the more relevant. A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
  • Inverse document frequency: How often does each term appear in the index? The more often, the less relevant. Terms that appear in many documents have a lower weight than more-uncommon terms.
  • Field-length norm: How long is the field? The longer it is, the less likely it is that words in the field will be relevant. A term appearing in a short title field carries more weight than the same term appearing in a long content field.
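To make the three factors above concrete, here is a minimal Ruby sketch of how they combine for a single-term query. It follows the classic Lucene formulas (tf = sqrt(frequency), idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1 / sqrt(numTerms)), but it is a simplification rather than what Elasticsearch actually executes: the token lists in DOCS are hypothetical, tokens are taken as already lowercased, the english analyzer's stemming and stop-word removal are ignored, and the idf factor is applied only once.

```ruby
# Simplified sketch of the classic TF/IDF factors; not the exact Lucene implementation.

# Term frequency: the square root of how often the term occurs in the field.
def term_frequency(term, tokens)
  Math.sqrt(tokens.count(term))
end

# Inverse document frequency: terms found in fewer documents weigh more.
def inverse_document_frequency(term, docs)
  doc_freq = docs.count { |tokens| tokens.include?(term) }
  1.0 + Math.log(docs.size.to_f / (doc_freq + 1))
end

# Field-length norm: shorter fields weigh more.
def field_length_norm(tokens)
  1.0 / Math.sqrt(tokens.size)
end

def score(term, tokens, docs)
  term_frequency(term, tokens) *
    inverse_document_frequency(term, docs) *
    field_length_norm(tokens)
end

# Hypothetical, already-tokenized name fields (assumed data, not from a real index).
DOCS = {
  "Haim"                  => %w[haim],
  "Danielle Haim of Haim" => %w[danielle haim of haim],
  "Arcade Fire"           => %w[arcade fire]
}

ranked = DOCS.map { |name, tokens| [name, score("haim", tokens, DOCS.values)] }
             .sort_by { |_, s| -s }

ranked.each { |name, s| puts format("%-25s %.3f", name, s) }
```

In this toy data set the one-word name "Haim" ranks above "Danielle Haim of Haim": the longer name mentions the term twice, but its field-length norm is lower by more than the extra term frequency adds.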

Individual queries may combine the TF/IDF score with other factors such as the term proximity in phrase queries, or term similarity in fuzzy queries.
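As an illustration of combining the base score with a query-specific factor, the sketch below builds a bool query whose must clause matches on name and whose should clause adds a boosted match_phrase on the same field. The helper name closest_match_query and the boost value 2.0 are assumptions made for this example, not something from the question, and the boost would need tuning against real data. Elasticsearch::Model's search method accepts a query definition as a Hash, so the result could be passed to ArtistGroup.search.

```ruby
# Hypothetical helper: builds a query definition Hash for the Elasticsearch query DSL.
def closest_match_query(text)
  {
    query: {
      bool: {
        # Every hit must match the analyzed name field.
        must: { match: { name: text } },
        # Hits that also contain the text as a phrase get an extra boost.
        # The value 2.0 is an arbitrary starting point, not a recommendation.
        should: [
          { match_phrase: { name: { query: text, boost: 2.0 } } }
        ]
      }
    }
  }
end

definition = closest_match_query("haim")

# With a running cluster and the model from the question, this could be used as:
#   ArtistGroup.search(definition).results.each { |r| puts r._score }
```

For a single-word search such as 'haim', the phrase clause matches the same documents as the match clause, so the ordering still depends mostly on the field-length norm. The phrase boost mainly helps multi-word searches, where names containing the words in the same order score higher.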

For a complete description of relevance, see: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/sorting.html


 