
Why does this elasticsearch/tire code not match partial words?

I'm trying to use Elasticsearch and Tire to index some data. I want to be able to search it on partial matches, not just full words. When running a query on the example model below, it will only match words in the "notes" field that are full word matches. I can't figure out why.

class Thingy
  include Tire::Model::Search
  include Tire::Model::Callbacks

  # has some attributes

  tire do
    settings analysis: {
      filter: {
        ngram_filter: {
          type: 'nGram',
          min_gram: 2,
          max_gram: 12
        }
      },
      analyzer: {
        index_ngram_analyzer: {
          type: 'custom',
          tokenizer: 'standard',
          filter: ['lowercase']
        },
        search_ngram_analyzer: {
          type: 'custom',
          tokenizer: 'standard',
          filter: ['lowercase', 'ngram_filter']
        }
      }
    } do
      mapping do
        indexes :notes, :type => "string", boost: 10, index_analyzer: "index_ngram_analyzer", search_analyzer: "search_ngram_analyzer"
      end
    end
  end

  def to_indexed_json
    {
      id:          self.id,
      account_id:  self.account_id,
      created_at:  self.created_at,
      test:        self.test,
      notes:       some_method_that_returns_string
    }.to_json
  end
end

The query looks like this:

@things = Thingy.search page: params[:page], per_page: 50 do
  query {
    boolean {
      must     { string "account_id:#{account_id}" }
      must_not { string "test:true"                }
      must     { string "#{query}"                 }
    }
  }
  sort {
    by :id, 'desc'
  }
  size 50
  highlight notes: {number_of_fragments: 0}, options: {tag: '<span class="match">'}
end

I've also tried this but it never returns results (and ideally I'd like the search to apply to all fields, not just notes):

must { match :notes, "#{query}" } # tried with `type: :phrase` as well

What am I doing wrong?

You almost got there! :) The problem is that you've swapped the roles of index_analyzer and search_analyzer.

Let me explain briefly how it works:

  1. You want to break document words into these ngram "chunks" during indexing, so when you are indexing a word like Martian, it gets broken into: ['ma', 'mar', 'mart', ..., 'ar', 'art', 'arti', ...]. You can try it with the Analyze API: http://localhost:9200/thingies/_analyze?text=Martian&analyzer=index_ngram_analyzer .

  2. When people are searching, they are already using these partial ngrams, so to speak, since they search for "mar" or "mart" etc. So you don't break their phrases further with the ngram filter.

  3. That's why you (correctly) separate index_analyzer and search_analyzer in your mapping, so Elasticsearch knows how to analyze the notes attribute during indexing, and how to analyze any search phrase against this attribute.
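
To make step 1 concrete, here is a minimal plain-Ruby sketch of what the ngram_filter settings (min_gram: 2, max_gram: 12) do to a single lowercased token. The ngrams helper is hypothetical and only approximates Elasticsearch's nGram filter; the real filter may emit the grams in a different order.

```ruby
# Hypothetical helper (not part of Tire or Elasticsearch): returns every
# substring of the lowercased word whose length is between min and max.
def ngrams(word, min: 2, max: 12)
  token = word.downcase
  grams = []
  (0..token.length - min).each do |start|
    (min..max).each do |len|
      # Stop once the gram would run past the end of the token.
      break if start + len > token.length
      grams << token[start, len]
    end
  end
  grams
end

grams = ngrams('Martian')
p grams.first(6)         # => ["ma", "mar", "mart", "marti", "martia", "martian"]
p grams.include?('art')  # => true
```

Because 'art' is one of the indexed chunks, a search for the plain term "art" can find the document without any ngram processing on the query side.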

In other words, do this:

analyzer: {
  index_ngram_analyzer: {
    type: 'custom',
    tokenizer: 'standard',
    filter: ['lowercase', 'ngram_filter']
  },
  search_ngram_analyzer: {
    type: 'custom',
    tokenizer: 'standard',
    filter: ['lowercase']
  }
}
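
To see why the swapped configuration matches only full words, here is a rough plain-Ruby simulation of both setups. It is a sketch under simplifying assumptions: plain_tokens stands in for the standard tokenizer plus the lowercase filter, ngram_tokens adds the ngram step, and match? treats a query as a hit when any search token equals any indexed token. All three helpers are hypothetical and are not Tire or Elasticsearch APIs.

```ruby
# Hypothetical helper: substrings of a lowercased token, length min..max.
def ngrams(token, min: 2, max: 12)
  grams = []
  (0..token.length - min).each do |start|
    (min..max).each do |len|
      break if start + len > token.length
      grams << token[start, len]
    end
  end
  grams
end

# Rough stand-in for tokenizer: 'standard' + filter: ['lowercase'].
def plain_tokens(text)
  text.downcase.scan(/\w+/)
end

# Rough stand-in for filter: ['lowercase', 'ngram_filter'].
def ngram_tokens(text)
  plain_tokens(text).flat_map { |token| ngrams(token) }
end

# Simplified matching: a hit when the two token lists share any token.
def match?(indexed_tokens, search_tokens)
  (indexed_tokens & search_tokens).any?
end

doc = 'Martial Partial Martian'

# Swapped setup from the question: whole words indexed, ngrams at search time.
p match?(plain_tokens(doc), ngram_tokens('art'))      # => false
p match?(plain_tokens(doc), ngram_tokens('martian'))  # => true

# Corrected setup: ngrams indexed, whole words at search time.
p match?(ngram_tokens(doc), plain_tokens('art'))      # => true
```

In the swapped setup the index holds only whole words, so a partial term such as "art" has nothing to match, while a full word still matches because it is one of its own ngrams.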

Full, working Ruby code is below. Also, I highly recommend migrating to the new elasticsearch-model Rubygem, which contains all the important features of Tire and is actively developed.


require 'tire'

Tire.index('thingies').delete

class Thingy
  include Tire::Model::Persistence

  tire do
    settings analysis: {
      filter: {
        ngram_filter: {
          type: 'nGram',
          min_gram: 2,
          max_gram: 12
        }
      },
      analyzer: {
        index_ngram_analyzer: {
          type: 'custom',
          tokenizer: 'standard',
          filter: ['lowercase', 'ngram_filter']
        },
        search_ngram_analyzer: {
          type: 'custom',
          tokenizer: 'standard',
          filter: ['lowercase']
        }
      }
    } do
      mapping do
        indexes :notes, type: "string", index_analyzer: "index_ngram_analyzer", search_analyzer: "search_ngram_analyzer"
      end
    end
  end

  property :notes
end

Thingy.create id: 1, notes: 'Martial Partial Martian'
Thingy.create id: 2, notes: 'Venetian Completion Heresion'
Thingy.index.refresh

# Find 'art' in 'martial'
#
# Equivalent to: http://localhost:9200/thingies/_search?q=notes:art
#
results = Thingy.search do
  query do
    match :notes, 'art'
  end
end

p results.map(&:notes)

# Find 'net' in 'venetian'
#
# Equivalent to: http://localhost:9200/thingies/_search?q=notes:net
#
results = Thingy.search do
  query do
    match :notes, 'net'
  end
end

p results.map(&:notes)

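The problem for me was that I was using the string query instead of the match query. As far as I can tell, a string query with no field prefix searches the default field (_all unless configured otherwise), which is not analyzed with the ngram analyzers defined on notes. The search should have been written like this: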

@things = Thingy.search page: params[:page], per_page: 50 do
  query {
    match [:prop_1, :prop_2, :notes], query
  }
  sort {
    by :id, 'desc'
  }
  filter :term, account_id: account_id
  filter :term, test: false
  size 50
  highlight notes: {number_of_fragments: 0}, options: {tag: '<span class="match">'}
end
