
Elasticsearch reindexing: How do you direct updates to the new index while it is being built?

I understand reindexing using an alias to avoid downtime, as described here: Is there a smarter way to reindex elasticsearch?

But one problem remains: Say the reindexing takes an hour, while the original DB keeps changing. I would need any updates to go to both indexes.

Is there any way to do that?

If not, I would prefer if updates went to the new index, while queries were still served from the old index. But at least in Tire, I haven't seen a way to use different indices for reading and writing. Can that be done?

Elasticsearch can't update two indices in a single index request. You can handle that on your side by sending two index requests to Elasticsearch, one per index.

That said, you can probably use an alias here. I'm pretty sure you can search on more than one index using Tire (but I don't know Tire).

You have an old index1.

Push all your content to index2.

Add an alias index on top of index1 and index2.

When indexing is finished, remove index1.
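The alias steps above can be sketched as request bodies for Elasticsearch's `POST /_aliases` endpoint. This is only a minimal sketch: the helper name `alias_actions` is hypothetical, the alias is called `index` as in the steps above, and the code only builds the JSON bodies without sending them.

```ruby
require 'json'

# Hypothetical helper: builds a body for Elasticsearch's POST /_aliases
# endpoint. Each action adds or removes one index/alias pair.
def alias_actions(alias_name, add: [], remove: [])
  actions = remove.map { |name| { 'remove' => { 'index' => name, 'alias' => alias_name } } }
  actions += add.map { |name| { 'add' => { 'index' => name, 'alias' => alias_name } } }
  { 'actions' => actions }
end

# While index2 is being filled, the alias covers both indices for searching.
during_build = alias_actions('index', add: %w[index1 index2])

# Once index2 is complete, index1 is dropped from the alias.
after_build = alias_actions('index', remove: %w[index1])

puts JSON.pretty_generate(during_build)
puts JSON.pretty_generate(after_build)
```

Note that removing index1 from the alias does not delete index1 itself; deleting the old index would be a separate step.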

To allow for zero-downtime index changes, even while the search system is being updated with new user-generated content, you can use the following strategy:

Define aliases for both read and write actions that will point to an ES index. When a Model is updated, look up the model_write alias and use it to write to all tracked indices, which will include both the currently active ones and any that are being built in the background.

class User < ActiveRecord::Base
  def self.index_for_search(user_id)
    Timeout::timeout(5) do
      user = User.find_by_id(user_id)
      write_alias = Tire::Alias.find("users_write")
      if write_alias
        # Write to every index behind the alias, including any being rebuilt.
        write_alias.indices.each do |index_name|
          index = Tire::Index.new(index_name)
          if user
            index.store user
          else
            # The user no longer exists in the DB, so remove it from the index.
            index.remove 'user', user_id
          end
        end
      else
        raise "Cannot index without existence of 'users_write' alias."
      end
    end
  end
end

Now, when you want to do a full index rebuild (or initial index creation), add a new index, add it to the alias, and start building it knowing that any active users will be adding their data to both indices simultaneously. Continue to read from the old index until the new one is built, then switch the read alias.

class SearchHelper
  def self.set_alias_to_index(alias_name, index_name, clear_aliases = true)
    tire_alias = Tire::Alias.find(alias_name)
    if tire_alias
      tire_alias.indices.clear if clear_aliases
      tire_alias.indices.add index_name
    else
      tire_alias = Tire::Alias.new(:name => alias_name)
      tire_alias.index index_name
    end

    # Persist the alias change to Elasticsearch.
    tire_alias.save
  end
end


def self.reindex_users_index(options = {})
  finished = false
  read_alias_name = "users"
  write_alias_name = "users_write"
  new_index_name = "#{read_alias_name}_#{Time.now.to_i}"

  # Make new index for re-indexing.
  # (analyzer_configuration and user_mapping are assumed to be defined elsewhere.)
  index = Tire::Index.new(new_index_name)
  index.create :settings => analyzer_configuration,
               :mappings => { :user => user_mapping }

  # Add the new index to the write alias so that any system changes while we're re-indexing will be reflected.
  SearchHelper.set_alias_to_index(write_alias_name, new_index_name, false)

  # Reindex all users.
  User.find_in_batches do |batch|
    index.import batch.map { |m| m.to_elasticsearch_json }
  end
  finished = true

  # Update the read and write aliases to only point at the newly re-indexed data.
  SearchHelper.set_alias_to_index read_alias_name, new_index_name
  SearchHelper.set_alias_to_index write_alias_name, new_index_name
ensure
  # Clean up the half-built index if anything above failed.
  index.delete if index && !finished
end
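
To make the order of alias changes in the rebuild above easier to follow, here is a toy in-memory model of the read and write aliases. It is only an illustration: `AliasModel` and the index names `users_old` and `users_new` are made up for this sketch, and nothing here talks to Tire or Elasticsearch.

```ruby
require 'set'

# Toy stand-in for a cluster: alias name => set of index names,
# index name => { doc id => document }. Not Tire, not Elasticsearch.
class AliasModel
  attr_reader :aliases, :indices

  def initialize
    @aliases = Hash.new { |hash, name| hash[name] = Set.new }
    @indices = Hash.new { |hash, name| hash[name] = {} }
  end

  # Point an alias at an index, optionally dropping its previous targets.
  def point(alias_name, index_name, clear: true)
    @aliases[alias_name].clear if clear
    @aliases[alias_name] << index_name
  end

  # A write fans out to every index behind the write alias.
  def write(alias_name, id, doc)
    @aliases[alias_name].each { |index_name| @indices[index_name][id] = doc }
  end

  # A read is served from the indices behind the read alias.
  def read(alias_name, id)
    @aliases[alias_name].map { |index_name| @indices[index_name][id] }.compact.first
  end
end

model = AliasModel.new
model.point('users', 'users_old')
model.point('users_write', 'users_old')
model.write('users_write', 1, 'alice v1')

# Rebuild starts: the new index joins the write alias only.
model.point('users_write', 'users_new', clear: false)
model.write('users_write', 1, 'alice v2') # lands in both indices
during = model.read('users', 1)           # still served by users_old

# Rebuild finished: both aliases now point only at the new index.
model.point('users', 'users_new')
model.point('users_write', 'users_new')
after = model.read('users', 1)

puts during, after
```

The update made during the rebuild is visible both before and after the switch, because it was written to the old and the new index at the same time.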

A post describing this strategy can be found here: http://www.mavengineering.com/blog/2014/02/12/seamless-elasticsearch-reindexing/

