
Multiple model single index approach - elasticsearch via tire

In my multi-tenant app (account based, with a number of users per account), how would I update the index for a particular account when a user document is changed?

I am using Elasticsearch via the Tire gem.

This is a Rails 2.3 app; I applied the changes from loe/tire's commit to enable Rails 2.3 support.

Account Model:

  include Tire::Model::Search

  Tire.index('account_1') do
    create(
      :mappings => {
        :user => {
          :properties => {
            :name => { :type => :string, :boost => 10 },
            :company_name => { :type => :string, :boost => 5 }
          }
        },
        :comments => {
          :properties => {
            :description => { :type => :string, :boost => 5 }
          }
        }
      }
    )
  end

As you can see above, there are two models here: user and comments. Is this the correct way to handle a single index with multiple models?

If so, how do I update the index when only a user document or a comment document changes?

Usually, when you index a model, it is good to index its own attributes along with its associations. So in this case, if you want to index users and their comments, define the index on the user model and index the comments through the association. Tire's callbacks then apply to the user model and reindex the user object whenever one of its attributes changes. Note that the callbacks only fire for the model on which the index is defined.

If you also want changes in the associations to be picked up, you need hooks that reindex the account object after a user or comment is saved or destroyed. Alternatively, you could use the :touch => true option to touch the account model whenever a user or comment changes.

Example: if you want to index users and their comments,

  include Tire::Model::Search
  include Tire::Model::Callbacks

  mapping do
    # :index only takes values such as :not_analyzed; analyzers go in :analyzer
    indexes :id,       :type => 'integer', :index    => :not_analyzed
    indexes :about_me, :type => 'string',  :analyzer => 'snowball'
    indexes :name,     :type => 'string',  :analyzer => 'whitespace'

    # Comments are indexed as part of the user document
    indexes :comments do
      indexes :content, :type => 'string', :analyzer => 'snowball'
    end
  end

So here the index is defined on the user model, and user.comments is an association. I hope this example makes it clear.
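
For the nested comments mapping above to receive any data, the serialized user document has to contain the comments. Tire serializes a model through its to_indexed_json method, so one option is to override it. The following is only a sketch: User and Comment are plain Ruby stand-ins for the ActiveRecord models, and the attribute names are taken from the mapping above.

```ruby
require 'json'

# Plain Ruby stand-in for the Comment model (assumption: it has a `content` attribute).
Comment = Struct.new(:content)

# Plain Ruby stand-in for the User model.
class User
  attr_reader :id, :name, :about_me, :comments

  def initialize(id, name, about_me, comments = [])
    @id       = id
    @name     = name
    @about_me = about_me
    @comments = comments
  end

  # Build the document that would be sent to Elasticsearch,
  # embedding the associated comments under the `comments` key.
  def to_indexed_json
    {
      :id       => id,
      :name     => name,
      :about_me => about_me,
      :comments => comments.map { |comment| { :content => comment.content } }
    }.to_json
  end
end

user = User.new(1, 'Jane', 'Likes search engines', [Comment.new('Nice post')])
puts user.to_indexed_json
```

In a real Rails model the same idea is often written with to_json(:include => ...), but which form suits your app depends on the model.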

The answer to the question as posted by Tire owner Karmi is as follows:

Let's say we have an Account class and we deal with article entities.

In that case, our Account class would have the following:

class Account
  #...

  # Set index name based on account ID
  #
  def articles
    Article.index_name "articles-#{self.id}"
    Article
  end
end

So, whenever we need to access articles for a particular account, either for searching or for indexing, we can simply do:

@account = Account.find( remember_token_or_something_like_that )

# Instead of `Article.search(...)`:
@account.articles.search { query { string 'something interesting' } }

# Instead of `Article.create(...)`:
@account.articles.create id: 'abc123', title: 'Another interesting article!', ...
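
One thing to keep in mind with this pattern: the index name is set on the Article class itself, so it is shared state. The sketch below uses hypothetical stand-in classes (ArticleStandIn and AccountStandIn, not Tire's own code) that only mimic a class-level index_name setter, to show that the last caller wins.

```ruby
# Stand-in that mimics a class-level `index_name` getter/setter.
class ArticleStandIn
  class << self
    def index_name(name = nil)
      @index_name = name if name
      @index_name || 'articles'
    end
  end
end

# Stand-in for the Account class from the answer.
AccountStandIn = Struct.new(:id) do
  def articles
    ArticleStandIn.index_name "articles-#{id}"
    ArticleStandIn
  end
end

first_articles = AccountStandIn.new(1).articles
AccountStandIn.new(2).articles # a second account is accessed later

# Both references are the same class, so the name now belongs to account 2.
puts first_articles.index_name # => "articles-2"
```

So it is probably safest to call @account.articles right before each search or indexing call instead of holding on to the returned class, and to be careful in multi-threaded setups.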

Having a separate index per user/account works perfectly in certain cases -- but definitely not well in cases where you'd have tens or hundreds of thousands of indices (or more). Having index aliases, with properly set up filters and routing, would perform much better in this case. We would slice the data not based on the tenant identity, but based on time.
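
The answer does not show what an alias with a filter and routing looks like. As a rough sketch only, this builds a request body for the Elasticsearch _aliases endpoint, assuming every document carries a hypothetical account_id field and a hypothetical alias naming scheme articles_account_<id>:

```ruby
require 'json'

# Build the body for POST /_aliases that adds one filtered, routed alias.
# `account_id` (field name) and the alias name are assumptions for illustration.
def tenant_alias_body(index_name, account_id)
  {
    :actions => [
      { :add => {
          :index   => index_name,
          :alias   => "articles_account_#{account_id}",
          # Searches through the alias only see this account's documents.
          :filter  => { :term => { :account_id => account_id } },
          # Requests through the alias are routed by the account ID.
          :routing => account_id.to_s
        } }
    ]
  }
end

puts JSON.pretty_generate(tenant_alias_body('articles_2012-07-09', 42))
```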

Let's have a look at a second scenario, starting with a heavily simplified curl http://localhost:9200/_aliases?pretty output:

{
  "articles_2012-07-02" : {
    "aliases" : {
      "articles_plan_pro" : {
      }
    }
  },
  "articles_2012-07-09" : {
    "aliases" : {
      "articles_current" : {
      },
      "articles_shared" : {
      },
      "articles_plan_basic" : {
      },
      "articles_plan_pro" : {
      }
    }
  },
  "articles_2012-07-16" : {
    "aliases" : {
    }
  }
}

You can see that we have three indices, one per week. You can see there are two similar aliases: articles_plan_pro and articles_plan_basic -- obviously, accounts with the “pro” subscription can search two weeks back, but accounts with the “basic” subscription can search only this week.

Notice also that the articles_current alias points to, ehm, the current week (I'm writing this on Thu 2012-07-12). The index for next week is just there, lying and waiting -- when the time comes, a background job (cron, Resque worker, custom script, ...) will update the aliases. There's a nifty example with aliases in a “sliding window” scenario in the Tire integration test suite.
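
What such a background job might send can be sketched in plain Ruby. This is not the code from the Tire test suite; it only builds the body for the Elasticsearch _aliases endpoint, assuming weekly indices named after the Monday of each week (as in the listing above) and assuming the previous week's aliases exist so they can be removed.

```ruby
require 'date'
require 'json'

# Aliases that should point at the current week only.
CURRENT_WEEK_ALIASES = %w[articles_current articles_shared articles_plan_basic]

# Name of the weekly index that contains `date` (weeks start on Monday).
def weekly_index_name(date, prefix = 'articles')
  monday = date - ((date.wday + 6) % 7)
  "#{prefix}_#{monday.strftime('%Y-%m-%d')}"
end

# Body for POST /_aliases that slides all aliases forward by one week.
def alias_rotation_body(today)
  current  = weekly_index_name(today)
  previous = weekly_index_name(today - 7)
  expired  = weekly_index_name(today - 14)

  actions = []
  CURRENT_WEEK_ALIASES.each do |name|
    actions << { :remove => { :index => previous, :alias => name } }
    actions << { :add    => { :index => current,  :alias => name } }
  end
  # "pro" spans two weeks: drop the oldest week, add the new one.
  actions << { :remove => { :index => expired, :alias => 'articles_plan_pro' } }
  actions << { :add    => { :index => current, :alias => 'articles_plan_pro' } }

  { :actions => actions }
end

puts weekly_index_name(Date.new(2012, 7, 12)) # => articles_2012-07-09
puts JSON.pretty_generate(alias_rotation_body(Date.new(2012, 7, 16)))
```

Sending all actions in a single request matters here, because the alias API applies them together, so searches never see a moment where an alias points nowhere.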

Let's not look at the articles_shared alias right now; let's look at what tricks we can play with this setup:

class Account
  # ...

  # Set index name based on account subscription
  #
  def articles
    if plan_code = self.subscription && self.subscription.plan_code
      Article.index_name "articles_plan_#{plan_code}"
    else
      Article.index_name "articles_shared"
    end
    return Article
  end
end

Again, we're setting up an index_name for the Article class, which holds our documents. When the current account has a valid subscription, we get the plan_code out of the subscription, and direct searches for this account to the relevant alias: “basic” or “pro”.

If the account has no subscription (it's probably a “visitor” type), we direct the searches to the articles_shared alias. Using the interface is as simple as before, e.g. in ArticlesController:

@account  = Account.find( remember_token_or_something_like_that )
@articles = @account.articles.search { query { ... } }
# ...

We are not using the Article class as a gateway for indexing in this case; we have a separate indexing component, a Sinatra application serving as a light proxy to the elasticsearch Bulk API. It provides HTTP authentication and document validation (enforcing rules such as required properties or dates passed as UTC), and uses the bare Tire::Index#import and Tire::Index#store APIs.

These APIs talk to the articles_current index alias, which is periodically updated to the current week by said background process. In this way, we have decoupled all the logic for setting up index names into separate components of the application, so we don't need access to the Article or Account classes in the indexing proxy (it runs on a separate server), or in any other component of the application. Whichever component is indexing, indexes against the articles_current alias; whichever component is searching, searches against whatever alias or index makes sense for that particular component.
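
The proxy itself is not shown in the answer. To illustrate the kind of validation it describes, here is a sketch in plain Ruby. The property names (id, title, published_at) and the article type are hypothetical, and the payload is built by hand in the Bulk API's newline-delimited format rather than through Tire::Index#import.

```ruby
require 'json'

# Hypothetical validation rules for incoming documents.
REQUIRED_PROPERTIES = %w[id title published_at]
UTC_TIMESTAMP       = /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z\z/

# Return a list of human-readable problems; an empty list means the document is valid.
def validation_errors(document)
  errors = REQUIRED_PROPERTIES.reject { |key| document.key?(key) }
                              .map    { |key| "missing property: #{key}" }
  stamp = document['published_at']
  errors << 'published_at must be a UTC timestamp' if stamp && stamp !~ UTC_TIMESTAMP
  errors
end

# Build a Bulk API body: one action line plus one source line per document.
def bulk_payload(documents, index_name = 'articles_current', type = 'article')
  documents.map do |document|
    action = { :index => { :_index => index_name, :_type => type, :_id => document['id'] } }
    "#{action.to_json}\n#{document.to_json}\n"
  end.join
end

documents = [
  { 'id' => 'abc123', 'title' => 'Valid',   'published_at' => '2012-07-12T10:00:00Z' },
  { 'id' => 'def456', 'title' => 'Invalid', 'published_at' => '2012-07-12T12:00:00+02:00' }
]

valid = documents.select { |document| validation_errors(document).empty? }
puts bulk_payload(valid)
```

Because the payload always targets articles_current, the proxy needs no knowledge of which weekly index is live.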
