
Elasticsearch with Tire on Rails bulk import & indexing issue

I have a Rails app with full-text search based on Elasticsearch and Tire. It already works on a MongoDB model called Category, but now I want to add a more complex search based on a Mongoid embedded 1-N model: User, which embeds_many :watchlists.

Now I need to bulk import and index all the fields in Watchlist, and I'd like to know:

  1. how can I do that?
  2. can I index just the watchlists' child fields, without the user parent fields?

The embedded 1-N MongoDB/Mongoid model looks like the following:

app/models/user.rb (the parent):

class User
  include Mongoid::Document

  include Tire::Model::Search
  include Tire::Model::Callbacks
  index_name 'users'

  field :nickname
  # ... many other fields

  embeds_many :watchlists
end

app/models/watchlist.rb (the embedded "many" children):

class Watchlist
  include Mongoid::Document

  include Tire::Model::Search
  include Tire::Model::Callbacks
  index_name 'watchlists'

  field :html_url
  embedded_in :user
end

Any suggestions on how to accomplish this task?
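On the bulk-import side, since Tire's import Rake task works on top-level models, one approach is to walk the parent collection yourself and feed the embedded documents to Tire::Index#import in batches. The Tire call is only sketched in a comment below; the batching logic is plain Ruby, and the helper name `each_watchlist_batch` and the batch size are my own illustration, not part of Tire:

```ruby
# Hypothetical bulk-import loop: flatten every user's embedded watchlists
# into batches suitable for feeding to Tire::Index#import.
def each_watchlist_batch(users, batch_size: 2)
  batch = []
  users.each do |user|
    user[:watchlists].each do |watchlist|
      batch << watchlist
      if batch.size == batch_size
        yield batch
        batch = []
      end
    end
  end
  yield batch unless batch.empty? # flush the last, partially filled batch
end

# Stand-in data mimicking User documents with embedded watchlists.
users = [
  { nickname: "lgs",
    watchlists: [{ html_url: "a" }, { html_url: "b" }, { html_url: "c" }] }
]

batches = []
each_watchlist_batch(users) { |b| batches << b }
# In the real task each batch would go to something like:
#   Tire.index('watchlists') { import batch }
```

With a real Mongoid model you would iterate `User.all` (ideally with `batch_size` on the Mongo cursor) instead of an in-memory array.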

UPDATE: here is a chunk of the model as seen in the mongo shell:

    > user = db.users.findOne({'nickname': 'lgs'})
    {
       "_id" : ObjectId("4f76a16cf2a6a12f88cbca43"),
       "encrypted_password" : "",
       "sign_in_count" : 0,
       "provider" : "github",
       "uid" : "1573",
       "name" : "Luca G. Soave",
       "email" : "luca.soave@gmail.com",
       "nickname" : "lgs",
       "watchlists" : [
           {
               "_id" : ObjectId("4f76997f1d41c81173000002"),
               "tags_array" : [ git, peristence ],
               "html_url" : "https://github.com/mojombo/grit",
               "description" : "Grit gives you object oriented read/write access to Git repositories via Ruby.",
               "fork_" : false,
               "forks" : 207,
               "watchers" : 1258,
               "created_at" : ISODate("2007-10-29T14:37:16Z"),
               "pushed_at" : ISODate("2012-01-27T01:05:45Z"),
               "avatar_url" : "https://secure.gravatar.com/avatar/25c7c18223fb42a4c6ae1c8db6f50f9b?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-140.png"
           },
       ...
       ...
    } 

I'd like to index & query any field owned by the embedded child watchlists doc:

 ... "tags_array", "html_url", "description", "forks" 

but I don't want Elasticsearch to include the parent user fields:

 ... "uid", "name", "email", "nickname" 

so that when I query for "git persistence", it will look at the indexed 'watchlists' fields of each 'user' in the original MongoDB.

(sorry for mixing singulars and plurals here; I was just indicating the doc object names)
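To index only the child fields, a common approach with Tire is to override to_indexed_json on the embedded model so that only the watchlist's own attributes are serialized and the parent User's fields never reach Elasticsearch. Below is a minimal plain-Ruby sketch of that serialization; the attribute set is taken from the mongo shell dump above, but the class is a stand-in for illustration, not the real Mongoid model:

```ruby
require 'json'

# Stand-in for the embedded Watchlist model: to_indexed_json (the method
# Tire uses to serialize a document) emits only the child's own fields.
class Watchlist
  attr_accessor :tags_array, :html_url, :description, :forks

  def initialize(attrs = {})
    attrs.each { |key, value| send("#{key}=", value) }
  end

  def to_indexed_json
    {
      tags_array:  tags_array,
      html_url:    html_url,
      description: description,
      forks:       forks
    }.to_json
  end
end

w = Watchlist.new(
  tags_array:  %w[git persistence],
  html_url:    "https://github.com/mojombo/grit",
  description: "Grit gives you object oriented read/write access to Git repositories via Ruby.",
  forks:       207
)
puts w.to_indexed_json
```

Note that no parent attribute (uid, name, email, nickname) appears in the output, so a query for "git persistence" would match only watchlist content.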

It really depends on how you want to serialize your data for the search engine, based on how you want to query it. Please update the question and I'll update the answer. (Also, it's better to just remove the ES logs; they are not relevant here.)

I'm not sure how the Rake task works with embedded documents in Mongo, or why it seems to "hang" at the end. Is your data in the "users" index after you run the task?

Notice that it's quite easy to provide your own indexing code when the Rake task is not flexible enough; see the Tire::Index#import integration tests.
