
Elasticsearch with Tire on Rails bulk import & indexing issue

I have a Rails app with full-text search based on Elasticsearch and Tire. It already works on a MongoDB model called Category, but now I want to add a more complex search based on a Mongoid embedded 1-n model: User, which embeds_many :watchlists.

Now I have to bulk import and index all the fields in Watchlist, and I'd like to know:

  1. How can I do that?
  2. Can I index just the watchlists child fields, without the parent user fields?

The embedded 1-n MongoDB/Mongoid model looks like the following:

app/models/user.rb (the parent):

class User
  include Mongoid::Document

  include Tire::Model::Search
  include Tire::Model::Callbacks
  index_name 'users'

  field :nickname
  # ... many other fields

  embeds_many :watchlists
end

app/models/watchlist.rb (the embedded "many" children):

class Watchlist
  include Mongoid::Document

  include Tire::Model::Search
  include Tire::Model::Callbacks
  index_name 'watchlists'

  field :html_url
  embedded_in :user
end

Any suggestions on how to accomplish the task?

UPDATE: here is a chunk of the model as seen from the mongo shell:

    > user = db.users.findOne({'nickname': 'lgs'})
    {
       "_id" : ObjectId("4f76a16cf2a6a12f88cbca43"),
       "encrypted_password" : "",
       "sign_in_count" : 0,
       "provider" : "github",
       "uid" : "1573",
       "name" : "Luca G. Soave",
       "email" : "luca.soave@gmail.com",
       "nickname" : "lgs",
       "watchlists" : [
           {
               "_id" : ObjectId("4f76997f1d41c81173000002"),
               "tags_array" : [ "git", "persistence" ],
               "html_url" : "https://github.com/mojombo/grit",
               "description" : "Grit gives you object oriented read/write access to Git repositories via Ruby.",
               "fork_" : false,
               "forks" : 207,
               "watchers" : 1258,
               "created_at" : ISODate("2007-10-29T14:37:16Z"),
               "pushed_at" : ISODate("2012-01-27T01:05:45Z"),
               "avatar_url" : "https://secure.gravatar.com/avatar/25c7c18223fb42a4c6ae1c8db6f50f9b?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-140.png"
           },
       ...
       ...
    } 

I'd like to index and query any fields owned by the embedded child watchlists doc:

 ... "tags_array", "html_url", "description", "forks" 

but I don't want Elasticsearch to include the parent user fields:

 ... "uid", "name", "email", "nickname" 

so that when I query for "git persistence", it will look into the indexed fields of each 'watchlists' entry of each 'user' in the original MongoDB.

(Sorry for mixing singular and plural here; I was just using the doc object names.)
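
The query described above could be expressed roughly as follows. This is only a sketch of an Elasticsearch `query_string` request body built in plain Ruby; `watchlist_query` and `WATCHLIST_FIELDS` are hypothetical names, and it assumes the watchlists end up as their own documents with these field names. `forks` is left out because it is numeric and would probably need separate handling in a text query.

```ruby
require 'json'

# Text fields of the embedded watchlist documents, as listed in the question.
# (Hypothetical constant; "forks" is omitted because it is numeric.)
WATCHLIST_FIELDS = %w[tags_array html_url description].freeze

# Hypothetical helper: builds a query_string request body that searches
# only the given watchlist fields. It does not talk to Elasticsearch.
def watchlist_query(terms, fields = WATCHLIST_FIELDS)
  {
    query: {
      query_string: {
        query: terms,
        fields: fields
      }
    }
  }
end

puts JSON.pretty_generate(watchlist_query('git persistence'))
```

Because no parent field is listed under `fields`, "uid", "name", "email" and "nickname" would not be searched even if they happened to be present in the indexed document.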

It really depends on how you want to serialize your data for the search engine, based on how you want to query it. Please update the question and I'll update the answer. (Also, it's better to just remove the ES logs; they are not relevant here.)
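
As one possible serialization, assuming the goal is one search document per embedded watchlist with none of the parent user fields, the flattening could look like the sketch below. It works on plain hashes so it runs without Mongoid or Tire; `watchlist_documents` and `INDEXED_FIELDS` are hypothetical names, and the `'id'` and `'type'` keys are an assumption about what your indexing code expects.

```ruby
require 'json'

# Child fields to index, taken from the question. (Hypothetical constant.)
INDEXED_FIELDS = %w[tags_array html_url description forks].freeze

# Hypothetical helper: turns one user hash (shaped like the mongo shell
# output in the question) into one document per embedded watchlist.
# Parent keys such as "uid" or "nickname" are never copied.
def watchlist_documents(user)
  user.fetch('watchlists', []).map do |watchlist|
    doc = watchlist.select { |key, _value| INDEXED_FIELDS.include?(key) }
    doc.merge('id' => watchlist['_id'].to_s, 'type' => 'watchlist')
  end
end

sample_user = {
  'uid' => '1573',
  'nickname' => 'lgs',
  'watchlists' => [
    { '_id' => '4f76997f1d41c81173000002',
      'tags_array' => %w[git persistence],
      'html_url' => 'https://github.com/mojombo/grit',
      'forks' => 207,
      'watchers' => 1258 }
  ]
}

puts JSON.pretty_generate(watchlist_documents(sample_user))
```

In a real Mongoid model the same field selection would more likely live in the model's own JSON serialization; the hash version here only shows the shape of the resulting documents.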

I'm not sure how the Rake task works with embedded documents in Mongo, or why it seems to "hang" at the end. Is your data in the "users" index when you run the task?

Note that it's quite easy to provide your own indexing code when the Rake task is not flexible enough. See the Tire::Index#import integration tests.
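
To give an idea of what hand-rolled indexing code has to produce, here is a minimal sketch that batches documents and builds the newline-delimited body of an Elasticsearch `_bulk` request. It is plain Ruby and sends nothing; `bulk_body` and `each_bulk_batch` are hypothetical helpers, the batch size is arbitrary, and the `_type` entry assumes an Elasticsearch version that still has mapping types (newer releases dropped them). Tire::Index#import remains the library route.

```ruby
require 'json'

# Hypothetical helper: builds one _bulk request body. Each document
# contributes an action line and a source line, and the body ends
# with a newline as the bulk API requires.
def bulk_body(index, type, docs)
  lines = docs.flat_map do |doc|
    action = { index: { _index: index, _type: type, _id: doc['id'] } }
    [JSON.generate(action), JSON.generate(doc)]
  end
  lines.join("\n") + "\n"
end

# Hypothetical helper: yields one bulk body per batch of documents,
# or returns an Enumerator when called without a block.
def each_bulk_batch(index, type, docs, batch_size = 1000)
  return enum_for(:each_bulk_batch, index, type, docs, batch_size) unless block_given?

  docs.each_slice(batch_size) { |batch| yield bulk_body(index, type, batch) }
end

sample_docs = [
  { 'id' => '4f76997f1d41c81173000002', 'html_url' => 'https://github.com/mojombo/grit' }
]

each_bulk_batch('watchlists', 'watchlist', sample_docs) { |body| puts body }
```

Each yielded string could then be posted to the `_bulk` endpoint with whatever HTTP client the app already uses.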


 