Elasticsearch with Tire on Rails bulk import & indexing issue
I have a Rails app with full-text search based on Elasticsearch and Tire. It already works on a MongoDB model called Category, but now I want to add a more complex search based on a Mongoid embedded 1-N model: User, which embeds_many :watchlists.
Now I have to bulk import and index all the fields in Watchlist, and I'd like to know how best to do that.
The embedded 1-N MongoDB/Mongoid model looks like the following:
app/models/user.rb (the parent):
class User
  include Mongoid::Document
  include Tire::Model::Search
  include Tire::Model::Callbacks
  index_name 'users'
  field :nickname
  # ... many other fields
  embeds_many :watchlists
end
app/models/watchlist.rb (the embedded "many" children):
class Watchlist
  include Mongoid::Document
  include Tire::Model::Search
  include Tire::Model::Callbacks
  index_name 'watchlists'
  field :html_url
  embedded_in :user
end
Any suggestions on how to accomplish this?
UPDATE: here is a chunk of the model as seen in the mongo shell:
> user = db.users.findOne({'nickname': 'lgs'})
{
  "_id" : ObjectId("4f76a16cf2a6a12f88cbca43"),
  "encrypted_password" : "",
  "sign_in_count" : 0,
  "provider" : "github",
  "uid" : "1573",
  "name" : "Luca G. Soave",
  "email" : "luca.soave@gmail.com",
  "nickname" : "lgs",
  "watchlists" : [
    {
      "_id" : ObjectId("4f76997f1d41c81173000002"),
      "tags_array" : [ "git", "peristence" ],
      "html_url" : "https://github.com/mojombo/grit",
      "description" : "Grit gives you object oriented read/write access to Git repositories via Ruby.",
      "fork_" : false,
      "forks" : 207,
      "watchers" : 1258,
      "created_at" : ISODate("2007-10-29T14:37:16Z"),
      "pushed_at" : ISODate("2012-01-27T01:05:45Z"),
      "avatar_url" : "https://secure.gravatar.com/avatar/25c7c18223fb42a4c6ae1c8db6f50f9b?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-140.png"
    },
    ...
    ...
}
I'd like to index and query any field owned by the embedded child watchlists documents:
... "tags_array", "html_url", "description", "forks"
but I don't want Elasticsearch to include the parent user fields:
... "uid", "name", "email", "nickname"
so that when I query for "git persistence", it looks into the indexed fields of each 'watchlists' entry of each 'user' in the original MongoDB.
(Sorry for mixing singular and plural here; I was just using the document object names.)
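To make the intended search concrete, here is a minimal sketch of a request body for the "git persistence" query, built as a plain Ruby hash. It assumes each watchlist ends up as its own document with top-level fields, which is only one possible layout. The helper name `watchlist_query` and the constant `WATCHLIST_TEXT_FIELDS` are hypothetical, and the numeric "forks" field is left out of the text search on purpose.

```ruby
require 'json'

# Assumption: these are the text fields of one indexed watchlist document.
# "forks" is numeric, so it is deliberately not part of the text search.
WATCHLIST_TEXT_FIELDS = %w[tags_array html_url description].freeze

# Hypothetical helper: builds a query_string request body that looks only
# at watchlist fields and never at parent user fields.
def watchlist_query(text, fields = WATCHLIST_TEXT_FIELDS)
  {
    query: {
      query_string: {
        query: text,
        fields: fields
      }
    }
  }
end

body = watchlist_query('git persistence')
puts JSON.pretty_generate(body)
```

If the watchlists were instead indexed inside their parent user document, the field names would presumably need a prefix such as "watchlists.description".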
It really depends on how you want to serialize your data for the search engine, which in turn depends on how you want to query it. Please update the question and I'll update the answer. (Also, it's better to just remove the ES logs; they are not relevant here.)
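As an illustration of one possible serialization, the sketch below flattens a user into one search document per embedded watchlist and keeps only the watchlist fields named in the question. It works on plain hashes rather than Mongoid objects. The helper `watchlist_documents`, the constant `INDEXED_FIELDS`, and the 'id'/'type' keys are assumptions made for this example, not part of Tire or Mongoid.

```ruby
# Sample data trimmed from the mongo shell output in the question.
user = {
  'uid' => '1573',
  'nickname' => 'lgs',
  'watchlists' => [
    {
      '_id' => '4f76997f1d41c81173000002',
      'tags_array' => %w[git peristence],
      'html_url' => 'https://github.com/mojombo/grit',
      'description' => 'Grit gives you object oriented read/write access to Git repositories via Ruby.',
      'forks' => 207,
      'watchers' => 1258
    }
  ]
}

# Fields the question wants to be searchable.
INDEXED_FIELDS = %w[tags_array html_url description forks].freeze

# Hypothetical helper: returns one hash per embedded watchlist, holding only
# the indexed fields plus an id and a type. Parent user fields are dropped.
def watchlist_documents(user)
  user.fetch('watchlists', []).map do |watchlist|
    doc = watchlist.select { |key, _| INDEXED_FIELDS.include?(key) }
    doc.merge('id' => watchlist['_id'].to_s, 'type' => 'watchlist')
  end
end

docs = watchlist_documents(user)
puts docs.inspect
```

Indexing the whole user with nested watchlists would be an equally valid choice; which one fits depends on whether search results should be users or individual watchlists.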
I'm not sure how the Rake task works with embedded documents in Mongo, or why it seems to "hang" at the end. Is your data in the "users" index when you run the task?
Note that it's quite easy to provide your own indexing code when the Rake task is not flexible enough. See the Tire::Index#import integration tests.
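For a sense of what custom indexing code has to do, here is a rough sketch that splits documents into batches and builds the newline-delimited body the Elasticsearch bulk API expects. It uses only the Ruby standard library and does not call Tire. The helper `bulk_body`, the sample documents, and the batch size are hypothetical, and the `_type` key assumes an older Elasticsearch version of the kind Tire targeted.

```ruby
require 'json'

# Hypothetical helper: one action line plus one source line per document,
# each terminated by a newline, as the bulk API requires.
def bulk_body(index_name, docs)
  lines = docs.flat_map do |doc|
    action = { index: { _index: index_name, _type: doc['type'], _id: doc['id'] } }
    source = doc.reject { |key, _| %w[id type].include?(key) }
    [JSON.generate(action), JSON.generate(source)]
  end
  lines.join("\n") + "\n"
end

# Made-up documents standing in for the flattened watchlists.
docs = [
  { 'id' => '1', 'type' => 'watchlist', 'description' => 'first' },
  { 'id' => '2', 'type' => 'watchlist', 'description' => 'second' },
  { 'id' => '3', 'type' => 'watchlist', 'description' => 'third' }
]

# Send documents in small batches so no single request grows too large.
payloads = []
docs.each_slice(2) { |batch| payloads << bulk_body('watchlists', batch) }

payloads.each { |payload| puts payload }
```

With Tire loaded, each batch would presumably be handed to Tire::Index#import instead of building the payload by hand; the integration tests mentioned above show the supported forms.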