
Bulk Insert into Mongo - Ruby

I am new to Ruby and Mongo and am working with Twitter data. I'm using Ruby 1.9.3 and the Mongo gem.

I am querying bulk data out of Mongo, filtering out some documents, processing the remaining documents (inserting new fields) and then writing the new documents into Mongo.

The code below works but runs relatively slowly, because I loop through the cursor using .each and insert the new documents into Mongo one at a time.

My question: how can this be structured to process and insert in bulk?

cursor = raw.find({'user.screen_name' => users[cur], 'entities.urls' => []},{:fields => params})

cursor.each do |r| 
  if r['lang'] == "en"
    score = r['retweet_count'] + r['favorite_count']
    timestamp = Time.now.strftime("%d/%m/%Y %H:%M")

    #Commit to Mongo
    @document = {:id => r['id'],
                :id_str => r['id_str'],
                :retweet_count => r['retweet_count'],
                :favorite_count => r['favorite_count'],
                :score => score,    
                :created_at => r['created_at'],
                :timestamp => timestamp,
                :user => [{:id => r['user']['id'],
                           :id_str => r['user']['id_str'],
                           :screen_name => r['user']['screen_name'],
                          }
                         ]
                }
    @collection.save(@document)   
    end #end.if
end #end.each

Any help is greatly appreciated.

In your case there is no way to make this much faster. One thing you could do is retrieve the documents in batches, process them, and then reinsert them in batches, but it would still be slow.
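For reference, the batching idea mentioned above could look roughly like the sketch below. It is not the asker's code: `build_document` and `bulk_copy` are hypothetical helper names, and `target` is assumed to respond to `insert_many(array_of_docs)`, as collections do in the 2.x mongo Ruby driver (with the 1.x driver used in the question, passing an array to `insert` plays the same role). The timestamp is computed once per run rather than once per document.

```ruby
# Build the output document for one tweet (same fields as in the question).
def build_document(r, timestamp)
  {
    :id => r['id'],
    :id_str => r['id_str'],
    :retweet_count => r['retweet_count'],
    :favorite_count => r['favorite_count'],
    :score => r['retweet_count'] + r['favorite_count'],
    :created_at => r['created_at'],
    :timestamp => timestamp,
    :user => [{ :id => r['user']['id'],
                :id_str => r['user']['id_str'],
                :screen_name => r['user']['screen_name'] }]
  }
end

# Filter and transform tweets, writing them in batches instead of one at a time.
# source: any Enumerable of tweet hashes (for example a Mongo cursor).
# target: ASSUMED to respond to insert_many(docs).
# Returns the number of documents written.
def bulk_copy(source, target, batch_size = 1000)
  timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
  inserted = 0
  batch = []
  source.each do |r|
    next unless r['lang'] == "en"
    batch << build_document(r, timestamp)
    if batch.size >= batch_size
      target.insert_many(batch)
      inserted += batch.size
      batch = []
    end
  end
  # Flush whatever is left over after the last full batch.
  unless batch.empty?
    target.insert_many(batch)
    inserted += batch.size
  end
  inserted
end
```

With the names from the question, a call might read `bulk_copy(cursor, @collection)`. This reduces the number of round trips for writes, but every document still travels to the client and back.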

To speed this up you need to do all the processing server side, where the data already exists.

You should either use the aggregation framework of MongoDB, if the result document does not exceed 16 MB, or, for more flexibility but slower execution (still much faster than your current approach can ever be), use the MapReduce framework of MongoDB.
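As an illustration of the aggregation route, the pipeline might be built like the sketch below. `score_pipeline` and the output collection name are hypothetical, and the `$literal` and `$out` stages assume a MongoDB server of version 2.6 or later. Unlike the question's code, this keeps `user` as an embedded document rather than wrapping it in an array.

```ruby
# Build an aggregation pipeline that filters, scores, and writes tweets
# entirely on the server. Nothing is sent to the database here; the method
# only returns the pipeline as an array of stage hashes.
def score_pipeline(screen_name, out_collection,
                   timestamp = Time.now.strftime("%d/%m/%Y %H:%M"))
  [
    # Same filter as the question's find, plus the language check.
    { '$match' => { 'user.screen_name' => screen_name,
                    'entities.urls' => [],
                    'lang' => 'en' } },
    # Keep only the needed fields and compute the score server side.
    { '$project' => {
        'id' => 1, 'id_str' => 1,
        'retweet_count' => 1, 'favorite_count' => 1,
        'created_at' => 1,
        'score' => { '$add' => ['$retweet_count', '$favorite_count'] },
        'timestamp' => { '$literal' => timestamp },  # ASSUMES server >= 2.6
        'user.id' => 1, 'user.id_str' => 1, 'user.screen_name' => 1
    } },
    # Write the results into another collection (ASSUMES server >= 2.6).
    { '$out' => out_collection }
  ]
end
```

It would then be passed to the collection's `aggregate` method, for example `raw.aggregate(score_pipeline(users[cur], 'scored_tweets'))`, where `scored_tweets` is a made-up name. Depending on the driver version, the pipeline may only run once the returned result is iterated.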

What exactly are you doing? Why not go pure Ruby or pure Mongo (well, that's Ruby too)? And why do you really need to load every single attribute?

What I understand from your code is that you actually create a completely new document, and I think that's wrong.

You can do it like this on the Ruby side:

cursor = YourModel.find(params)

cursor.each do |r|
    if r.lang == "en"
        r.score = r.retweet_count + r.favorite_count
        r.timestamp = Time.now.strftime("%d/%m/%Y %H:%M")
        r.save
    end #end.if
end #end.each

And of course you can include Mongoid::Timestamps in your model, and it will handle your created_at and updated_at attributes (it creates them itself).

In the mongo shell it's a little harder. First select your database with use my_db, then the following code will generate what you want:

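db.models.find({something: your_param}).forEach(function(doc) {
    doc.score = doc.retweet_count + doc.favorite_count;
    // Date is the usual type for application timestamps;
    // the shell's Timestamp type is meant for internal MongoDB use.
    doc.timestamp = new Date();
    db.models.save(doc);
});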

I don't know what your parameters were, but they're easy to create. Also, Mongoid really does lazy loading, so if you don't try to use an attribute, it won't load it. You can actually save a lot of time by not using every attribute. And these methods change the existing document rather than creating another one.
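With the plain mongo gem rather than Mongoid, the same "update in place, load only what you need" idea might be sketched as follows. `NEEDED_FIELDS`, `score_update`, and `each_score_update` are hypothetical names, and the actual write is left to the caller's block because the update call differs between driver versions.

```ruby
# Fields the computation needs; passing only these as a projection
# to find avoids loading whole tweets.
NEEDED_FIELDS = ['lang', 'retweet_count', 'favorite_count']

# Build a $set update that adds the new fields to an existing document.
def score_update(doc, now = Time.now)
  { '$set' => {
      'score' => doc['retweet_count'] + doc['favorite_count'],
      'timestamp' => now.strftime("%d/%m/%Y %H:%M")
  } }
end

# Yield a (filter, update) pair for every English document, so the
# caller decides how to send the update to the database.
def each_score_update(docs)
  docs.each do |doc|
    next unless doc['lang'] == "en"
    yield({ '_id' => doc['_id'] }, score_update(doc))
  end
end
```

Assuming the 2.x driver, the block could be `{ |filter, update| collection.update_one(filter, update) }`. Each update is still one round trip, so this addresses document size and duplication rather than the number of writes.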
