
MongoDB: inserting big collections

I have MongoDB (version 2) in production in a replica set configuration (the next step is to add sharding).

I need to implement the following:

  • Once a day I'll receive a file with millions of rows, and I have to load it into MongoDB.
  • I have a runtime application that always reads from this collection. The volume of reads is very large, and their performance is very important. The collection is indexed and all reads perform a readByIndex operation.

My current implementation of the load is:

  1. Drop the collection
  2. Create the collection
  3. Insert the new documents into the collection

One thing I see is that, because of MongoDB locking, my overall performance gets worse during the load. I've tested the collection with up to 10 million entries. For anything larger than that, I think I should start using sharding.

What is the best way to solve such an issue? Or should I maybe use a different strategy?

You could use two collections :)

  • collectionA contains today's data
  • new data arrives
  • create a new collection (collectionB) and insert the new data into it
  • now switch your reads to collectionB

Then, the next day, repeat the above, just swapping A and B :)

This lets collectionA keep serving requests while collectionB is being loaded.
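The swap can be sketched roughly as below. This is only one possible way to do it, not something the setup above prescribes: it assumes a PyMongo-style `db` handle (a recent driver with `insert_many` and `update_one`), and it introduces a hypothetical `meta` collection holding a pointer document (`ACTIVE_KEY`) so readers know which of the two collections is live. The helper names and the batch size are made up for illustration.

```python
from itertools import islice

ACTIVE_KEY = "active_collection"  # hypothetical _id of the pointer document
NAMES = ("collectionA", "collectionB")


def other_name(active):
    """Return the collection that is NOT currently serving reads."""
    return NAMES[1] if active == NAMES[0] else NAMES[0]


def batches(iterable, size):
    """Yield lists of at most `size` items so each insert stays bounded."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_and_swap(db, docs, index_field, batch_size=1000):
    """Load `docs` into the idle collection, then point readers at it.

    `db` is assumed to behave like a pymongo Database object.
    """
    # With no pointer yet, pretend B is live so the first load goes to A.
    pointer = db["meta"].find_one({"_id": ACTIVE_KEY}) or {"name": NAMES[1]}
    target = other_name(pointer["name"])

    db[target].drop()  # discard the idle collection's old data
    for chunk in batches(docs, batch_size):
        db[target].insert_many(chunk, ordered=False)
    db[target].create_index(index_field)  # index is ready before readers switch

    # Flip the pointer only after the load has fully succeeded.
    db["meta"].update_one(
        {"_id": ACTIVE_KEY}, {"$set": {"name": target}}, upsert=True
    )
    return target


def active_collection(db):
    """What a reader would call to find the live collection."""
    pointer = db["meta"].find_one({"_id": ACTIVE_KEY})
    return db[pointer["name"]]
```

With a real server this would be called with something like `load_and_swap(MongoClient().mydb, rows, "key")`, where `mydb` and `"key"` are placeholders. Readers can cache the result of `active_collection` for a short time rather than looking up the pointer on every request.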

P.S. I just noticed that I'm about a year late answering this question :)
