I use MongoDB to store 30 days of data that arrives as a stream. I am looking for a purging mechanism that discards the oldest data to make room for new data. I previously used MySQL, where I handled this with partitions: I kept 30 date-based partitions, dropped the oldest one, and created a new partition to hold the new data.
Mapping the same idea onto MongoDB, I am tempted to use date-based shards. The problem is that this gives a poor data distribution: if all the new data lands on the same shard, that shard becomes hot because many users access recent data, while the shards holding older data are lightly loaded.
I could instead purge by collection: keep 30 collections and drop the oldest one to accommodate new data. But there are a couple of problems: 1) if I make the collections smaller, I do not benefit much from sharding, since sharding is done per collection; 2) my queries would have to change to query all 30 collections and take the union.
Please suggest a good purging mechanism (if any) for this situation.
There are really only three ways to do purging in MongoDB. It looks like you've already identified several of the trade-offs.
Option #1: single collection
pros
cons
Option #2: collection per day
pros
collection.drop() is very fast.
cons
Option #3: database per day
pros
cons
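As a rough illustration of option #2, here is a minimal Python sketch of a 30-day rotation over per-day collections. The `events_YYYYMMDD` naming scheme, the `events` prefix, and the helper names are all hypothetical; `db` is assumed to be a pymongo `Database`, and only its `list_collection_names()` and `drop_collection()` methods are used.

```python
from datetime import date, timedelta
from typing import List

RETENTION_DAYS = 30  # from the question: keep 30 days of data


def collection_name(day: date, prefix: str = "events") -> str:
    """Per-day collection name, e.g. events_20240131 (hypothetical scheme)."""
    return f"{prefix}_{day:%Y%m%d}"


def live_collections(today: date, prefix: str = "events") -> List[str]:
    """Names of the collections inside the retention window, oldest first."""
    return [
        collection_name(today - timedelta(days=n), prefix)
        for n in range(RETENTION_DAYS - 1, -1, -1)
    ]


def expired_collections(existing: List[str], today: date,
                        prefix: str = "events") -> List[str]:
    """Per-day collections that have fallen out of the retention window."""
    keep = set(live_collections(today, prefix))
    return sorted(
        name for name in existing
        if name.startswith(prefix + "_") and name not in keep
    )


def purge(db, today: date, prefix: str = "events") -> List[str]:
    """Drop every expired per-day collection and return the dropped names.

    `db` is assumed to behave like a pymongo Database.
    """
    dropped = expired_collections(db.list_collection_names(), today, prefix)
    for name in dropped:
        db.drop_collection(name)
    return dropped
```

Queries that span several days would still have to iterate over `live_collections(...)` and merge the results on the client, which is the union cost mentioned in the question.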
There is also an option #4, but it is not a general solution. I know of some people who did their "purging" simply by using Capped Collections. There are definitely cases where this works, but it comes with a number of caveats, so you really need to know what you're doing.
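To make the capped-collection idea concrete, here is a small Python sketch. A capped collection evicts the oldest documents by size, not by age, so the 30-day window is only approximated by sizing the collection from an estimated volume. The per-day document count, average document size, and helper names are assumptions for illustration; `db` is assumed to be a pymongo `Database`.

```python
def capped_size_bytes(docs_per_day: int, avg_doc_bytes: int, days: int = 30) -> int:
    """Rough byte budget needed to hold `days` worth of documents."""
    return docs_per_day * avg_doc_bytes * days


def create_capped(db, name: str, size_bytes: int):
    """Create a capped collection of the given size.

    `db` is assumed to behave like a pymongo Database; once the collection
    is full, MongoDB overwrites the oldest documents in insertion order.
    """
    return db.create_collection(name, capped=True, size=size_bytes)
```

One of the caveats is that capped collections cannot be sharded, so this only fits if the whole 30-day window lives comfortably on one replica set.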
Since MongoDB 2.2 you can set a TTL (time to live) on a collection by creating a TTL index on a date field. MongoDB then expires old documents from the collection automatically.
Follow this link: http://docs.mongodb.org/manual/tutorial/expire-data/
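A minimal Python sketch of the TTL approach, assuming each document carries a date field (called `created_at` here, which is a hypothetical name) and that `collection` is a pymongo `Collection`:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ensure_ttl_index(collection, field: str = "created_at", days: int = 30):
    """Create a TTL index so documents expire `days` after the value in `field`.

    `collection` is assumed to behave like a pymongo Collection, and `field`
    must hold a BSON date for the expiry to apply.
    """
    return collection.create_index(field, expireAfterSeconds=days * SECONDS_PER_DAY)


# Usage with pymongo (assumes a reachable server; names are illustrative):
#   from pymongo import MongoClient
#   events = MongoClient()["mydb"]["events"]
#   ensure_ttl_index(events)
```

Expired documents are removed by a background task that runs periodically, so they may linger for a short time past their expiry.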
I had a similar situation and this page helped me out, especially the "Helpful Scripts" section at the bottom. http://www.mongodb.org/display/DOCS/Excessive+Disk+Space
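It is best to keep one server as an archive and run a purge at 15-day intervals to delete old data from the archive. Use more data partitions for the archive.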