How to handle database purging in MongoDB

I use MongoDB to store 30 days of data that arrives as a stream. I am looking for a purging mechanism that lets me throw away the oldest data to make room for new data. I used to use MySQL, where I handled this situation with partitions: I kept 30 date-based partitions, dropped the oldest partition, and created a new one to hold the new data.

When I map the same approach onto MongoDB, I am inclined to use date-based shards. The problem is that this skews my data distribution: if all the new data is on the same shard, that shard will be very hot because many people access it, while the shards holding older data will carry little user load.

Alternatively, I could purge per collection: keep 30 collections and drop the oldest one to make room for new data. But this has a couple of problems: 1) If I make collections smaller, I cannot benefit much from sharding, since sharding is done per collection. 2) My queries would have to change to query all 30 collections and take the union of the results.

Please suggest a good purging mechanism (if there is one) for this situation.

There are really only three ways to purge data in MongoDB. It looks like you've already identified several of the trade-offs.

  1. Single collection, delete old entries
  2. Collection per day, drop old collections
  3. Database per day, drop old databases

Option #1: single collection

pros

  • Easy to implement
  • Easy to run Map/Reduce jobs

cons

  • Deletes are as expensive as inserts; they cause lots of I/O and the need to "defragment" or "compact" the DB.
  • At some point you end up handling double the writes, since you have to both insert a day's worth of data and delete a day's worth of data.
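A minimal sketch of option #1, assuming a pymongo `Collection` and a hypothetical timestamp field named `ts` on every document. It removes everything older than the 30-day window with one range delete, which is exactly the per-document delete work (and I/O) described in the cons above.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # the 30-day window from the question


def purge_cutoff(now, retention_days=RETENTION_DAYS):
    """Timestamp before which documents count as expired."""
    return now - timedelta(days=retention_days)


def purge_old_entries(collection, now=None, field="ts", retention_days=RETENTION_DAYS):
    """Delete documents whose `field` (hypothetical name) is older than the window.

    `collection` is assumed to be a pymongo Collection. An index on `field`
    keeps the match cheap, but every removed document still costs I/O.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = purge_cutoff(now, retention_days)
    result = collection.delete_many({field: {"$lt": cutoff}})
    return result.deleted_count
```

Run this from a daily job; each run deletes roughly one day's worth of documents, which is where the "double the writes" effect comes from.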

Option #2: collection per day

pros

  • Removing data via collection.drop() is very fast.
  • Still Map/Reduce-friendly, as the output from each day can be merged or re-reduced against the summary data.
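A sketch of the rotation for option #2, assuming a pymongo `Database` and a hypothetical naming scheme of one collection per day called `events_YYYYMMDD`. Because the date suffix is zero-padded, string order matches date order, so finding the expired collections is a simple comparison; each one is then removed with a single drop instead of per-document deletes.

```python
from datetime import date, timedelta

PREFIX = "events_"  # hypothetical naming scheme: one collection per day, events_YYYYMMDD


def collection_name(day: date) -> str:
    """Name of the collection that holds `day`'s data."""
    return f"{PREFIX}{day:%Y%m%d}"


def expired_collections(names, today: date, keep_days: int = 30):
    """Return daily collections older than the last `keep_days` days (today included)."""
    oldest_kept = collection_name(today - timedelta(days=keep_days - 1))
    daily = [n for n in names
             if n.startswith(PREFIX) and n[len(PREFIX):].isdigit() and len(n) == len(oldest_kept)]
    # Zero-padded YYYYMMDD suffixes sort lexically in date order.
    return sorted(n for n in daily if n < oldest_kept)


def rotate(db, today: date, keep_days: int = 30):
    """Drop expired daily collections. `db` is assumed to be a pymongo Database."""
    doomed = expired_collections(db.list_collection_names(), today, keep_days)
    for name in doomed:
        db.drop_collection(name)  # removes the collection and its indexes in one step
    return doomed
```

Writers insert into `db[collection_name(today)]`; MongoDB creates a collection on its first insert, so there is no separate "create partition" step, although any indexes you need have to be created on each new collection.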

cons

  • You may still have some fragmentation problems.
  • You will need to rewrite queries. However, in my experience, if you have enough data that you need to purge it, you rarely access that data directly. Instead, you tend to run Map/Reduce jobs over it, so this may not change that many queries.
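The query rewrite mentioned above can often be hidden behind a small wrapper. This sketch assumes the same hypothetical `events_YYYYMMDD` collections and a pymongo `Database`; it runs the same `find()` against each day in a date range and concatenates the results client-side.

```python
from datetime import date, timedelta


def daily_names(start: date, end: date, prefix="events_"):
    """Yield hypothetical events_YYYYMMDD collection names for every day in [start, end]."""
    day = start
    while day <= end:
        yield f"{prefix}{day:%Y%m%d}"
        day += timedelta(days=1)


def find_across_days(db, query, start: date, end: date, prefix="events_"):
    """Run one find() per daily collection and concatenate the results.

    `db` is assumed to be a pymongo Database; a day whose collection does not
    exist yet simply contributes no documents.
    """
    results = []
    for name in daily_names(start, end, prefix):
        results.extend(db[name].find(query))
    return results
```

Sorting or limiting across days still has to happen after the merge. On MongoDB 4.4 and later, the `$unionWith` aggregation stage can combine collections on the server instead.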

Option #3: database per day

pros

  • Deletion is as fast as possible; the files are simply truncated.
  • Zero fragmentation problems, and old data is easy to back up, restore, or archive.
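Because each day lives in its own database, archiving a day before purging it can be one dump plus one drop. This is a sketch under a few assumptions: `mongodump` is on the PATH, `client` is a pymongo `MongoClient`, per-day database names are hypothetical, and connection options (`--host`, credentials) are omitted. The drop only happens if the dump exits successfully.

```python
import subprocess


def archive_command(db_name, out_dir):
    """Build a mongodump invocation that writes one database under out_dir."""
    return ["mongodump", "--db", db_name, "--out", out_dir]


def archive_and_drop(client, db_name, out_dir, run=subprocess.run):
    """Dump a per-day database, then drop it only if the dump succeeded.

    `client` is assumed to be a pymongo MongoClient; `run` is injectable so
    the flow can be exercised without a real mongodump binary.
    """
    # check=True makes subprocess.run raise CalledProcessError on a non-zero exit,
    # so a failed dump never reaches the drop below.
    run(archive_command(db_name, out_dir), check=True)
    client.drop_database(db_name)
```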

cons

  • Makes querying more challenging (expect to write some wrapper code).
  • Map/Reduce jobs are not as easy to write, though take a look at the Aggregation Framework, as it may better satisfy your needs anyway.
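As a rough illustration of the Aggregation Framework route, this sketch builds a pipeline that counts documents per key within a time range, plus a helper that re-reduces the per-day results. The field names `ts` and `type` are hypothetical. With one database per day, you would run the pipeline against each day's collection (for example via pymongo's `collection.aggregate(pipeline)`) and merge the rows client-side.

```python
from collections import Counter
from datetime import datetime


def counts_pipeline(start: datetime, end: datetime, ts_field="ts", group_field="type"):
    """Aggregation pipeline counting documents per `group_field` in [start, end).

    `ts` and `type` are hypothetical field names.
    """
    return [
        {"$match": {ts_field: {"$gte": start, "$lt": end}}},
        {"$group": {"_id": f"${group_field}", "count": {"$sum": 1}}},
    ]


def merge_counts(per_day_results):
    """Re-reduce per-day {_id, count} rows into a single total per key."""
    total = Counter()
    for rows in per_day_results:
        for row in rows:
            total[row["_id"]] += row["count"]
    return dict(total)
```

Summing counts is safe to re-reduce this way; averages or distinct counts would need the per-day partial values (such as sums and counts) rather than the final numbers.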

Now, there is an option #4, but it is not a general solution. I know of some people who did "purging" simply by using Capped Collections. There are definitely cases where this works, but it comes with a bunch of caveats, so you really need to know what you're doing.
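A sketch of the capped-collection variant, assuming a pymongo `Database`. The sizing constants are placeholders, not measurements: a capped collection evicts its oldest documents by total size (or document count), not by age, so it only approximates a 30-day window if the daily volume and average document size stay close to these estimates.

```python
AVG_DOC_BYTES = 512        # assumption: measure your real average document size
DOCS_PER_DAY = 1_000_000   # assumption: your stream's daily volume
DAYS = 30


def capped_size_bytes(avg_doc_bytes=AVG_DOC_BYTES, docs_per_day=DOCS_PER_DAY, days=DAYS):
    """Byte budget needed to hold roughly `days` days of documents."""
    return avg_doc_bytes * docs_per_day * days


def create_stream_collection(db, name="events"):
    """Create a capped collection; `db` is assumed to be a pymongo Database."""
    return db.create_collection(
        name,
        capped=True,
        size=capped_size_bytes(),  # maximum size in bytes
        max=DOCS_PER_DAY * DAYS,   # optional upper bound on document count
    )
```

With these placeholder numbers the budget works out to 15,360,000,000 bytes; if traffic doubles, the collection silently keeps only about 15 days.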

Starting with MongoDB 2.2, you can set a TTL (time-to-live) on a collection by creating a TTL index on a date field. MongoDB then expires old documents from the collection automatically.

Follow this link: http://docs.mongodb.org/manual/tutorial/expire-data/
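A minimal sketch of the TTL approach, assuming a pymongo `Collection` and a hypothetical `createdAt` field. The index tells MongoDB to remove each document about 30 days after the time stored in that field; the field has to hold a real date (a Python `datetime`), or the document never expires.

```python
from datetime import datetime, timezone

RETENTION_SECONDS = 30 * 24 * 60 * 60  # 30 days = 2,592,000 seconds


def ensure_ttl_index(collection, field="createdAt", seconds=RETENTION_SECONDS):
    """Create a TTL index; `collection` is assumed to be a pymongo Collection."""
    return collection.create_index(field, expireAfterSeconds=seconds)


def stamp(doc, now=None):
    """Return a copy of `doc` with the hypothetical `createdAt` field set to a datetime."""
    if now is None:
        now = datetime.now(timezone.utc)
    return {**doc, "createdAt": now}
```

Deletion is done by a background task that runs roughly once a minute, so expired documents can linger briefly. Under the hood these are still per-document deletes, so the I/O cost described for option #1 still applies.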

I had a similar situation and this page helped me out, especially the "Helpful Scripts" section at the bottom: http://www.mongodb.org/display/DOCS/Excessive+Disk+Space

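It is best to keep one server as an archive, purge at 15-day intervals, and delete old archives from the archive server. Use more data partitions for archiving.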
