
Database backup best practices

Original post: 2013-05-10 09:13:33 7 1 database/ backup/ restore/ marklogic

I am working in a production environment where we process XML files daily. Our database is quite big, and we take a daily backup. I learned that MarkLogic adds the changes to the previous backup to create the new backup.

I want to confirm whether this is the best way to keep a daily backup, or whether there is a better way to do it. Also, is there any limit to the process I am following? My database is around 350 GB and growing daily, so I am looking for a faster and easier solution.

1 answer

This question is fairly open-ended: there is no single "best way". MarkLogic supports full online backups, and journal archiving for continuous incremental backup. The docs at http://docs.marklogic.com/guide/admin/backup_restore discuss these options.

Instead of a full daily backup, you might consider a full weekly backup plus journal archiving. As you start a new week, you can do whatever you like with the data from the previous week: retain it, delete it, move it onto cheaper storage, etc.
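To make the weekly cycle concrete, here is a minimal Python sketch of the scheduling and retention logic only. It does not call any MarkLogic API. The choice of Sunday for the full backup, the four-week retention window, and all function names are assumptions made for illustration.

```python
from datetime import date, timedelta

FULL_BACKUP_WEEKDAY = 6  # assumption: full backup every Sunday (Monday is 0)
WEEKS_TO_KEEP = 4        # assumption: hypothetical retention window


def week_start(day: date) -> date:
    """Return the date of the most recent full backup on or before `day`."""
    offset = (day.weekday() - FULL_BACKUP_WEEKDAY) % 7
    return day - timedelta(days=offset)


def plan_for(day: date) -> str:
    """Describe what the schedule does on `day`."""
    if day.weekday() == FULL_BACKUP_WEEKDAY:
        return "full backup"
    # On other days the journal archive carries the incremental changes.
    return "journal archiving only"


def expired_backup_sets(existing: list[date], today: date) -> list[date]:
    """Return weekly backup sets that fall outside the retention window."""
    cutoff = week_start(today) - timedelta(weeks=WEEKS_TO_KEEP)
    return sorted(d for d in existing if d < cutoff)
```

The sets returned by `expired_backup_sets` are the ones you could then delete or move to cheaper storage, whichever policy you prefer.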

As MarkLogic databases go, 350 GB is not especially large. However, at that size you should already have configured multiple forests: see http://docs.marklogic.com/guide/cluster/scalability#id_96443 for guidelines. Assuming you have multiple CPU cores, storing the content in a proportional number of forests will improve performance throughout the system. That includes backup, because multiple forests back up in parallel, though of course the disk may still be the bottleneck. If storage is the bottleneck, separating the I/O for forests and backup is advisable.
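The interaction between forest count and disk throughput can be shown with a toy model. This is only a back-of-the-envelope sketch: the per-forest and disk throughput figures are invented for illustration, not measured MarkLogic numbers, and real backups are affected by many other factors.

```python
def estimated_backup_hours(db_gb: float, forests: int,
                           per_forest_mb_s: float, disk_mb_s: float) -> float:
    """Estimate backup time when forests are backed up in parallel.

    Assumption: each forest streams at `per_forest_mb_s` until the
    backup disk saturates at `disk_mb_s`.
    """
    parallel_rate = forests * per_forest_mb_s
    effective_rate = min(parallel_rate, disk_mb_s)  # disk caps the total
    seconds = db_gb * 1024 / effective_rate
    return seconds / 3600


# Hypothetical figures: 350 GB database, 50 MB/s per forest, 200 MB/s disk.
for n in (1, 4, 8):
    hours = estimated_backup_hours(350, n, per_forest_mb_s=50, disk_mb_s=200)
    print(f"{n} forest(s): about {hours:.1f} h")
```

Under these assumed numbers, going from one forest to four shortens the backup, while going from four to eight does not, because the disk is already saturated. That is the case where separating forest I/O from backup I/O helps.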

If having multiple forests is a new idea, you might also be interested in https://github.com/mblakele/task-rebalancer