
How do I take a backup of an AWS EC2 instance/ephemeral storage?

I keep my database at /mnt, on the ephemeral storage that comes with the EC2 instance. To take a backup using the EC2 API tools we need a volume ID, but in the AWS console I can find the volume ID of only the 8 GB root storage.

What should I do if I want a backup of the ephemeral storage? Is there any alternative for backing up instance storage?

First and foremost, you should never store anything of lasting value on ephemeral storage in Amazon EC2, unless you know exactly what you are doing and are prepared to always have point-in-time backups and the like. Your question suggests that you might be mistaken about the concept of ephemeral storage, the difference between Amazon EC2 Instance Storage and Amazon EBS, and the significant implications for data safety and backup requirements:

Ephemeral storage will be lost on stop/start cycles and can generally go away, so you definitely don't want to put anything of lasting value there. That is, only put temporary data there that you can afford to lose or rebuild easily, like a swap file or strictly temporary data in use during computations. Of course you might store huge indexes there, for example, but you must be prepared to rebuild these after the storage has been cleared for whatever reason (instance stop or termination, hardware failure, ...).

  • That's one of the many reasons Eric Hammond excellently summarized in "You Should Use EBS Boot Instances on Amazon EC2", which outlines the history of and differences between the two storage concepts and assesses the few remaining possible benefits of ephemeral storage (mainly being plentiful and free).

Problem/Solution

These explanations should clarify why you are unable to back up the ephemeral storage volumes with a mechanism that applies solely to EBS volumes (i.e. EBS snapshots). Accordingly, you can back up the former with a regular operating-system-level backup tool of your choice. Duplicity is a popular choice that can optionally use Amazon S3 as the backup target, as addressed in my answer to "Easiest to use backup software for live linux server".
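As a rough illustration of what an operating-system-level backup amounts to, here is a minimal Python sketch, not a replacement for Duplicity. The function names, the `backups/` key prefix, and the use of boto3 for the upload are assumptions made for this example. Note that archiving the files of a running database may capture an inconsistent state, so in practice you would archive a dump produced by the database's own tooling.

```python
import tarfile
import time
from pathlib import Path


def archive_directory(src_dir, dest_dir):
    """Pack src_dir into a timestamped .tar.gz under dest_dir and return its path.

    dest_dir should live outside src_dir, otherwise the archive would
    try to include itself.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    archive = dest / f"{src.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the directory name, not absolute.
        tar.add(src, arcname=src.name)
    return archive


def upload_to_s3(archive, bucket, key_prefix="backups/"):
    """Upload the archive to S3 (assumes boto3 is installed and credentials are configured)."""
    import boto3  # imported lazily so archiving works without boto3

    key = f"{key_prefix}{Path(archive).name}"
    boto3.client("s3").upload_file(str(archive), bucket, key)
    return key
```

A cron job could call `archive_directory("/mnt/dbdump", "/var/backups")` and pass the result to `upload_to_s3` with your own bucket name; both paths here are placeholders.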

Ephemeral storage, or instance storage, as-is, is like a /tmp folder, the contents of which disappear after a reboot. Of course, ephemeral drive contents aren't destroyed on a soft reboot, but they should be treated as if they were, since you can't realistically control or predict when your instance decides to die.

This has already been pointed out.

What I'd like to point out is that if you create and configure your AMIs appropriately, you can still use the ephemeral storage to drastically improve (read) throughput, so long as you also keep EBS drives for the actual storage.

What I'm using at the moment is Linux (Ubuntu Tahr) instances with bcache. This is mainly because bcache kernel support is relatively new (IIRC, the first kernel with bcache was 3.10), and you'd definitely want as recent a kernel as possible. Also, Tahr is the next LTS version of Ubuntu, and it will be final by the time my project is close to launch ;)

Bcache, in its default configuration, allows you to benefit from the read speed of the ephemeral storage while giving you the persistence of EBS: it takes a fast cache device (ephemeral SSD) and uses it to speed up a slow device (EBS), writing through the cache device (that is, writing simultaneously to the ephemeral cache and to EBS).

This means that should an instance crash or otherwise be stopped, you can still mount the EBS volume directly without the cache, and access all your data as you would if you were using only EBS volumes. You can also set up the now-wiped ephemeral devices again as a cache for the EBS volume to get back to enjoying very fast reads and seeks.

My particular setup is two EBS devices, RAIDed in stripe mode using mdadm, plus two ephemeral SSD devices RAIDed in the same manner. I've then configured them with bcache, using the ephemeral array as the cache and the EBS array as the backing ("backup") device. The EBS drives can be any size, and you can always expand them. This is a bit tricky with EC2, because you have to create a snapshot of the current EBS volumes and then create new, larger ones based on that snapshot; at the time of writing, you can't resize an existing EBS volume.

Of course, you'll have to create a script that runs inside your instance at startup to configure the ephemeral storage and attach it as a cache device to your EBS-backed backing device. I encourage reading up on, and experimenting with, mdadm and bcache.
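To give an idea of what such a startup script has to do, here is a hedged Python sketch that only builds the commands rather than running them. The device names (`/dev/xvdb`, `/dev/xvdc`, `/dev/md0`) and the function name are assumptions for illustration; check them against your own instance before executing anything.

```python
def plan_cache_setup(ephemeral_devices, cache_array="/dev/md0"):
    """Return the commands (as argv lists) that would rebuild the cache array.

    Nothing is executed here; the caller decides whether to run the plan.
    """
    if len(ephemeral_devices) < 2:
        raise ValueError("striping needs at least two ephemeral devices")
    return [
        # Stripe (RAID 0) the ephemeral devices into one array.
        ["mdadm", "--create", cache_array, "--level=0",
         f"--raid-devices={len(ephemeral_devices)}", *ephemeral_devices],
        # Format the striped array as a bcache cache device.
        ["make-bcache", "-C", cache_array],
    ]
```

Run as root, each entry could be passed to `subprocess.run(cmd, check=True)`. The new cache set still has to be attached to the existing bcache device afterwards, which the kernel's bcache documentation describes doing through the device's sysfs `attach` file.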

For the record, testing with the Cassandra stress tool, I get better read performance with EBS volumes bcached with the ephemeral drives than I do with just striping the ephemeral drives. This is because of the algorithm used in bcache, which is very clever.

Using the ephemeral drives as a cache also reduces network traffic and is cost-effective, as it reduces I/O on EBS, and thereby your monthly bill.

Also note the different types of caching bcache provides:

  1. Write back: Use the SSD as a read/write device, and only write to the backing device when pages need to be evicted from the cache. This is not useful for EC2 ephemeral setups, as it will render your backing device useless on a crash or stop.
  2. Write through: All writes go to both the cache and the backing device. This ensures that the backing device is always as up-to-date as the cache device, and it can always be used without the cache device. Useful for EC2.
  3. Write around: All writes go directly to the backing device, and are not written to the cache device until a read request happens for that data some time in the future. Only reads are cached on the cache device. This is as safe as write through, and is useful if you know that your writes are not likely to be read in the near future. This avoids filling the cache device with data that isn't requested often, so that there's more space for data that is requested. A couple of examples could be a file upload server, a system where you write a lot of logging data, etc. If you know that your entire data set is significantly larger than the ephemeral storage size, this is most likely to be the most efficient option in a large number of use cases.
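The practical difference between the three modes is what is left on the backing device when the cache disappears. The following toy model in Python illustrates only that point; it is not how bcache is implemented, it ignores eviction entirely, and all names in it are made up for the example.

```python
class CachedVolume:
    """Toy model of a cache in front of a persistent backing store."""

    MODES = ("writeback", "writethrough", "writearound")

    def __init__(self, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.cache = {}    # stands in for the ephemeral SSD
        self.backing = {}  # stands in for the EBS volume

    def write(self, key, value):
        if self.mode == "writeback":
            # Only the cache is updated; the backing store lags behind.
            self.cache[key] = value
        elif self.mode == "writethrough":
            # Both copies are updated on every write.
            self.cache[key] = value
            self.backing[key] = value
        else:  # writearound
            self.backing[key] = value
            self.cache.pop(key, None)  # drop any stale cached copy

    def read(self, key):
        if key not in self.cache:
            # Cache miss: fetch from the backing store and keep a copy.
            self.cache[key] = self.backing[key]
        return self.cache[key]

    def lose_cache(self):
        """Simulate an instance stop wiping the ephemeral device."""
        self.cache.clear()


def survives_cache_loss(mode):
    """Write one record, wipe the cache, and report whether the record is still on the backing store."""
    vol = CachedVolume(mode)
    vol.write("row", "data")
    vol.lose_cache()
    return "row" in vol.backing
```

In this model, `survives_cache_loss` is true for write through and write around, and false for write back, which matches the recommendations in the list above.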

If you are able to configure a software RAID mirror, you can attach an EBS-backed disk to the instance, configure a mirror, then wait for replication to complete. I have successfully used this method to move "ephemeral" data into EBS after I had already created the instance (and I did not want to shut down and reboot).

Once you have the data on EBS, back it up with EBS snapshots.
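Taking the snapshot itself can be scripted. Below is a minimal sketch using boto3's EC2 `create_snapshot` call; the wrapper function, its default description, and the idea of passing the client in as a parameter are choices made for this example.

```python
def snapshot_volume(ec2_client, volume_id, description="scheduled backup"):
    """Start an EBS snapshot of volume_id and return the new snapshot's ID.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    response = ec2_client.create_snapshot(
        VolumeId=volume_id,
        Description=description,
    )
    return response["SnapshotId"]
```

With boto3 installed and credentials configured, this could be called as `snapshot_volume(boto3.client("ec2"), volume_id)`, where `volume_id` is the ID of the EBS volume that holds the mirrored data. The call returns once the snapshot has been started, not when it has completed.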

This method works particularly well when you have multiple copies of the data running on different identical instances, but you only need one of them persisted to EBS. In my case, using Couchbase Server, the CB data was on ephemeral drives, but I had one of the instances mirrored to EBS so that all the data on my cluster ended up in EBS.

Any file-level backup solution (not based on EBS snapshots) can back up your ephemeral storage. That said, you should consider when to use ephemeral storage, and have good reason to use it for persistent data. For certain applications, like Cassandra, this is the recommended configuration. In that case your backup solution will mostly dump the data from the ephemeral storage either to an EBS volume that will be snapshotted or directly to S3. In some cases you can define replication and make sure all data on the ephemeral device is also replicated to EBS volumes.

Notice: the technical posts on this site are published under the CC BY-SA 4.0 license; if you republish them, please credit this site's URL or the original address.