
Persistent disk reliability and backups in GCE

Original post: 2014-06-27 11:44:07 7 1 google-compute-engine

(My apologies for this being a slightly off-SO question, but it seems that the GCE questions tend to be slightly less tightly connected to programming.)

I am designing a data acquisition program to run on GCE. The data is collected onto a persistent disk. As the data is something I cannot afford to lose, I need to know something about the reliability of the persistent disks. I have been able to find three pieces of information:

Persistent Disks have built-in redundancy to protect your data against equipment failure and to remain available through datacenter maintenance events. Your instances, free of local storage, can be moved by Google Live Migration to newer hardware without your intervention. This allows Google datacenters to be maintained at the highest level; software, hardware, and facilities can be continually updated to ensure excellent performance and reliability for your cloud-based services. [Google]
Google Compute Engine uses redundant, industry-standard mechanisms to protect persistent disk users from data corruption and from sophisticated attacks against data integrity. [Google]
Google Persistent Disk will never return erroneous data; instead there will be an IO error. [I cannot find this one right now, but I remember reading it in some of Google's docs, so take this with a pinch of salt]

Cloud Storage comes with some reliability numbers, but is there similar information for persistent disks? Without any reliability estimates it is difficult to choose a backup regime. The opposite ends of the continuum are hot backups with real-time synchronization and regular deltas of the data (the data is append-only). In the latter case recovery takes much longer and will most probably involve manual bit-stitching. That is affordable if the MTBF is high enough.
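To make the delta option concrete, here is a minimal sketch of what I mean by taking regular deltas of an append-only file. The function names (`take_delta`, `restore`), the delta file naming scheme, and the offset state file are all hypothetical choices made for illustration; none of this is a GCE feature, and a real setup would also copy the deltas off the disk being protected.

```python
import hashlib
import os


def take_delta(source_path, delta_dir, state_path):
    """Copy the bytes appended to source_path since the last run into a new delta file.

    Returns the delta file path, or None if nothing new was appended.
    """
    # Offset that has already been backed up; 0 on the first run.
    try:
        with open(state_path) as f:
            offset = int(f.read().strip())
    except FileNotFoundError:
        offset = 0

    size = os.path.getsize(source_path)
    if size < offset:
        raise ValueError("source shrank, so it is not append-only")
    if size == offset:
        return None

    os.makedirs(delta_dir, exist_ok=True)
    # Zero-padded offsets make the file names sort in append order.
    delta_path = os.path.join(delta_dir, f"delta-{offset:020d}-{size:020d}.bin")
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(delta_path, "wb") as dst:
        src.seek(offset)
        remaining = size - offset
        while remaining > 0:
            chunk = src.read(min(1 << 20, remaining))
            if not chunk:
                raise IOError("short read from source")
            dst.write(chunk)
            digest.update(chunk)
            remaining -= len(chunk)
        dst.flush()
        os.fsync(dst.fileno())

    # Store a checksum next to the delta so corruption is detectable on restore.
    with open(delta_path + ".sha256", "w") as f:
        f.write(digest.hexdigest())

    # Advance the offset only after the delta is on disk.
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(size))
    os.replace(tmp_path, state_path)
    return delta_path


def restore(delta_dir, output_path):
    """Concatenate all deltas in offset order, verifying each checksum."""
    names = sorted(n for n in os.listdir(delta_dir) if n.endswith(".bin"))
    with open(output_path, "wb") as out:
        for name in names:
            path = os.path.join(delta_dir, name)
            with open(path, "rb") as f:
                data = f.read()
            with open(path + ".sha256") as f:
                expected = f.read().strip()
            if hashlib.sha256(data).hexdigest() != expected:
                raise ValueError(f"checksum mismatch in {name}")
            out.write(data)
```

Run periodically, `take_delta` writes one file per interval and `restore` stitches them back together. The sketch does not check that the deltas are contiguous, so a missing delta would go unnoticed; that gap is part of the manual bit-stitching I would like to avoid.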

I am not worried about brief downtimes, but I am worried about data corruption. The system is running on Linux + ext4, so it should be resilient against unplanned downtime.