
How to upload an object larger than 5 TB to Google Cloud Storage?

I am trying to save a PostgreSQL backup (~20 TB) to Google Cloud Storage for long-term storage, and I am currently piping the pg_dump command into a streaming transfer through gsutil:

pg_dump -d $DB_NAME -b --format=t \
    | gsutil cp - gs://$BUCKET_NAME/$BACKUP_FILE

However, I am worried that the process will fail because of GCS's 5 TB object size limit.

Is there any way to upload objects larger than 5 TB to Google Cloud Storage?

EDIT: using split?

I am considering piping pg_dump into Linux's split utility and then into gsutil cp:

pg_dump -d $DB -b --format=t \
    | split -b 50G - \
    | gsutil cp - gs://$BUCKET/$BACKUP

Would something like that work?

You generally don't want to upload a single object in the multi-terabyte range with a streaming transfer. Streaming transfers have two major downsides, and they're both very bad news for you:

  1. Streaming transfers don't use Cloud Storage's checksum support. You'll get regular HTTP data integrity checking, but that's it, and for periodic 5 TB uploads, there's a nonzero chance that this could eventually end up in a corrupt backup.
  2. Streaming transfers can't be resumed if they fail. Assuming you're uploading at 100 Mbps around the clock, a 5 TB upload would take at least four and a half days (see the quick check below), and if your HTTP connection failed, you'd need to start over from scratch.
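
As a rough back-of-envelope check of that figure (assuming a sustained 100 Mbps for the full 5 TB):

# 5 * 10^12 bytes * 8 bits/byte = 4 * 10^13 bits
# 4 * 10^13 bits / 10^8 bits/s  = 4 * 10^5 s, i.e. about 4.6 days
echo '5 * 10^12 * 8 / (10^8 * 86400)' | bc -l    # ~4.63 days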

Instead, here's what I would suggest:

  1. First, minimize the file size. pg_dump has a number of options for reducing the file size. It's possible something like "--format=c -Z9" might produce a much smaller file.
  2. Second, if possible, store the dump as a file (or, preferably, a series of split-up files) before uploading; a sketch of this approach follows the list. This is good because you'll be able to calculate their checksums, which gsutil can take advantage of, and you'd also be able to manually verify that they uploaded correctly if you wanted. Of course, this may not be practical because you'll need a spare 5 TB of hard drive space, but unless your database won't be changing for a few days, there may not be an easy alternative to retrying if you lose your connection.
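
Putting those two points together, here is a minimal sketch of that approach (the database name, bucket name, 50 GB chunk size, and backup_part_ prefix are placeholders; it assumes GNU split and enough free local disk for the compressed dump). Note that split writes its pieces to files named after the given prefix rather than to stdout, which is why the split | gsutil cp - pipeline from the question would not work as written:

# Dump with the custom format at maximum compression, splitting the
# stream into 50 GB pieces (backup_part_aa, backup_part_ab, ...).
pg_dump -d $DB --format=c -Z9 \
    | split -b 50G - backup_part_

# Upload all pieces in parallel; for local files, gsutil performs
# resumable uploads and verifies each object's checksum on completion.
gsutil -m cp backup_part_* gs://$BUCKET/

# Optionally, compare a local piece's CRC32C against the uploaded object's.
gsutil hash -c backup_part_aa
gsutil stat gs://$BUCKET/backup_part_aa

If local disk space is the blocker, GNU split's --filter option can hand each chunk to a command instead of writing it to a file, but each chunk then goes up as a streaming transfer again, with the same downsides described above.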

As mentioned by Ferregina Pelona, guillaume blaquiere, and John Hanley, there is no way to bypass the 5 TB limit implemented by Google, as mentioned in this document:

Cloud Storage 5TB object size limit

Cloud Storage supports a maximum single-object size up to 5 terabytes. If you have objects larger than 5TB, the object transfer fails for those objects for either Cloud Storage or Transfer for on-premises.

If the file surpasses the limit (5 TB), the transfer fails.

You can use Google's issue tracker to request this feature. Within the link provided, you can check the features that have already been requested or request a feature that satisfies your expectations.
