
Maximum number of files per folder in Google Cloud Storage

I want to set up Google Cloud Storage as my data lake, and I'm using Pub/Sub + Dataflow to save interactions into it. Dataflow creates a new file every 5 minutes and stores it in a GCS folder. This will eventually lead to a lot of files inside the given folder. Is there any limit on the number of files that can be saved inside a GCS folder?
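For a sense of scale, here is a quick back-of-the-envelope count of how fast files accumulate (a rough sketch assuming exactly one file every 5 minutes, per the setup above):

    # One Dataflow output file every 5 minutes:
    files_per_day = 24 * 60 // 5          # 288 files per day
    files_per_year = files_per_day * 365
    print(files_per_day, files_per_year)  # 288 105120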

There is no practical limit. Bear in mind there are not even really "folders" in Cloud Storage. There are just objects whose names contain slashes and therefore look like folder paths, purely to help you organize and navigate all that content.
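To make the "no real folders" point concrete, here is a minimal sketch using the google-cloud-storage Python client (the bucket and object names are placeholders, not from the question):

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-data-lake")  # hypothetical bucket name

    # These writes create what the console renders as nested "folders",
    # but no folder objects exist -- only flat objects with '/' in their names.
    bucket.blob("events/2024/01/part-0001.json").upload_from_string("{}")
    bucket.blob("events/2024/01/part-0002.json").upload_from_string("{}")

    # Listing with prefix + delimiter simulates folder navigation.
    blobs = client.list_blobs("my-data-lake", prefix="events/2024/", delimiter="/")
    for blob in blobs:
        print(blob.name)          # objects directly under events/2024/ (none here)
    print(list(blobs.prefixes))   # the apparent "subfolders": ['events/2024/01/']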

The limit is about 9.2 quintillion objects, which would take many years to even create.

We store some of our services as zero-compute JSON files with sub-folders in GCP buckets. I wanted to confirm we could store more than 4.2 billion folders in a bucket, so we could access our files via ID just as we would in a database (currently we are up to over 100k files per folder; we basically use GCP buckets as a kind of database with a read:write ratio well beyond 1,000,000:1).
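That pattern amounts to a read-heavy key-value store keyed by object ID. A minimal sketch of the idea (the bucket name, key layout, and helper names are illustrative assumptions, not the poster's actual schema):

    import json
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("service-records")  # hypothetical bucket

    def put_record(record_id: str, record: dict) -> None:
        # One object per ID; the "folder" is just a prefix shard of the ID.
        blob = bucket.blob(f"records/{record_id[:2]}/{record_id}.json")
        blob.upload_from_string(json.dumps(record), content_type="application/json")

    def get_record(record_id: str) -> dict:
        blob = bucket.blob(f"records/{record_id[:2]}/{record_id}.json")
        return json.loads(blob.download_as_text())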

I asked our engineering team to open a ticket and confirm that our usage was practical and that passing 4.2 billion items was possible. Google Cloud support confirmed there are customers using Cloud Storage today that go well beyond the 4.2 billion (32-bit) limit, into the trillions, and that the main index currently involves a 64-bit pointer, which may be the only limit.

64-bit works out to about 9.2 quintillion, or 9,223,372,036,854,775,807 to be exact.
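That figure is just the maximum value of a signed 64-bit integer, which you can check in one line of Python:

    # 2**63 - 1, the signed 64-bit ceiling quoted above (~9.2 quintillion):
    assert (1 << 63) - 1 == 9_223_372_036_854_775_807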

They do have other, related limits, such as 1,000 writes / 5,000 reads per second per bucket, which can auto-scale but has nuances, so if you think you may hit that limit, you may want to read about it here: https://cloud.google.com/storage/docs/request-rate .
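One nuance from that page worth illustrating: load is distributed across ranges of object names, so strictly sequential names (like timestamped Dataflow output) can concentrate writes on one hot range. A common mitigation, sketched here under my own naming assumptions, is to prepend a short hash so keys spread across ranges:

    import hashlib

    def distributed_name(sequential_name: str) -> str:
        # Prefix with a few hex chars of a hash so names no longer sort
        # into one contiguous, write-hot range of the index.
        prefix = hashlib.md5(sequential_name.encode()).hexdigest()[:6]
        return f"{prefix}/{sequential_name}"

    print(distributed_name("logs/2024-05-01/part-000123.json"))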

For reference, here are the general storage quotas and limits: https://cloud.google.com/storage/quotas

...it does not describe the 64-bit / 9.2 quintillion item limitation, possibly because that limit is practically impossible to reach: even at the maximum sustained write rate, creating that many objects would take on the order of hundreds of millions of years, by which point the index would presumably have been engineered beyond 64-bit :)
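The "practically impossible" claim follows from simple arithmetic (assuming the default 1,000 object writes per second per bucket):

    # Time to create 2**63 - 1 objects at 1,000 writes/second:
    objects = (1 << 63) - 1               # ~9.2e18
    seconds = objects / 1000
    years = seconds / (60 * 60 * 24 * 365)
    print(f"{years:.1e} years")           # ~2.9e8, i.e. hundreds of millions of years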
