
Fast way to delete a big folder on a GCS bucket

I have a very big GCS bucket (several TB) with several subdirectories, each holding a couple of terabytes of data.

I want to delete some of those folders.

I tried to use gsutil from a Cloud Shell, but it is taking ages.

For reference, here is the command I'm using:

gsutil -m rm -r "gs://BUCKET_NAME/FOLDER"

I was looking at this question and thought maybe I could use that, but it seems like it can't filter by folder name, and I can't filter by anything else because the folders have mixed content.

So far, my last resort would be to wait until the folders I want to delete are "old" and set the lifecycle rule accordingly, but that could take too long.

Are there any other ways to make this faster?

It's just going to take a long time; you have to issue a DELETE request for each object with the prefix FOLDER/.
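As a rough illustration of what "one DELETE request per object" means, here is a minimal Python sketch. It assumes a client that behaves like `google.cloud.storage.Client`, meaning `list_blobs(bucket_name, prefix=...)` returns objects that have a `delete()` method. The function name `delete_prefix` and the `workers` parameter are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def delete_prefix(client, bucket_name, prefix, workers=32):
    """Delete every object whose name starts with `prefix`; return the count.

    `client` is assumed to behave like google.cloud.storage.Client:
    list_blobs(bucket_name, prefix=...) yields objects with a delete() method.
    """
    blobs = client.list_blobs(bucket_name, prefix=prefix)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One DELETE request per object, issued from a pool of threads.
        results = pool.map(lambda blob: blob.delete(), blobs)
        # Consuming the results re-raises any error from a failed delete.
        return sum(1 for _ in results)
```

With the real library installed, this could be called as `delete_prefix(storage.Client(), "BUCKET_NAME", "FOLDER/")` after `from google.cloud import storage`. It is not expected to beat `gsutil -m rm -r`, which already parallelizes the same per-object requests; it only shows where the time goes. Note that `pool.map` queues a task for every listed object up front, so a prefix with many millions of objects would probably need to be processed in chunks.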

GCS doesn't have the concept of "folders". Object names can share a common prefix, but they're all in a flat namespace. For example, if you have these three objects:

  • /a/b/c/1.txt
  • /a/b/c/2.txt
  • /a/b/c/3.txt

...then you don't actually have folders named a, b, or c. Once you delete those three objects, the "folders" (i.e. the prefix they shared) no longer appear when you list objects in your bucket.
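The flat namespace can be modelled in a few lines of plain Python. This is a toy simulation, not the GCS API: the bucket is just a set of names, and the hypothetical helper `list_prefixes` mimics a delimiter-based listing to show that a "folder" exists only while some object name contains it.

```python
# A bucket modelled as a flat set of object names; there are no directory entries.
objects = {"/a/b/c/1.txt", "/a/b/c/2.txt", "/a/b/c/3.txt"}


def list_prefixes(names, prefix, delimiter="/"):
    """Return the apparent 'folders' directly under `prefix`."""
    found = set()
    for name in names:
        if name.startswith(prefix):
            rest = name[len(prefix):]
            if delimiter in rest:
                # Keep only the first path segment after the prefix.
                found.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
    return sorted(found)


before = list_prefixes(objects, "/a/b/")

# "Deleting the folder" means deleting every object that shares the prefix.
remaining = {name for name in objects if not name.startswith("/a/b/c/")}
after = list_prefixes(remaining, "/a/b/")

print(before)
print(after)
```

Here `before` contains the single apparent folder `/a/b/c/`, and `after` is empty, because nothing remains that carries that prefix.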

See the docs for more details:

https://cloud.google.com/storage/docs/gsutil/addlhelp/HowSubdirectoriesWork

Creating a lifecycle rule with matchesPrefix set to the folder name is the best way to remove large folders in a bucket. It can take up to 24 hours to take effect, though. https://cloud.google.com/storage/docs/lifecycle#matchesprefix-suffix
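A lifecycle configuration along these lines can be generated with a short Python script. This is a sketch under stated assumptions: the helper name `build_delete_rule` is hypothetical, `FOLDER/` is the placeholder prefix from the question, and pairing `matchesPrefix` with `"age": 0` (so objects qualify regardless of age) is an assumption of this example rather than something the answer specifies.

```python
import json


def build_delete_rule(prefix, age_days=0):
    """Build a lifecycle config that deletes objects under `prefix`.

    age_days=0 is an assumption of this sketch, meant to make objects
    eligible for deletion regardless of how old they are.
    """
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {"age": age_days, "matchesPrefix": [prefix]},
            }
        ]
    }


config = build_delete_rule("FOLDER/")
print(json.dumps(config, indent=2))
```

The printed JSON can be saved to a file, for example lifecycle.json, and applied with `gsutil lifecycle set lifecycle.json gs://BUCKET_NAME`. Setting a lifecycle configuration replaces whatever configuration the bucket already has, so any existing rules would need to be merged into the file first.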


 