Not all folders returned by boto3 Bucket.objects.all()

My S3 bucket contains a bunch of files in a multilevel folder structure. I'm trying to identify the top-level folders in the hierarchy, but objects.all() returns some, but not all, folders as distinct ObjectSummary objects. Why?

Sample file structure:

file1.txt
a/file2.txt
a/a1/file3.txt
b/b1/file4.txt

Desired output: [a,b]

What I'm doing:

boto3.resource('s3').Bucket('mybucket').objects.all()

This returns the following ObjectSummary objects:

file1.txt
a/
a/file2.txt
a/a1/file3.txt
b/b1/file4.txt

Notice that a/ is listed as a separate entry but b/ is not, while the files in b/ are.

I could understand it returning neither, as folders are technically not distinct entities, or both, but why are some folders returned and others not?

I also understand there could be other ways to achieve my objective, but I want to understand why boto3 is behaving this way.

I could understand it returning neither, as folders are technically not distinct entities, or both, but why are some folders returned and others not?

There are no folders in S3. The concept of a folder does not exist in object storage, which is what S3 is. What you call a "folder" is just a visual representation of an object with the key a/ or b/. Basically, the AWS console artificially calls everything with a / a folder, which leads to all this confusion.

So a/ is just an object (not a folder) called a/. You don't have a b/ "folder", because there is no object called precisely b/. Instead you have an object called b/b1/file4.txt (not b/).

I just got it. S3 does have the concept of creating a folder, through the Create Folder button in the console, which creates a dedicated object with just the folder name as its key, separate from the files that have this as a prefix.

a/ in the example above was a folder I created manually, but I hadn't done this for b/.
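
To make the difference concrete, here is a small sketch in plain Python (no AWS calls) that uses the sample keys from the question. The helper name top_level_prefixes is made up for illustration; it only imitates how a delimiter-based listing groups keys and is not part of boto3.

```python
def top_level_prefixes(keys, delimiter="/"):
    """Return the distinct text up to and including the first delimiter of each key."""
    prefixes = set()
    for key in keys:
        head, sep, _rest = key.partition(delimiter)
        if sep:  # the key contains the delimiter, so it sits under a prefix
            prefixes.add(head + sep)
    return sorted(prefixes)


# Keys exactly as returned by objects.all() in the question
keys = ["file1.txt", "a/", "a/file2.txt", "a/a1/file3.txt", "b/b1/file4.txt"]

print(top_level_prefixes(keys))    # ['a/', 'b/']
print("a/" in keys, "b/" in keys)  # True False
```

Both a/ and b/ exist as prefixes, but only a/ also exists as an object of its own, which is why objects.all() lists it and not b/.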

To identify "top-level folders", you can use:

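import boto3

s3_client = boto3.client('s3')

# Delimiter='/' makes S3 group keys by the text before the first '/'
response = s3_client.list_objects_v2(Bucket='BUCKET-NAME', Delimiter='/')
# 'CommonPrefixes' is omitted when no key contains the delimiter
prefix_list = [entry['Prefix'] for entry in response.get('CommonPrefixes', [])]
print(prefix_list)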

By specifying Delimiter='/', the call returns a list of CommonPrefixes that are effectively the folder names (each with its trailing /, such as a/ and b/).
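
One caveat: a single list_objects_v2 call returns at most 1,000 keys and prefixes, so a bucket with many top-level entries needs pagination. A minimal sketch using boto3's paginator follows; the function name list_top_level_prefixes is an assumption for illustration, and the client is passed in as an argument so the function itself does not depend on how credentials are configured.

```python
def list_top_level_prefixes(s3_client, bucket):
    """Collect every top-level common prefix in a bucket, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    prefixes = []
    for page in paginator.paginate(Bucket=bucket, Delimiter="/"):
        # A page has no 'CommonPrefixes' key when it contains no prefixes
        for entry in page.get("CommonPrefixes", []):
            prefixes.append(entry["Prefix"])
    return prefixes
```

It would be called as list_top_level_prefixes(boto3.client('s3'), 'BUCKET-NAME'), with BUCKET-NAME standing in for a real bucket as in the snippet above.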

