Not all folders returned by boto3 Bucket.objects.all()
My S3 bucket contains a bunch of files in a multilevel folder structure. I'm trying to identify the top-level folders in the hierarchy, but objects.all() returns some but not all folders as distinct ObjectSummary objects. Why?
Sample file structure:
file1.txt
a/file2.txt
a/a1/file3.txt
b/b1/file4.txt
Desired output: [a,b]
What I'm doing:
boto3.resource('s3').Bucket('mybucket').objects.all()
This returns the following ObjectSummary objects:
file1.txt
a/
a/file2.txt
a/a1/file3.txt
b/b1/file4.txt
Notice that a/ is listed as a separate entry, but b/ is not, while the files in b/ are.
I could understand it returning neither, as folders are technically not distinct entities, or both, but why are some folders returned and others not?
I also understand there could be other ways to achieve my objective, but I want to understand why boto3 is behaving this way.
There are no folders in S3. The concept of a folder does not exist in object storage, which is what S3 is. What you call a "folder" is just a visual representation of keys that contain a /, such as a/ or b/b1/file4.txt. Basically, the AWS console artificially calls everything with a / in its key a folder, which leads to all this confusion.
So a/ is just an object (not a folder) whose key is a/. You don't have a b/ "folder" in the listing because there is no object called precisely b/. Instead, you have an object called b/b1/file4.txt (not b/).
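To make this concrete, here is a minimal sketch in plain Python (no AWS calls) that treats the listing from the question as what it is: a flat list of key strings. The helper name top_level_prefixes is made up for illustration. It derives the "folders" purely by splitting keys on /, which is roughly the grouping the console displays.

```python
def top_level_prefixes(keys, delimiter="/"):
    """Return the sorted first path segments of all keys that contain the delimiter."""
    prefixes = set()
    for key in keys:
        head, sep, _rest = key.partition(delimiter)
        if sep:  # the key contains the delimiter, so it sits under a "folder"
            prefixes.add(head)
    return sorted(prefixes)


# Keys exactly as listed in the question: "a/" is a real object, "b/" is not.
keys = ["file1.txt", "a/", "a/file2.txt", "a/a1/file3.txt", "b/b1/file4.txt"]

print(top_level_prefixes(keys))    # ['a', 'b']
print("a/" in keys, "b/" in keys)  # True False
```

Both a and b show up as prefixes, even though only a/ exists as an object of its own.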
I just got it. S3 does have the concept of creating a folder, through the console's Create Folder button, which creates a dedicated object with just the folder name as its key, separate from the files that have this name as a prefix. a/ in the example above was a folder I created manually, but I hadn't done this for b/.
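As a rough illustration, such placeholder objects can be picked out of a listing because their key ends in / and, when created by the console, they are zero bytes long. The sketch below works on (key, size) pairs, which could be built from ObjectSummary.key and ObjectSummary.size. The helper name folder_markers and the sizes are invented for this example.

```python
def folder_markers(objects):
    """Return keys that look like console-created folder placeholders."""
    return [key for key, size in objects if key.endswith("/") and size == 0]


# (key, size) pairs for the listing in the question; the sizes are made up.
listing = [
    ("file1.txt", 12),
    ("a/", 0),
    ("a/file2.txt", 34),
    ("a/a1/file3.txt", 56),
    ("b/b1/file4.txt", 78),
]

print(folder_markers(listing))  # ['a/']
```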
To identify "top-level folders", you can use:
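import boto3

s3_client = boto3.client('s3')
# Delimiter='/' groups keys by their first path segment
response = s3_client.list_objects_v2(Bucket='BUCKET-NAME', Delimiter='/')
# 'CommonPrefixes' is absent when no key contains the delimiter
prefix_list = [entry['Prefix'] for entry in response.get('CommonPrefixes', [])]
print(prefix_list)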
By specifying Delimiter='/', the call returns a list of CommonPrefixes that are effectively the folder names. Each prefix includes the trailing delimiter, so for the sample bucket this prints ['a/', 'b/'].
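One caveat: a single list_objects_v2 call returns at most 1,000 entries, so a bucket with many top-level prefixes needs pagination. The following is a sketch using boto3's paginator for list_objects_v2. The function name list_top_level_prefixes is my own, and the client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
def list_top_level_prefixes(s3_client, bucket_name):
    """Collect every top-level CommonPrefixes entry across all result pages."""
    paginator = s3_client.get_paginator("list_objects_v2")
    prefixes = []
    for page in paginator.paginate(Bucket=bucket_name, Delimiter="/"):
        # "CommonPrefixes" is missing from pages that contain no grouped keys
        for entry in page.get("CommonPrefixes", []):
            prefixes.append(entry["Prefix"])
    return prefixes


# Usage (needs boto3 and AWS credentials; 'BUCKET-NAME' is a placeholder):
#   import boto3
#   print(list_top_level_prefixes(boto3.client("s3"), "BUCKET-NAME"))
```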