How to get only files from S3 using Python aioboto3 or boto3?

Question

I have this code, and I only want paths that end in a file, without intermediate empty folders. For example:

data/folder1/folder2
data/folder1/folder3/folder4/file1.txt
data/folder5/file2.txt

From those paths I only want:

data/folder1/folder3/folder4/file1.txt
data/folder5/file2.txt

I am using this code, but it also gives me paths that end in directories:

    subfolders = set()
    current_path = None

    result = await self.s3_client.list_objects(Bucket=bucket, Prefix=prefix)
    objects = result.get("Contents")

    try:
        for obj in objects:
            current_path = os.path.dirname(obj["Key"])
            if current_path not in subfolders:
                subfolders.add(current_path)
    except Exception as exc:
        print(f"Getting objects with prefix: {prefix} failed")
        raise exc

Answer 1

Can't you check whether the key has an extension? By the way, you don't need to check whether the path is already in the set, since a set always keeps only unique items.

list_objects does not return any indicator of whether an item is a folder or a file, so this looks like the practical way.

Please check: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects

import os

files = set()

# Fragment from inside an async method, as in the question.
result = await self.s3_client.list_objects(Bucket=bucket, Prefix=prefix)
# "Contents" is missing from the response when no key matches the prefix.
objects = result.get("Contents", [])

try:
    for obj in objects:
        key = obj["Key"]
        # Heuristic: keep the key only if its last component has an extension.
        if os.path.splitext(key)[1]:
            files.add(key)
except Exception:
    print(f"Getting objects with prefix: {prefix} failed")
    raise
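To make the limits of the extension check concrete, here is a small sketch that runs the heuristic on paths like the ones in the question. The helper name `looks_like_file` and the sample key `data/folder5/README` are made up for illustration, and the sketch assumes that empty folders show up as keys ending in "/", which is how the S3 console creates folder placeholders.

```python
import posixpath


def looks_like_file(key: str) -> bool:
    """Heuristic only: reject folder placeholders, require an extension."""
    if key.endswith("/"):
        return False
    # S3 keys always use "/" as separator, so posixpath is used on purpose.
    return posixpath.splitext(posixpath.basename(key))[1] != ""


keys = [
    "data/folder1/folder2/",                   # assumed folder placeholder
    "data/folder1/folder3/folder4/file1.txt",
    "data/folder5/file2.txt",
    "data/folder5/README",                     # hypothetical file without an extension
]

files = [k for k in keys if looks_like_file(k)]
print(files)
```

Note that the extension check also drops `data/folder5/README`, which is a real object. If files without extensions can occur in the bucket, testing only for a trailing "/" is probably the safer filter.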

Answer 2

I would recommend using the boto3 Bucket resource here, because it simplifies pagination.

Here is an example of how to get a list of all files in an S3 bucket:

import boto3

bucket = boto3.resource("s3").Bucket("mybucket")
objects = bucket.objects.all()

files = [obj.key for obj in objects if not obj.key.endswith("/")]
print("Files:", files)
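Since the question uses an async client, the same trailing-slash filter can be sketched with a paginator, which aioboto3 clients expose as an async iterator. This is a minimal sketch, not a definitive implementation: the function name `list_file_keys` and the bucket name `mybucket` are placeholders, and the client is passed in so the helper itself does not depend on how the session is created.

```python
import asyncio


async def list_file_keys(s3_client, bucket, prefix=""):
    """Return every key under `prefix` that does not end with '/'."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    # Each page holds up to 1000 objects; "Contents" is absent on empty pages.
    async for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if not obj["Key"].endswith("/"):
                keys.append(obj["Key"])
    return keys


async def main():
    # Imported here so the helper above has no hard dependency on aioboto3.
    import aioboto3

    session = aioboto3.Session()
    async with session.client("s3") as s3:
        print(await list_file_keys(s3, "mybucket", "data/"))


# Needs AWS credentials and an existing bucket, so it is left commented out:
# asyncio.run(main())
```

Paginating matters here because a single list call returns at most 1000 keys, so the unpaginated code in the question would silently miss objects in larger prefixes.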

It's worth noting that getting a list of all folders and subfolders in an S3 bucket is a harder problem, mainly because folders don't typically exist in S3. They are present logically, not physically, and only because objects exist with a hierarchical key such as dogs/small/corgi.png. For ideas, see "retrieving subfolder names in S3 bucket".
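One way to picture "logically present" folders is to derive them from the object keys themselves. The sketch below is pure Python and does not call S3; the function name `folder_prefixes` is made up for illustration, and it only reports folders implied by the keys it is given.

```python
def folder_prefixes(keys):
    """Return every folder prefix implied by the given object keys."""
    folders = set()
    for key in keys:
        # Drop the last component (the file name, or "" for a placeholder key).
        parts = key.split("/")[:-1]
        for i in range(1, len(parts) + 1):
            folders.add("/".join(parts[:i]) + "/")
    return sorted(folders)


print(folder_prefixes(["dogs/small/corgi.png"]))
# ['dogs/', 'dogs/small/']
```

Here a single object is enough to make two folders appear, even though no object named `dogs/` or `dogs/small/` has to exist in the bucket.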

Answer 1: 2021-10-14 14:06:28
Answer 2 (accepted): 2021-10-14 14:14:57