

How can I get ONLY files from S3 with python aioboto3 or boto3?

I have this code, and I want only the paths that end in a file, without intermediate empty folders. For example:

data/folder1/folder2
data/folder1/folder3/folder4/file1.txt
data/folder5/file2.txt

From those paths I only want:

data/folder1/folder3/folder4/file1.txt
data/folder5/file2.txt

I am using this code, but it gives me paths that end in directories as well:

    subfolders = set()
    current_path = None

    result = await self.s3_client.list_objects(Bucket=bucket, Prefix=prefix)
    objects = result.get("Contents")

    try:
        for obj in objects:
            current_path = os.path.dirname(obj["Key"])
            if current_path not in subfolders:
                subfolders.add(current_path)
    except Exception as exc:
        print(f"Getting objects with prefix: {prefix} failed")
        raise exc

Can't you check whether there is an extension or not? By the way, you don't need to check whether the path is already in the set, since a set always keeps only unique items.

list_objects does not return any indicator of whether an item is a folder or a file, so checking for an extension looks like the practical way.

Please check: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects

import os

file_keys = set()

result = await self.s3_client.list_objects(Bucket=bucket, Prefix=prefix)
objects = result.get("Contents", [])

try:
    for obj in objects:
        key = obj["Key"]
        # Keep only keys whose last component has an extension,
        # i.e. keys that look like files rather than folders.
        if "." in os.path.basename(key):
            file_keys.add(key)
except Exception as exc:
    print(f"Getting objects with prefix: {prefix} failed")
    raise exc

I would recommend using the boto3 Bucket resource here, because it simplifies pagination.

Here is an example of how to get a list of all files in an S3 bucket:

import boto3

bucket = boto3.resource("s3").Bucket("mybucket")
objects = bucket.objects.all()

# Folder placeholder keys end with "/"; keep only real files
files = [obj.key for obj in objects if not obj.key.endswith("/")]
print("Files:", files)

It's worth noting that getting a list of all folders and subfolders in an S3 bucket is a harder problem to solve, mainly because folders don't typically exist in S3. They are logically present but not physically present, implied by objects with hierarchical keys such as dogs/small/corgi.png. For ideas, see retrieving subfolder names in S3 bucket.
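To illustrate that point, the set of logical "folders" can be derived purely from the object keys, with no S3 call beyond the listing itself. A sketch, where `folder_prefixes` is a hypothetical helper:

```python
import os

def folder_prefixes(keys):
    """Derive the set of logical 'folders' implied by hierarchical
    object keys, walking each key's path up to the root."""
    folders = set()
    for key in keys:
        path = os.path.dirname(key)
        while path:
            folders.add(path + "/")
            path = os.path.dirname(path)
    return folders

print(sorted(folder_prefixes(["dogs/small/corgi.png"])))
# → ['dogs/', 'dogs/small/']
```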
