How can I get ONLY files from S3 with Python aioboto3 or boto3?
I have this code, and I want only the paths that end in a file, without intermediate empty folders. For example:
data/folder1/folder2
data/folder1/folder3/folder4/file1.txt
data/folder5/file2.txt
From those paths I only want:
data/folder1/folder3/folder4/file1.txt
data/folder5/file2.txt
I am using this code, but it also gives me paths that end in directories:
subfolders = set()
current_path = None
result = await self.s3_client.list_objects(Bucket=bucket, Prefix=prefix)
objects = result.get("Contents")
try:
    for obj in objects:
        current_path = os.path.dirname(obj["Key"])
        if current_path not in subfolders:
            subfolders.add(current_path)
except Exception as exc:
    print(f"Getting objects with prefix: {prefix} failed")
    raise exc
Can't you check whether there is an extension or not? By the way, you don't need to check whether the path is already in the set, since a set always keeps only unique items.
list_objects does not return any indicator of whether an item is a folder or a file, so this looks like the practical way.
Please check: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects
import os

files = set()
result = await self.s3_client.list_objects(Bucket=bucket, Prefix=prefix)
# "Contents" is missing from the response when nothing matches the prefix
objects = result.get("Contents", [])
try:
    for obj in objects:
        key = obj["Key"]
        # Keep the key only if its last path component has an extension
        # (files without an extension are skipped by this heuristic)
        if os.path.splitext(key)[1]:
            files.add(key)
except Exception:
    print(f"Getting objects with prefix: {prefix} failed")
    raise
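As a worked example of the extension check, here is a minimal sketch run against the paths from the question. The helper name keys_with_extension is made up for illustration, and the approach assumes every real file has an extension in its name, which S3 itself does not enforce.

```python
import posixpath


def keys_with_extension(keys):
    """Return the keys whose last path component has a file extension."""
    # S3 keys always use "/" as the separator, so posixpath is used
    # instead of os.path to keep the behaviour identical on every OS.
    return [key for key in keys if posixpath.splitext(key)[1]]


# The three example paths from the question
sample_keys = [
    "data/folder1/folder2",
    "data/folder1/folder3/folder4/file1.txt",
    "data/folder5/file2.txt",
]
file_keys = keys_with_extension(sample_keys)
print(file_keys)
```

Only the two keys ending in .txt survive. A key such as data/README would be dropped as well, so this heuristic is only safe when you control the naming of the objects.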
I would recommend using the boto3 Bucket resource here, because it simplifies pagination.
Here is an example of how to get a list of all files in an S3 bucket:
import boto3
bucket = boto3.resource("s3").Bucket("mybucket")
objects = bucket.objects.all()
files = [obj.key for obj in objects if not obj.key.endswith("/")]
print("Files:", files)
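Since the question also mentions aioboto3, the same trailing-slash filter can be sketched for an async client. Treat this as a rough sketch under assumptions rather than a definitive implementation: list_file_keys is a hypothetical helper, and it assumes the client passed in is an aioboto3 S3 client whose list_objects_v2 paginator can be iterated with async for.

```python
async def list_file_keys(s3_client, bucket, prefix=""):
    """Collect every key under `prefix` that does not end with "/"."""
    files = []
    paginator = s3_client.get_paginator("list_objects_v2")
    # Each page holds at most 1000 objects; "Contents" is absent when empty.
    async for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # Folder placeholders are objects whose key ends in "/".
            if not key.endswith("/"):
                files.append(key)
    return files


# Assumed usage with aioboto3 (bucket name and prefix are placeholders):
#     session = aioboto3.Session()
#     async with session.client("s3") as s3:
#         files = await list_file_keys(s3, "mybucket", "data/")
```

Passing the client in as an argument keeps the helper independent of how the session is created, which also makes it easy to exercise with a stub client.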
It's worth noting that getting a list of all folders and subfolders in an S3 bucket is a more difficult problem to solve, mainly because folders don't typically exist in S3. They are logically present, but not physically present, because of the presence of objects with a given hierarchical key such as dogs/small/corgi.png. For ideas, see retrieving subfolder names in S3 bucket.
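To illustrate how folders are only implied by object keys, the following minimal sketch derives the folder prefixes from a list of keys. implied_folders is a hypothetical helper, not an S3 API, and it assumes the keys have already been fetched.

```python
def implied_folders(keys):
    """Return every "folder" prefix implied by the given object keys."""
    folders = set()
    for key in keys:
        parts = key.split("/")[:-1]  # drop the final component (the file name)
        # Build each ancestor prefix: "dogs/", then "dogs/small/", ...
        for depth in range(1, len(parts) + 1):
            folders.add("/".join(parts[:depth]) + "/")
    return folders


folders = sorted(implied_folders(["dogs/small/corgi.png"]))
print(folders)  # ['dogs/', 'dogs/small/']
```

No object named dogs/ or dogs/small/ has to exist in the bucket for these prefixes to show up; they follow purely from the key of the one stored object.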