Getting only filenames from S3 bucket without downloading files
I have a bucket with 4+ million files (50GB+). I'd like to get the list of files (without the data) using Python, without downloading the files.
files = s3_bucket.objects.filter(Prefix='myPrefix')
# print(len(list(files_raw)))
for key in files:
    print(key.last_modified)
I have something like this, but I notice there's a lot of data coming through the network.
I looked at the documentation for ObjectSummary, hoping that it only downloads the metadata.
ObjectSummary and HEAD operation
The HEAD operation retrieves metadata from an object without returning the object itself. This operation is useful if you're only interested in an object's metadata. To use HEAD, you must have READ access to the object.
A HEAD request has the same options as a GET operation on an object. The response is identical to the GET response except that there is no response body.
Does it still have to download the entire file just to retrieve the filenames?
When using the resource method in boto3, the requests actually get translated into other API calls. However, it's not easy to see which calls happen "behind the scenes". Sometimes one method can translate into multiple calls (e.g. ListObjects and HeadObject).
You might consider using the client methods instead, since they map 1:1 to the API calls on AWS:
import boto3

s3_client = boto3.client('s3')
paginator = s3_client.get_paginator('list_objects_v2')
response_iterator = paginator.paginate(Bucket='bucket-name')

for page in response_iterator:
    # 'Contents' is missing from a page that has no objects
    for obj in page.get('Contents', []):
        print(obj['Key'], obj['LastModified'])
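To restrict the listing to the prefix from the question, the loop above could be wrapped roughly as follows. This is only a sketch: `iter_keys` and `list_bucket` are hypothetical helper names, and 'bucket-name' and 'myPrefix' are placeholders. The page handling is kept in its own function so it can be tried on plain dictionaries without touching AWS.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_keys(pages: Iterable[Dict[str, Any]]) -> Iterator[Tuple[str, Any]]:
    """Yield (key, last_modified) pairs from ListObjectsV2 response pages."""
    for page in pages:
        # A page with no matching objects has no 'Contents' entry.
        for obj in page.get('Contents', []):
            yield obj['Key'], obj['LastModified']


def list_bucket(bucket: str, prefix: str = '') -> Iterator[Tuple[str, Any]]:
    """List the keys under a prefix without fetching any object bodies."""
    import boto3  # imported here so iter_keys() works without boto3 installed

    paginator = boto3.client('s3').get_paginator('list_objects_v2')
    return iter_keys(paginator.paginate(Bucket=bucket, Prefix=prefix))


# Example usage (placeholders, requires AWS credentials):
# for key, last_modified in list_bucket('bucket-name', 'myPrefix'):
#     print(key, last_modified)
```

Each ListObjectsV2 request returns at most 1,000 keys, so a bucket with 4 million objects needs roughly 4,000 requests. Only the listing metadata travels over the network, not the object contents.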
I would also recommend that you look at Amazon S3 Inventory. It can provide a daily CSV file listing all objects and their metadata. This is very useful for large buckets (such as yours).
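Once an inventory report has been delivered, reading it could look something like the sketch below. It makes several assumptions: the report is in CSV format, each data file is gzip-compressed and has no header row, and the column names come from the `fileSchema` field of the accompanying manifest.json. `read_inventory` is a hypothetical helper name, and decoding keys with `unquote` is an assumption about how the keys are URL-encoded in CSV reports.

```python
import csv
import gzip
import io
from typing import Dict, Iterator
from urllib.parse import unquote


def read_inventory(gz_bytes: bytes, file_schema: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per object from a gzip-compressed inventory CSV file.

    file_schema is the comma-separated column list taken from manifest.json,
    for example 'Bucket, Key, Size, LastModifiedDate'.
    """
    columns = [name.strip() for name in file_schema.split(',')]
    with gzip.open(io.BytesIO(gz_bytes), mode='rt', newline='', encoding='utf-8') as handle:
        for row in csv.reader(handle):
            record = dict(zip(columns, row))
            # Assumption: keys in CSV reports are URL-encoded and unquote() restores them.
            record['Key'] = unquote(record['Key'])
            yield record
```

All values stay as strings here; converting `Size` to an integer or parsing the timestamp is left to the caller.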