Getting only filenames from S3 bucket without downloading files

I have a bucket with more than 4 million files (50 GB+). I'd like to get the list of files (names only, no data) using Python, without downloading the files themselves.

# s3_bucket is a boto3 Bucket resource
files = s3_bucket.objects.filter(Prefix='myPrefix')

# print(len(list(files)))
for obj in files:
    print(obj.last_modified)

I have something like this, but I notice a lot of data coming through the network.

I looked at the documentation for ObjectSummary, hoping that it only downloads the metadata. See ObjectSummary and the HEAD operation:

The HEAD operation retrieves metadata from an object without returning the object itself. This operation is useful if you're only interested in an object's metadata. To use HEAD, you must have READ access to the object.

A HEAD request has the same options as a GET operation on an object. The response is identical to the GET response except that there is no response body.

Does it still have to download the entire file just to retrieve the filenames?

When you use the resource interface in boto3, each request is translated into one or more underlying API calls, and it is not easy to see which calls happen "behind the scenes". Sometimes one method translates into multiple calls (e.g. ListObjects and HeadObject).
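One way to check which calls a resource method actually makes is to hook the underlying client's event system. The following is a minimal sketch, not a definitive diagnostic: it assumes botocore's `before-call` event, and the names `track_api_calls` and `demo` are made up for illustration.

```python
from collections import Counter


def track_api_calls(client):
    """Count the API operations sent by a boto3 S3 client.

    Registers a handler on botocore's 'before-call' event and returns
    the Counter that the handler keeps updating.
    """
    counts = Counter()

    def on_before_call(model, **kwargs):
        # model is the botocore operation model; model.name is the API
        # operation name, such as 'ListObjects' or 'HeadObject'.
        counts[model.name] += 1

    client.meta.events.register('before-call.s3', on_before_call)
    return counts


def demo(bucket_name, prefix):
    """Hypothetical usage: list a few objects and report the calls made."""
    import boto3  # imported lazily so track_api_calls has no hard dependency

    s3 = boto3.resource('s3')
    counts = track_api_calls(s3.meta.client)  # the client behind the resource
    bucket = s3.Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=prefix).limit(2000):
        _ = obj.last_modified
    return counts
```

Calling `demo('bucket-name', 'myPrefix')` against a real bucket returns a counter of operation names. If only list operations appear, the attribute values came from the listing responses rather than from per-object requests.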

Consider using the client interface instead, since its methods map 1:1 to the AWS API calls:

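import boto3

s3_client = boto3.client('s3')

# list_objects_v2 returns at most 1,000 keys per call; the paginator
# follows the continuation tokens so that every page is visited.
paginator = s3_client.get_paginator('list_objects_v2')
response_iterator = paginator.paginate(Bucket='bucket-name')

for page in response_iterator:
    # 'Contents' is missing when a page holds no objects
    for obj in page.get('Contents', []):
        print(obj['Key'], obj['LastModified'])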
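With several million objects, it may be more practical to stream the keys to a file than to print them or hold them in a list. Here is a minimal sketch along those lines, assuming the same `list_objects_v2` paginator; the helper name `write_keys` and its parameters are hypothetical.

```python
def write_keys(s3_client, bucket, out, prefix=None):
    """Write one object key per line to `out` and return the key count.

    `s3_client` is expected to be a boto3 S3 client and `out` any
    text file-like object. Only listing requests are issued, so no
    object data is downloaded.
    """
    kwargs = {'Bucket': bucket}
    if prefix:
        kwargs['Prefix'] = prefix  # restrict the listing to this prefix

    paginator = s3_client.get_paginator('list_objects_v2')
    count = 0
    for page in paginator.paginate(**kwargs):
        # 'Contents' is missing when a page holds no objects
        for obj in page.get('Contents', []):
            out.write(obj['Key'] + '\n')
            count += 1
    return count
```

For example, `write_keys(boto3.client('s3'), 'bucket-name', open('keys.txt', 'w'), prefix='myPrefix')`. Each listing request returns at most 1,000 keys, so a bucket with 4 million objects needs at least 4,000 requests.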

I would also recommend looking at Amazon S3 Inventory. It can provide a daily CSV file listing all objects and their metadata, which is very useful for large buckets such as yours.
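Once an inventory report has been delivered, the keys can be read from it without issuing any listing calls. The sketch below rests on several assumptions about the CSV format that are worth checking against your own report: the data files are gzip-compressed and have no header row, the column order comes from the `fileSchema` string in the report's `manifest.json`, and key names are URL-encoded. The function name `read_inventory_keys` is hypothetical.

```python
import csv
import gzip
import io
from urllib.parse import unquote_plus


def read_inventory_keys(gz_bytes, file_schema):
    """Yield decoded object keys from one gzip-compressed inventory CSV.

    `gz_bytes` is the raw content of a single inventory data file.
    `file_schema` is the comma-separated column list, for example
    'Bucket, Key, Size, LastModifiedDate' (assumed to match the
    fileSchema value in manifest.json).
    """
    columns = [name.strip() for name in file_schema.split(',')]
    key_index = columns.index('Key')

    with gzip.open(io.BytesIO(gz_bytes), mode='rt',
                   encoding='utf-8', newline='') as fh:
        for row in csv.reader(fh):
            # Assumption: keys are URL-encoded. unquote_plus also turns
            # '+' into a space; switch to unquote if your report differs.
            yield unquote_plus(row[key_index])
```

The data file itself still has to be fetched from the destination bucket, but that is one download per report file instead of thousands of listing requests.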
