How do I get the size of a boto3 Collection?

The approach I have been using is to convert the Collection into a list and take its length:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my_bucket')
size = len(list(bucket.objects.all()))

However, this forces the whole collection to be resolved, which defeats the purpose of using a Collection in the first place. Is there a better way to do this?

There is no way to get the number of keys in a bucket without listing all the objects; this is a limitation of AWS S3 (see https://forums.aws.amazon.com/thread.jspa?messageID=164220 ).

Listing the object summaries doesn't download the actual object data, so it should be a relatively inexpensive operation. If you are just going to discard the list anyway, you could do:

size = sum(1 for _ in bucket.objects.all())

This gives you the number of objects without constructing a list.
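If you are working with the low-level client rather than the resource API, the same counting idea can be sketched with a paginator. This is only an illustration: count_objects is a hypothetical helper name, and it assumes you pass in a client created with boto3.client('s3'). It still issues one list request per page of up to 1000 keys, so it does not avoid listing the bucket.

```python
def count_objects(client, bucket, prefix=""):
    """Count the objects under a prefix, one page at a time.

    `client` is assumed to be an S3 client, e.g. boto3.client("s3").
    Only one page of results is held in memory at any moment.
    """
    paginator = client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no "Contents" entry when no keys match.
        total += len(page.get("Contents", []))
    return total


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(count_objects(boto3.client("s3"), "my_bucket"))
```

Passing a prefix restricts the count to one "folder", which is usually much cheaper than counting the whole bucket.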

Borrowing from a similar question, one option for retrieving the complete list of object keys under a bucket and prefix is to call the list_objects_v2 method recursively.

This approach retrieves the object keys recursively, up to 1000 keys per request.

Each request to list_objects_v2 uses the StartAfter argument to continue listing keys after the last key returned by the previous request.

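import boto3

# Placeholder credentials; boto3 can also read them from the environment
# or from a shared credentials file.
client = boto3.client('s3',
    aws_access_key_id     = 'access_key',
    aws_secret_access_key = 'secret_key'
)

def get_all_object_keys(bucket, prefix, start_after='', keys=None):
    # Use None rather than a mutable default so repeated calls
    # do not share (and keep growing) the same list.
    if keys is None:
        keys = []

    response = client.list_objects_v2(
        Bucket     = bucket,
        Prefix     = prefix,
        StartAfter = start_after
    )

    # No 'Contents' means there are no keys left after start_after.
    if 'Contents' not in response:
        return keys

    key_list = response['Contents']
    last_key = key_list[-1]['Key']

    # Keep only the key names, not the full object summaries.
    keys.extend(obj['Key'] for obj in key_list)

    # One recursive call per page; very large listings can exceed
    # Python's default recursion limit.
    return get_all_object_keys(bucket, prefix, last_key, keys)

if __name__ == '__main__':
    object_keys = get_all_object_keys('your_bucket', 'prefix/to/files')
    print(len(object_keys))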
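Because the recursive version makes one nested call per page, a loop may be safer for prefixes with a very large number of keys. The following is a minimal sketch of that variant, not the original answer's method: list_all_keys is a hypothetical name, the client is assumed to be a boto3 S3 client passed in by the caller, and it pages with the ContinuationToken / NextContinuationToken fields of list_objects_v2 instead of StartAfter.

```python
def list_all_keys(client, bucket, prefix=""):
    """Return every key under `prefix`, following continuation tokens.

    `client` is assumed to be an S3 client, e.g. boto3.client("s3").
    """
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # "Contents" is absent when the page holds no keys.
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        if not response.get("IsTruncated"):
            return keys
        # Ask for the next page on the following iteration.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(len(list_all_keys(boto3.client("s3"), "your_bucket", "prefix/to/files")))
```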

For my use case, I just needed to know whether the folder is empty or not.

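import boto3

s3 = boto3.client('s3')
response = s3.list_objects(
        Bucket='your-bucket',
        Prefix='path/to/your/folder/',
)
# 'Contents' is missing from the response when nothing matches the prefix,
# so fall back to an empty list instead of raising KeyError.
print(len(response.get('Contents', [])))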

This was enough to know whether the folder is empty. Note that a folder created manually in the S3 console is itself stored as an object, so it shows up in the listing. In that case, a length of 1 means the S3 "folder" is empty, and a length greater than 1 means it contains at least one object.
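The placeholder caveat can be folded into a small helper. This is a sketch under stated assumptions: is_folder_empty is a hypothetical name, the client is assumed to be a boto3 S3 client, and the folder placeholder is assumed to be an object whose key is exactly the prefix (including the trailing slash). Since that key sorts before every other key sharing the prefix, asking for at most two keys should be enough to decide.

```python
def is_folder_empty(client, bucket, prefix):
    """Return True if nothing exists under `prefix` except, possibly,
    a placeholder object whose key equals the prefix itself.

    `client` is assumed to be an S3 client, e.g. boto3.client("s3").
    """
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=2)
    # Any returned key other than the placeholder means the folder has content.
    return all(obj["Key"] == prefix for obj in response.get("Contents", []))


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(is_folder_empty(boto3.client("s3"), "your-bucket", "path/to/your/folder/"))
```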
