Counting keys in an S3 bucket

Question

Using the boto3 library and the Python code below, I can iterate through the prefixes in an S3 bucket, printing each prefix name and the key names under it:

import boto3
client = boto3.client('s3')

pfx_paginator = client.get_paginator('list_objects_v2')
pfx_iterator = pfx_paginator.paginate(Bucket='app_folders', Delimiter='/')
for prefix in pfx_iterator.search('CommonPrefixes'):
    print(prefix['Prefix'])

    key_paginator = client.get_paginator('list_objects_v2')
    key_iterator = key_paginator.paginate(Bucket='app_folders', Prefix=prefix['Prefix'])
    for key in key_iterator.search('Contents'):
        print(key['Key'])

Inside the key loop, I can put in a counter to count the number of keys (files), but this is an expensive operation. Is there a way to make one call given a bucket name and a prefix and return the count of keys contained in that prefix (even if it is more than 1000)?

UPDATE: I found a post here that shows a way to do this with the AWS CLI as follows:

aws s3api list-objects --bucket BUCKETNAME --prefix "folder/subfolder/" --output json --query "[length(Contents[])]"

Is there a way to do something similar with the boto3 API?

Answer 1

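You can set the MaxKeys parameter to 1000, which is both the default and the maximum page size for list_objects_v2, so each request returns as many keys as possible. For your case: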

pfx_iterator = pfx_paginator.paginate(Bucket='app_folders', Delimiter='/', MaxKeys=1000)

In general, the request syntax (with placeholder values, as in the documentation) is:

response = client.list_objects_v2(
    Bucket='string',
    Delimiter='string',
    EncodingType='url',
    MaxKeys=123,
    Prefix='string',
    ContinuationToken='string',
    FetchOwner=True|False,
    StartAfter='string',
    RequestPayer='requester'
)

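Because each request returns up to 1000 keys, counting takes roughly one request per 1000 keys instead of one per key. There is no single call that returns the total for a larger prefix, so you still have to page through the results and add up the keys. See the boto3 documentation for list_objects_v2.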
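A minimal sketch of counting keys under a prefix with the same paginator the question uses. The helper name `count_keys` is hypothetical (it is not part of boto3), and the sketch assumes the client passed in is a standard `boto3.client('s3')`:

```python
def count_keys(client, bucket, prefix=''):
    """Count the objects under a prefix by paging through list_objects_v2.

    `client` is assumed to be a boto3 S3 client; `count_keys` itself is a
    hypothetical helper, not a boto3 function.
    """
    paginator = client.get_paginator('list_objects_v2')
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page that holds no objects.
        total += len(page.get('Contents', []))
    return total


# Usage (needs boto3 and AWS credentials):
# import boto3
# print(count_keys(boto3.client('s3'), 'app_folders', 'folder/subfolder/'))
```

This still issues one request per page of up to 1000 keys, but it only adds up page lengths instead of handling every key individually.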

Answer 2

Using the AWS CLI, it is easy to count:

aws s3 ls <folder url> --recursive --summarize | grep <comment>

For example, the summary line that holds the count starts with "Total Objects":

aws s3 ls s3://abc/ --recursive --summarize | grep "Total Objects"
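A similar single pass over the whole bucket can be sketched in boto3, tallying keys per top-level prefix as it goes, rather than running a separate listing for each prefix as in the question. The helper name `count_keys_by_top_level_prefix` is hypothetical, and grouping keys without a delimiter under the empty string is an assumption made for illustration:

```python
from collections import Counter


def count_keys_by_top_level_prefix(client, bucket, delimiter='/'):
    """Return a Counter mapping each top-level prefix to its number of keys.

    `client` is assumed to be a boto3 S3 client. Keys that contain no
    delimiter are grouped under '' (the bucket root).
    """
    paginator = client.get_paginator('list_objects_v2')
    counts = Counter()
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get('Contents', []):
            # Split off everything up to and including the first delimiter.
            head, sep, _ = obj['Key'].partition(delimiter)
            counts[head + sep if sep else ''] += 1
    return counts


# Usage (needs boto3 and AWS credentials):
# import boto3
# for prefix, n in count_keys_by_top_level_prefix(boto3.client('s3'), 'app_folders').items():
#     print(prefix, n)
```

The number of requests is the same as one recursive listing of the bucket, so it avoids the extra prefix listing plus per-prefix listings in the nested loop.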
