Using the boto3 library and the Python code below, I can iterate through S3 buckets and prefixes, printing each prefix name and key name:
import boto3

client = boto3.client('s3')
pfx_paginator = client.get_paginator('list_objects_v2')
pfx_iterator = pfx_paginator.paginate(Bucket='app_folders', Delimiter='/')
for prefix in pfx_iterator.search('CommonPrefixes'):
    print(prefix['Prefix'])
    key_paginator = client.get_paginator('list_objects_v2')
    key_iterator = key_paginator.paginate(Bucket='app_folders', Prefix=prefix['Prefix'])
    for key in key_iterator.search('Contents'):
        print(key['Key'])
Inside the key loop, I can put in a counter to count the number of keys (files), but this is an expensive operation. Is there a way to make one call given a bucket name and a prefix and return the count of keys contained in that prefix (even if it is more than 1000)?
UPDATE: I found a post here that shows a way to do this with the AWS CLI as follows:
aws s3api list-objects --bucket BUCKETNAME --prefix "folder/subfolder/" --output json --query "[length(Contents[])]"
Is there a way to do something similar with the boto3 API?
You can do it using the MaxKeys=1000 parameter, which lets each request return up to 1000 keys (the most S3 returns per call). For your case:
pfx_iterator = pfx_paginator.paginate(Bucket='app_folders', Delimiter='/', MaxKeys=1000)
In general:
response = client.list_objects_v2(
    Bucket='string',
    Delimiter='string',
    EncodingType='url',
    MaxKeys=123,
    Prefix='string',
    ContinuationToken='string',
    FetchOwner=True|False,
    StartAfter='string',
    RequestPayer='requester'
)
Counting this way costs one request per 1000 keys rather than one per key, so it is roughly 1000 times cheaper :) Documentation here
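A prefix holding more than 1000 keys still needs one request per page, so an exact count means walking every page and adding up the keys. Below is a minimal sketch of that idea, not a definitive implementation. The names count_keys and main are hypothetical, and the prefix 'folder/subfolder/' is borrowed from the CLI example in the question. The client is passed in as an argument so the function can be tried against a stub without AWS credentials.

```python
def count_keys(client, bucket, prefix=""):
    """Count the objects under `prefix` by summing the keys in every result page."""
    paginator = client.get_paginator("list_objects_v2")
    total = 0
    # The paginator follows continuation tokens; each page holds at most 1000 keys.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page that has no matching keys.
        total += len(page.get("Contents", []))
    return total


def main():
    # Imported here so count_keys itself has no hard dependency on boto3.
    import boto3

    client = boto3.client("s3")
    print(count_keys(client, "app_folders", "folder/subfolder/"))
```

Calling main() prints the count for that one prefix. The number of requests grows with the number of keys divided by 1000, so this is cheap for most prefixes but is not a single call.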
Using the AWS CLI, it is easy to count:
aws s3 ls <folder url> --recursive --summarize | grep <pattern>
e.g.,
aws s3 ls s3://abc/ --recursive --summarize | grep "Total Objects"
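For anyone who wants the same summary from boto3 rather than the shell, here is a rough sketch that reports both the object count and the total size in bytes for a prefix. The function name summarize_prefix is hypothetical, and the client is passed in so the function can be exercised with a stub. It relies only on the Key and Size fields that list_objects_v2 returns for each object.

```python
def summarize_prefix(client, bucket, prefix=""):
    """Return (object_count, total_size_in_bytes) for everything under `prefix`."""
    paginator = client.get_paginator("list_objects_v2")
    count = 0
    size = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching keys carry no "Contents" entry.
        for obj in page.get("Contents", []):
            count += 1
            size += obj["Size"]
    return count, size
```

With a real client this would be called as summarize_prefix(boto3.client('s3'), 'abc'), which covers the whole bucket in the same way as the recursive listing.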