How to find size of a folder inside an S3 bucket?

I am using the boto3 module in Python to interact with S3, and I can currently get the size of every individual key in an S3 bucket. My goal, however, is to find the storage used by only the top-level folders: each folder is a different project, and we need to charge each project for the space it uses. I can get the names of the top-level folders, but the implementation below gives no details about their sizes. Here is how I get the top-level folder names:

import boto
import boto.s3.connection

AWS_ACCESS_KEY_ID = "access_id"
AWS_SECRET_ACCESS_KEY = "secret_access_key"
Bucketname = 'Bucket-name'

conn = boto.s3.connect_to_region('ap-south-1',
   aws_access_key_id=AWS_ACCESS_KEY_ID,
   aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
   is_secure=True, # set to False only if you are not using SSL
   calling_format = boto.s3.connection.OrdinaryCallingFormat(),
   )

bucket = conn.get_bucket(Bucketname)
folders = bucket.list("", "/")

for folder in folders:
    print(folder.name)

The type of folder here is boto.s3.prefix.Prefix, and it doesn't expose any size details. Is there any way to look up a folder/object in an S3 bucket by its name and then fetch its size?

To get the size of an S3 folder, use the objects collection of a bucket (available on boto3.resource('s3').Bucket). Its filter(Prefix=...) method retrieves only the objects whose keys start with the given prefix, so the filtering is done by S3 rather than in your own code.

import boto3

def get_size(bucket, path):
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket(bucket)
    total_size = 0

    for obj in my_bucket.objects.filter(Prefix=path):
        total_size = total_size + obj.size

    return total_size

So, say you want the size of the folder s3://my-bucket/my/path/. You would call the function above like this:

get_size("my-bucket", "my/path/")

This is, of course, just as applicable to top-level folders.
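As a rough sketch of how that could look for every top-level folder at once, the snippet below first asks S3 for the top-level prefixes (by passing Delimiter="/") and then sums the object sizes under each one. The helper names top_level_prefixes, prefix_size and sizes_by_top_level_folder are made up for illustration, and client is assumed to be a boto3 S3 client created by the caller, for example boto3.client("s3").

```python
def top_level_prefixes(client, bucket):
    """Yield the top-level "folder" prefixes of a bucket, e.g. "project-a/"."""
    paginator = client.get_paginator("list_objects_v2")
    # With Delimiter="/", S3 groups keys by their first path segment and
    # reports each group under "CommonPrefixes" instead of listing every key.
    for page in paginator.paginate(Bucket=bucket, Delimiter="/"):
        for entry in page.get("CommonPrefixes", []):
            yield entry["Prefix"]


def prefix_size(client, bucket, prefix):
    """Return the total size in bytes of all objects whose key starts with prefix."""
    paginator = client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page that holds no objects.
        for obj in page.get("Contents", []):
            total += obj["Size"]
    return total


def sizes_by_top_level_folder(client, bucket):
    """Map each top-level prefix to the total size (bytes) of the objects under it."""
    return {
        prefix: prefix_size(client, bucket, prefix)
        for prefix in top_level_prefixes(client, bucket)
    }
```

With credentials configured, this would be called as sizes_by_top_level_folder(boto3.client("s3"), "my-bucket"). Objects that sit directly at the bucket root (no "/" in the key) do not belong to any prefix and are not counted.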

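To find the size of the top-level "folders" in S3 (S3 has no real concept of folders; the console just displays keys as a folder structure), something like this will work. Note that a single list_objects call returns at most 1,000 keys, so on its own this only covers buckets up to that size: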

from boto3 import client
conn = client('s3')

top_level_folders = dict()

for key in conn.list_objects(Bucket='kitsune-buildtest-production')['Contents']:

    folder = key['Key'].split('/')[0]
    print("Key %s in folder %s. %d bytes" % (key['Key'], folder, key['Size']))

    if folder in top_level_folders:
        top_level_folders[folder] += key['Size']
    else:
        top_level_folders[folder] = key['Size']


for folder, size in top_level_folders.items():
    print("Folder: %s, size: %d" % (folder, size))

This doesn't use boto3, just the AWS CLI, but this quick one-liner serves the purpose. I usually append tail -1 to get only the summary line with the total folder size. It can be a bit slow for folders with many objects, though.

aws s3 ls --summarize --human-readable --recursive s3://bucket-name/folder-name | tail -1

To handle more than 1,000 objects, use a list_objects_v2 paginator, like this:

from boto3 import client
conn = client('s3')

top_level_folders = dict()

paginator = conn.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='prefix')
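index = 1  # position of the folder name once the key is split on '/'
for page in pages:
    for key in page.get('Contents', []):  # 'Contents' is absent when a page has no keys
        print(key['Size'])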
        folder = key['Key'].split('/')[index]
        print("Key %s in folder %s. %d bytes" % (key['Key'], folder, key['Size']))

        if folder in top_level_folders:
            top_level_folders[folder] += key['Size']
        else:
            top_level_folders[folder] = key['Size']

for folder, size in top_level_folders.items():
    size_in_gb = size/(1024*1024*1024)
    print("Folder: %s, size: %.2f GB" % (folder, size_in_gb))

The value of index depends on the prefix. For example, if the prefix is notes/ and the delimiter is a slash (/), then for the key notes/summer/july the common prefix is notes/summer/. So if the prefix is "notes/", use index = 1; if it is "notes/summer/", use index = 2.
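Rather than setting index by hand, the folder name can be derived from the prefix itself. The following is only a sketch: folder_under_prefix and sizes_under_prefix are hypothetical helpers, prefix is assumed to end with "/", and objects is assumed to be any iterable of dicts with 'Key' and 'Size' entries, such as the items of page['Contents'] in the listing above.

```python
from collections import defaultdict


def folder_under_prefix(key, prefix):
    """Return the first path segment of key after prefix, or None.

    None means the key is not under the prefix, or it sits directly
    in the prefix with no sub-folder below it.
    Assumes prefix ends with "/" (or is empty for the bucket root).
    """
    if not key.startswith(prefix):
        return None
    remainder = key[len(prefix):]
    folder, sep, _ = remainder.partition("/")
    return folder if sep else None


def sizes_under_prefix(objects, prefix):
    """Sum object sizes (bytes) per folder directly below prefix."""
    totals = defaultdict(int)
    for obj in objects:
        folder = folder_under_prefix(obj["Key"], prefix)
        if folder is not None:
            totals[folder] += obj["Size"]
    return dict(totals)


objects = [
    {"Key": "notes/summer/july", "Size": 3},
    {"Key": "notes/summer/august", "Size": 4},
    {"Key": "notes/winter/january", "Size": 5},
    {"Key": "notes/readme", "Size": 1},
]
print(sizes_under_prefix(objects, "notes/"))  # {'summer': 7, 'winter': 5}
```

Unlike the index-based version, this skips objects that sit directly in the prefix (notes/readme here) instead of treating the file name as a folder.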

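def find_size(name, conn):
  # conn is a legacy boto (not boto3) S3 connection
  for bucket in conn.get_all_buckets():
    if name == bucket.name:
      total_bytes = 0
      for key in bucket:
        total_bytes += key.size
      # convert to GB once, after all keys have been summed
      total_gb = total_bytes / 1024.0 / 1024 / 1024
      print(total_gb)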
