How to find size of a folder inside an S3 bucket?

I am using the boto3 module in Python to interact with S3, and I can currently get the size of every individual key in an S3 bucket. But my goal is to find the storage used by only the top-level folders (every folder is a different project), because we need to charge each project for the space it uses. I can get the names of the top-level folders, but the implementation below gives no details about their sizes. The following is my implementation for getting the top-level folder names.

import boto
import boto.s3.connection

AWS_ACCESS_KEY_ID = "access_id"
AWS_SECRET_ACCESS_KEY = "secret_access_key"
Bucketname = 'Bucket-name'

conn = boto.s3.connect_to_region('ap-south-1',
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    is_secure=True,  # set to False if you are not using SSL
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

bucket = conn.get_bucket(Bucketname)
# Listing with a "/" delimiter returns only the top-level prefixes
folders = bucket.list("", "/")

for folder in folders:
    print(folder.name)

The type of folder here is boto.s3.prefix.Prefix, and it does not expose any size details. Is there any way to search for a folder/object in an S3 bucket by its name and then fetch the size of that object?

In order to get the size of an S3 folder, objects (accessible via boto3.resource('s3').Bucket) provide the filter(Prefix) method, which retrieves ONLY the files matching the Prefix condition and keeps the listing fairly efficient.

import boto3

def get_size(bucket, path):
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket(bucket)
    total_size = 0

    for obj in my_bucket.objects.filter(Prefix=path):
        total_size = total_size + obj.size

    return total_size

So let's say you want to get the size of the folder s3://my-bucket/my/path/; then you would call the previous function like this:

get_size("my-bucket", "my/path/")

This is, of course, easily applicable to top-level folders as well.
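
As a rough sketch of that idea (my own addition, not part of the answer above; "my-bucket" is a placeholder): list the bucket with a "/" delimiter so S3 returns the top-level prefixes under CommonPrefixes, then reuse get_size() for each one.

import boto3

def top_level_folder_sizes(bucket):
    # Sketch: map each top-level prefix to its total size in bytes
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    sizes = {}
    # Delimiter='/' groups keys by their first path segment; those
    # groups come back under 'CommonPrefixes' rather than 'Contents'
    for page in paginator.paginate(Bucket=bucket, Delimiter='/'):
        for prefix in page.get('CommonPrefixes', []):
            folder = prefix['Prefix']  # e.g. 'project-a/'
            sizes[folder] = get_size(bucket, folder)
    return sizes

print(top_level_folder_sizes("my-bucket"))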

To find the size of the top-level "folders" in S3 (S3 does not really have a concept of folders, but it displays a folder-like structure in the UI), something like this will work (note that list_objects returns at most 1,000 keys per call; see the paginated version further below):

from boto3 import client
conn = client('s3')

top_level_folders = dict()

for key in conn.list_objects(Bucket='kitsune-buildtest-production')['Contents']:

    folder = key['Key'].split('/')[0]
    print("Key %s in folder %s. %d bytes" % (key['Key'], folder, key['Size']))

    if folder in top_level_folders:
        top_level_folders[folder] += key['Size']
    else:
        top_level_folders[folder] = key['Size']


for folder, size in top_level_folders.items():
    print("Folder: %s, size: %d" % (folder, size))

Not using boto3, just the AWS CLI, but this quick one-liner serves the purpose. I usually pipe to tail -1 to get only the summary folder size. It can be a bit slow, though, for folders with many objects.

aws s3 ls --summarize --human-readable --recursive s3://bucket-name/folder-name | tail -1

To get more than 1,000 objects from S3 by using list_objects_v2, try this:

from boto3 import client
conn = client('s3')

top_level_folders = dict()

paginator = conn.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='prefix')
index = 1
for page in pages:
    for key in page['Contents']:
        print(key['Size'])
        folder = key['Key'].split('/')[index]
        print("Key %s in folder %s. %d bytes" % (key['Key'], folder, key['Size']))

        if folder in top_level_folders:
            top_level_folders[folder] += key['Size']
        else:
            top_level_folders[folder] = key['Size']

for folder, size in top_level_folders.items():
    size_in_gb = size/(1024*1024*1024)
    print("Folder: %s, size: %.2f GB" % (folder, size_in_gb))

If the prefix is notes/ and the delimiter is a slash (/), as in notes/summer/july, the common prefix is notes/summer/. In case the prefix is "notes/", use index = 1; for "notes/summer/", use index = 2.
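
A small sketch of that rule (my own assumption, not from the answer above): the index equals the number of slashes in the prefix, since each prefix segment pushes the "folder" one split further to the right.

prefix = "notes/summer/"
index = prefix.count("/")  # "notes/" -> 1, "notes/summer/" -> 2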

def find_size(name, conn):
    for bucket in conn.get_all_buckets():
        if name == bucket.name:
            total_bytes = 0
            for key in bucket:
                total_bytes += key.size
            # convert to GB only after summing all keys, not inside the loop
            print(total_bytes / 1024 / 1024 / 1024)
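A possible invocation, assuming the legacy boto connection style from the question (the credentials and bucket name are placeholders):

import boto

conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
find_size('Bucket-name', conn)  # prints the bucket size in GB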
