(using aws-sdk) How to find the total size of a folder stored in Amazon S3

I want to know the total size of a folder stored in S3, using the AWS SDK.

Note:

I don't want to use the AWS CLI or the AWS console to find the size of my folder. I want to do this with aws-sdk, as mentioned above, so please don't mark this as a duplicate.

So far, what I have found on the internet is to list all the objects in the folder and iterate through them. I do this and it works fine. Here is my code:

import AWS from 'aws-sdk';

AWS.config.region = "BUCKET_REGION";
AWS.config.credentials = new AWS.CognitoIdentityCredentials({
  IdentityPoolId: "COGNITO_ID",
});

const bucketName = "BUCKET_NAME";
// Bucket set under params becomes a default for every call on this client
const bucket = new AWS.S3({
  params: {
    Bucket: bucketName,
  },
});

bucket.listObjects({ Prefix: "FOLDER_NAME", Bucket: bucketName }, function (err, data) {
  if (err) {
    console.log(err);
  } else {
    console.log(data);
    // data.Contents is the array through which I iterate to find the total size of the objects
  }
});

The problem is that at some point my folder contains so many objects that iterating over every element in the list becomes impractical. It takes too much time just to calculate the size of the folder.

So I need a better way to calculate the size of the folder, and all I have found is this command:

aws s3 ls s3://myBucket/level1/level2/ --recursive --summarize | awk 'BEGIN{ FS=" "} /Total Size/ {print $3}'

Is there any way I can do the above through aws-sdk?

Any kind of help is appreciated. Thanks in advance.

This Lambda method is pretty fast, and it can work well for buckets with up to 100,000 objects if you are not concerned about a delay of a couple of seconds. The AWS CLI has about the same performance because it seems to use the same API, and S3 Metrics or CloudWatch stats might be more complicated to configure, especially if you want to look only at specific folders.

Storing this info in a database and triggering the method at intervals using flags is the way to go for small buckets or folders.

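const AWS = require('aws-sdk');
const s3 = new AWS.S3();

const bucketName = 'BUCKET_NAME'; // placeholder: replace with your bucket

exports.handler = async function (event) {
  let totalSize = 0;
  let ContinuationToken; // undefined on the first request
  do {
    // Each call returns one page of keys, so follow the continuation token.
    // Errors are left to propagate instead of being logged and ignored.
    const resp = await s3.listObjectsV2({
      Bucket: bucketName,
      Prefix: 'folder/subfolder/',
      ContinuationToken,
    }).promise();
    resp.Contents.forEach((o) => { totalSize += o.Size; });
    ContinuationToken = resp.NextContinuationToken;
  } while (ContinuationToken);

  console.log(totalSize); // your answer
};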
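The idea of storing the result and refreshing it at intervals could be sketched roughly as below. This is only an illustration: `createCachedSize`, `computeSize`, and `ttlMs` are hypothetical names, and an in-memory variable stands in for the database (in Lambda it would only survive while the container stays warm). The `inFlight` variable plays the role of the "flag" that prevents two refreshes from running at once.

```javascript
// Hypothetical helper: wraps any async size function with a time-based cache.
// `computeSize` and `ttlMs` are illustrative names, not part of aws-sdk.
function createCachedSize(computeSize, ttlMs, now = Date.now) {
  let cached = null;   // last computed total in bytes
  let fetchedAt = 0;   // time of the last successful refresh
  let inFlight = null; // flag: a refresh is already running

  return async function getSize() {
    const fresh = cached !== null && now() - fetchedAt < ttlMs;
    if (fresh) return cached;
    if (!inFlight) {
      inFlight = computeSize()
        .then((size) => {
          cached = size;
          fetchedAt = now();
          return size;
        })
        .finally(() => {
          inFlight = null; // clear the flag whether the refresh succeeded or failed
        });
    }
    return inFlight; // concurrent callers share the same refresh
  };
}
```

In practice `computeSize` would be the paginated `listObjectsV2` loop above, and callers would read the size through `getSize()` instead of listing the folder on every request.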

It appears that your situation is:

  • You want to know the size of an Amazon S3 bucket on a regular basis
  • The bucket contains a large number of objects, so listing them takes too much time

Rather than listing objects and calculating sizes, I would recommend two alternatives:

Amazon S3 Inventory

Amazon S3 Inventory can provide a daily CSV file with details of all objects in a bucket. You could then take this data and calculate the total.
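As a rough sketch of "take this data and calculate the total", the function below sums the Size column for keys under one folder prefix. It rests on several assumptions that are not in the original answer and should be checked against your own inventory configuration: the CSV has no header row, every field is double-quoted, keys are URL-encoded, and the column order is supplied as the `fileSchema` string from the inventory manifest. `sumInventorySize` is a hypothetical name, and the inventory must be configured to include the Size field.

```javascript
// Sums the Size column of S3 Inventory CSV text for keys under `prefix`.
// Assumptions (not from the original post): every field is double-quoted,
// keys are URL-encoded, and `fileSchema` is the comma-separated column list
// from the inventory manifest, e.g. "Bucket, Key, Size".
function sumInventorySize(csvText, fileSchema, prefix) {
  const columns = fileSchema.split(',').map((c) => c.trim());
  const keyIdx = columns.indexOf('Key');
  const sizeIdx = columns.indexOf('Size');
  if (keyIdx < 0 || sizeIdx < 0) {
    throw new Error('fileSchema must include Key and Size');
  }
  let total = 0;
  for (const line of csvText.split(/\r?\n/)) {
    if (line.trim() === '') continue; // skip blank lines
    // Naive split: relies on URL-encoded keys containing no raw quotes or commas.
    const fields = line.trim().replace(/^"|"$/g, '').split('","');
    const key = decodeURIComponent(fields[keyIdx]);
    const size = Number(fields[sizeIdx]);
    if (key.startsWith(prefix) && Number.isFinite(size)) total += size;
  }
  return total;
}
```

For example, `sumInventorySize(csvText, 'Bucket, Key, Size', 'folder/subfolder/')` would return the total bytes under that folder, once the (usually gzipped) inventory file has been downloaded and decompressed into `csvText`.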

Amazon CloudWatch bucket metrics

Amazon CloudWatch has several metrics related to Amazon S3 buckets:

  • BucketSizeBytes
  • NumberOfObjects

I'm not sure how often those metrics are updated (they are not instant), but BucketSizeBytes seems like it would be ideal for you.
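A minimal sketch of reading BucketSizeBytes with aws-sdk follows, assuming an `AWS.CloudWatch` client from SDK v2 is passed in. Two caveats: S3 publishes these storage metrics to CloudWatch about once per day, and the metric covers the whole bucket per storage class, so it cannot give the size of a single folder. `latestBucketSizeBytes` is a hypothetical name, and the three-day lookback window is an arbitrary choice to make sure at least one daily datapoint falls in range.

```javascript
// Fetches the most recent BucketSizeBytes datapoint for a whole bucket.
// `cloudwatch` is expected to be an AWS.CloudWatch client from aws-sdk v2;
// it is passed in so the function can be exercised with a stub.
async function latestBucketSizeBytes(cloudwatch, bucketName, storageType = 'StandardStorage') {
  const end = new Date();
  const start = new Date(end.getTime() - 3 * 24 * 60 * 60 * 1000); // look back 3 days
  const resp = await cloudwatch.getMetricStatistics({
    Namespace: 'AWS/S3',
    MetricName: 'BucketSizeBytes',
    Dimensions: [
      { Name: 'BucketName', Value: bucketName },
      { Name: 'StorageType', Value: storageType },
    ],
    StartTime: start,
    EndTime: end,
    Period: 86400, // one day, matching the daily reporting interval
    Statistics: ['Average'],
  }).promise();
  // Datapoints are not guaranteed to be ordered, so sort newest first.
  const points = (resp.Datapoints || [])
    .slice()
    .sort((a, b) => new Date(b.Timestamp) - new Date(a.Timestamp));
  return points.length > 0 ? points[0].Average : null;
}
```

It would be called as `latestBucketSizeBytes(new AWS.CloudWatch(), 'BUCKET_NAME')`, and it returns `null` when no datapoint is available yet.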

If all else fails...

If the above two options do not meet your needs (e.g. you need to know the metrics "right now"), the remaining option would be to maintain your own database of objects. The database would need to be updated whenever an object is added to or removed from the bucket (which can be done by using Amazon S3 Events to trigger an AWS Lambda function). You could then consult your own database to have the information available rather quickly.
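The bookkeeping for such a database might look like the sketch below. It assumes an unversioned bucket and uses an in-memory `Map` as a stand-in for a real database; `createSizeIndex` and `applyS3Event` are hypothetical names. The size of each key is stored because removal events cannot be relied on to carry the object's size, and because an overwrite should replace the old size rather than add to it.

```javascript
// In-memory stand-in for the database: per-key sizes plus a running total.
function createSizeIndex() {
  return { sizes: new Map(), total: 0 };
}

// Applies one S3 event notification to the index and returns the new total
// for keys under `prefix`. Assumes an unversioned bucket.
function applyS3Event(index, event, prefix) {
  for (const record of event.Records || []) {
    // Keys in S3 event notifications are URL-encoded, with '+' for spaces.
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
    if (!key.startsWith(prefix)) continue;
    const previous = index.sizes.get(key) || 0;
    if (record.eventName.startsWith('ObjectCreated:')) {
      const size = record.s3.object.size || 0;
      index.sizes.set(key, size);
      index.total += size - previous; // an overwrite replaces the old size
    } else if (record.eventName.startsWith('ObjectRemoved:')) {
      index.sizes.delete(key);
      index.total -= previous;
    }
  }
  return index.total;
}
```

Inside a Lambda function triggered by S3 Events, the handler would call `applyS3Event` with the incoming `event`, with the `Map` replaced by reads and writes against the actual database so that the total survives between invocations.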
