
Calculate S3 object (folder) size in Java

I'm storing all types of files on Amazon S3. In an Amazon S3 bucket, the files are stored under different folders. I know there is no real concept of a folder in Amazon S3; objects are identified only by their keys. If I store a file with a key like 'mydocs/personal/profile-pic.jpg', it appears as if the two parent folders (a personal folder inside a mydocs folder) were created there.

I want to calculate the size of any such folder, for example 'mydocs', in Java. I calculated the total size of a bucket using the code given below:

public long calculateBucketSize(String bucketName) {
    long totalSize = 0;
    int totalItems = 0;
    ObjectListing objects = listObjects(bucketName);
    while (true) {
        for (S3ObjectSummary objectSummary : objects.getObjectSummaries()) {
            totalSize += objectSummary.getSize();
            totalItems++;
        }
        // Only fetch the next page after the current one has been summed,
        // so the final (non-truncated) page is not skipped.
        if (!objects.isTruncated()) {
            break;
        }
        objects = listNextBatchOfObjects(objects);
    }
    System.out.println("Amazon S3 bucket: " + bucketName + " containing "
            + totalItems + " objects with a total size of " + totalSize
            + " bytes.");

    return totalSize;
}

This method returns the total size of the bucket. I want to calculate the size of any single folder. Any help will be appreciated.

There is an easy way to do this with the org.apache.hadoop library:

  import org.apache.hadoop.fs.Path
  import org.apache.spark.sql.SparkSession

  // Uses Hadoop's FileSystem API to sum the length of everything under the path.
  def calculateSize(path: String)(implicit spark: SparkSession): Long = {
    val fsPath = new Path(path)
    val fs = fsPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
    fs.getContentSummary(fsPath).getLength
  }

This function can calculate sizes on S3, HDFS and the local file system.
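If you prefer to stay in Java, a minimal sketch of the same idea using the Hadoop FileSystem API directly (no Spark) might look like the following; it assumes the hadoop-aws / s3a connector is on the classpath and that credentials are configured in the Hadoop Configuration:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FolderSize {

    // Total length in bytes of everything under the given path
    // (works for s3a://, hdfs:// and file:// URIs).
    public static long calculateSize(String path) throws IOException {
        Path fsPath = new Path(path);
        FileSystem fs = fsPath.getFileSystem(new Configuration());
        return fs.getContentSummary(fsPath).getLength();
    }
}

For example, calculateSize("s3a://my-bucket/mydocs/") would return the total number of bytes under that prefix (the bucket and prefix names here are placeholders).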

For Scala developers, here is a recursive function to execute a full scan and map of the contents of an Amazon S3 bucket using the official AWS SDK for Java:

import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{S3ObjectSummary, ObjectListing, GetObjectRequest}
import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}

def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {

  def scan(acc: List[T], listing: ObjectListing): List[T] = {
    val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
    val mapped = (for (summary <- summaries) yield f(summary)).toList

    // On the final page, include the results accumulated from earlier pages.
    if (!listing.isTruncated) acc ::: mapped
    else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
  }

  scan(List(), s3.listObjects(bucket, prefix))
}

To invoke the above curried map() function, simply pass the already constructed (and properly initialized) AmazonS3Client object (refer to the official AWS SDK for Java API Reference), the bucket name and the prefix name in the first parameter list. Also pass the function f() you want to apply to each object summary in the second parameter list.

For example,

val tuple = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner, s.getSize))

will return the full list of (key, owner, size) tuples in that bucket/prefix,

or

map(s3, "bucket", "prefix")(s => s.getSize).sum

will return the total size of that bucket/prefix contents.

You can combine map() with many other functions, as you would normally do with monads in functional programming.

I think you want to get the size of the folder at each level. For example, if you have one root folder R-Folder with two sub-folders S1.1-Folder and S1.2-Folder, and S1.1-Folder in turn has three sub-folders S1.1.1-Folder, S1.1.2-Folder and S1.1.3-Folder, then you want the size of each of these folders:

R-Folder (32MB)
|__S1.1-Folder (22MB)
|  |__S1.1.1-Folder (7MB)
|  |__S1.1.2-Folder (5MB)
|  |__S1.1.3-Folder (10MB)
|
|__S1.2-Folder (10MB)

Am I correct?

You have to keep a list of folder details with a completed/not-completed status and scan each folder recursively. When an inner folder is completed, update the size on its corresponding parent; that parent then updates its own parent, and so on up to the root. A sketch of this idea follows.
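The sketch below (AWS SDK for Java v1; the method name, the explicit AmazonS3 parameter and the use of the "/" delimiter are my own assumptions, not part of the answer above) lists one folder level at a time and recurses into each common prefix, so every folder reports its own total:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

// Returns the size of 'prefix' and prints a line for every folder visited,
// so each level (R-Folder, S1.1-Folder, ...) reports its own total.
// The prefix is expected to end with "/" (e.g. "R-Folder/").
public long folderSize(AmazonS3 s3, String bucket, String prefix) {
    long total = 0;
    ListObjectsRequest request = new ListObjectsRequest()
            .withBucketName(bucket)
            .withPrefix(prefix)
            .withDelimiter("/");                        // one folder level at a time
    ObjectListing listing = s3.listObjects(request);
    while (true) {
        for (S3ObjectSummary summary : listing.getObjectSummaries()) {
            total += summary.getSize();                 // files directly under this prefix
        }
        for (String subFolder : listing.getCommonPrefixes()) {
            total += folderSize(s3, bucket, subFolder); // recurse into each sub-folder
        }
        if (!listing.isTruncated()) {
            break;
        }
        listing = s3.listNextBatchOfObjects(listing);
    }
    System.out.println(prefix + " -> " + total + " bytes");
    return total;
}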

Stuck on the same problem, I found that the simple solution is to use:

 ObjectListing objects = listObjects(bucketName, prefix);


Where prefix is your folder name.
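Put together, a minimal sketch of the question's method adapted to a prefix might look like this (the method name and the explicit AmazonS3 client parameter are assumptions, not part of the original answer):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public long calculateFolderSize(AmazonS3 s3, String bucketName, String prefix) {
    long totalSize = 0;
    // e.g. prefix = "mydocs/" to measure everything under the mydocs folder
    ObjectListing objects = s3.listObjects(bucketName, prefix);
    while (true) {
        for (S3ObjectSummary objectSummary : objects.getObjectSummaries()) {
            totalSize += objectSummary.getSize();
        }
        if (!objects.isTruncated()) {
            break;
        }
        objects = s3.listNextBatchOfObjects(objects);
    }
    return totalSize;
}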

For more information, see these links:

http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/ObjectListing.html

http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Client.html

For the AWS SDK for Java v2, here is an example:

  public Long getFolderSize(String bucket, String prefix) {
    ListObjectsV2Request request =
        ListObjectsV2Request.builder().bucket(bucket).prefix(prefix).build();
    ListObjectsV2Iterable list = s3Client.listObjectsV2Paginator(request);
    long totalSize = 0;
    long numberItems = 0;
    for (S3Object object : list.contents()) {
      totalSize += object.size();
      numberItems++;
    }
    logger.info(
        "The size of the folder {}, is {} bytes, number of items {}",
        bucket + prefix,
        totalSize,
        numberItems);
    return totalSize;
  }

The below code gets all the files in a given prefix/key and returns the total size.

public Long listS3FolderSize(String bucket, String dirPrefix) {
    Long folderSizeInBytes = 0L;
    List<S3ObjectSummary> objectsListing = getObjectSummaryList(bucket, dirPrefix);
    for (S3ObjectSummary summary: objectsListing) {
        folderSizeInBytes += summary.getSize();
    }

    return folderSizeInBytes;
}
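The getObjectSummaryList helper is not shown in that answer; a minimal sketch of what it might look like (AWS SDK for Java v1, assuming an s3Client field of type AmazonS3 on the enclosing class) is:

import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

// s3Client is assumed to be an initialized AmazonS3 field on the enclosing class.
private List<S3ObjectSummary> getObjectSummaryList(String bucket, String dirPrefix) {
    List<S3ObjectSummary> summaries = new ArrayList<>();
    ObjectListing listing = s3Client.listObjects(bucket, dirPrefix);
    while (true) {
        summaries.addAll(listing.getObjectSummaries());
        if (!listing.isTruncated()) {
            break;
        }
        listing = s3Client.listNextBatchOfObjects(listing);
    }
    return summaries;
}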
