
Obtain Folder size in Azure Data Lake Gen2 using Java

There is some literature on the internet about computing folder size in C#, but I could not find anything for Java.

  1. Is there an easy way to get the size of a folder in Gen2?
  2. If not, how can it be computed?

There are several examples on the internet for (2) in C# and PowerShell. Is there any way to do it in Java?

As far as I am aware, there is no API that directly provides the folder size in Azure Data Lake Gen2.

To compute it recursively:

import com.azure.storage.common.StorageSharedKeyCredential;
import com.azure.storage.file.datalake.DataLakeFileSystemClient;
import com.azure.storage.file.datalake.DataLakeServiceClient;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;

// Build a service client with shared-key credentials, then get a client
// for the container (file system) that holds the folder.
DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
        .credential(new StorageSharedKeyCredential(storageAccountName, secret))
        .endpoint(endpoint)
        .buildClient();
DataLakeFileSystemClient container = dataLakeServiceClient.getFileSystemClient(containerName);


/**
 * Returns the total size of all files under the given folder, in bytes.
 *
 * @param folder the folder path within the container
 * @return the cumulative size of all files under the folder, in bytes
 */
@Beta
public Long getSize(String folder) {
    DataLakeDirectoryClient directoryClient = container.getDirectoryClient(folder);
    if (directoryClient.exists()) {
        // listPaths(recursive, userPrincipleNameReturned, maxResults, timeout)
        return directoryClient.listPaths(true, false, null, null)
                .stream()
                .filter(x -> !x.isDirectory())          // skip directory entries
                .mapToLong(PathItem::getContentLength)  // file size in bytes
                .sum();
    }
    throw new RuntimeException("Not a valid folder: " + folder);
}

This recursively iterates through the folder and sums the sizes of the files it contains.
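To illustrate the filter-and-sum aggregation that `getSize` performs without needing an Azure account, here is a minimal self-contained sketch. `Entry` is a hypothetical stand-in for the SDK's `PathItem`, carrying only the two fields the computation uses:

```java
import java.util.List;

public class FolderSizeDemo {
    // Hypothetical stand-in for PathItem: just the fields getSize needs.
    record Entry(String name, boolean isDirectory, long contentLength) {}

    // Same aggregation as getSize above: skip directories, sum file lengths.
    static long totalSize(List<Entry> paths) {
        return paths.stream()
                .filter(e -> !e.isDirectory())
                .mapToLong(Entry::contentLength)
                .sum();
    }

    public static void main(String[] args) {
        // A recursive listing returns directories and files interleaved;
        // only the files contribute to the total.
        List<Entry> listing = List.of(
                new Entry("folder/sub", true, 0L),
                new Entry("folder/a.txt", false, 1_000L),
                new Entry("folder/sub/b.txt", false, 2_500L));
        System.out.println(totalSize(listing)); // prints 3500
    }
}
```

Note that directory entries report no meaningful content length of their own, which is why they are filtered out before summing.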

The default page size is 5,000 records. From the docs:

recursive – Specifies if the call should recursively include all paths.

userPrincipleNameReturned – If "true", the user identity values returned in the x-ms-owner, x-ms-group, and x-ms-acl response headers will be transformed from Azure Active Directory Object IDs to User Principal Names. If "false", the values will be returned as Azure Active Directory Object IDs. The default value is false. Note that group and application Object IDs are not translated because they do not have unique friendly names.

maxResults – Specifies the maximum number of blobs to return per page, including all BlobPrefix elements. If the request does not specify maxResults or specifies a value greater than 5,000, the server will return up to 5,000 items per page. If iterating by page, the page size passed to byPage methods such as PagedIterable.iterableByPage(int) will be preferred over this value.

timeout – An optional timeout value beyond which a RuntimeException will be raised.
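The page size only affects how many round trips the SDK makes to the service; the stream in `getSize` still yields every item across all pages. To illustrate page-wise accumulation without the SDK, here is a plain-Java sketch where `byPage` is a hypothetical stand-in for `PagedIterable.iterableByPage(int)`, not an Azure class:

```java
import java.util.ArrayList;
import java.util.List;

public class PagedSumDemo {
    // Hypothetical stand-in for PagedIterable.iterableByPage(int):
    // splits a full listing of file sizes into fixed-size pages.
    static List<List<Long>> byPage(List<Long> sizes, int pageSize) {
        List<List<Long>> pages = new ArrayList<>();
        for (int i = 0; i < sizes.size(); i += pageSize) {
            pages.add(sizes.subList(i, Math.min(i + pageSize, sizes.size())));
        }
        return pages;
    }

    // Accumulate the total one page at a time; the grand total is the
    // same regardless of page size, only the number of fetches changes.
    static long sumByPages(List<Long> sizes, int pageSize) {
        long total = 0;
        for (List<Long> page : byPage(sizes, pageSize)) {
            total += page.stream().mapToLong(Long::longValue).sum();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> sizes = List.of(100L, 200L, 300L, 400L, 500L);
        System.out.println(sumByPages(sizes, 2)); // prints 1500
    }
}
```

With the real SDK, the analogous choice is passing a page size to `iterableByPage(int)` when the default of up to 5,000 items per page is not what you want.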

