Getting the size of a folder in Azure Data Lake Gen2 with Java

Question

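There is some material online about calculating folder size in C#, but I could not find anything for Java.

Is there a simple way to get the size of a folder in Gen2?
If not, how can it be calculated?

There are a couple of examples online that use C# and PowerShell. Is there a way to do this in Java?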

Answer 1

As far as I know, there is no API that directly returns the size of a folder in Azure Data Lake Gen2.

You can compute it with a recursive listing:

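import com.azure.storage.common.StorageSharedKeyCredential;
import com.azure.storage.file.datalake.DataLakeDirectoryClient;
import com.azure.storage.file.datalake.DataLakeFileSystemClient;
import com.azure.storage.file.datalake.DataLakeServiceClient;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;
import com.azure.storage.file.datalake.models.PathItem;

DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
        .credential(new StorageSharedKeyCredential(storageAccountName, secret))
        .endpoint(endpoint)
        .buildClient();
DataLakeFileSystemClient container = dataLakeServiceClient.getFileSystemClient(containerName);


/**
 * Returns the total size, in bytes, of all files under the given folder.
 *
 * @param folder path of the folder inside the container
 * @return sum of the content lengths of all files, including those in subfolders
 */
public Long getSize(String folder) {
    DataLakeDirectoryClient directoryClient = container.getDirectoryClient(folder);
    if (directoryClient.exists()) {
        // recursive = true, userPrincipleNameReturned = false, default page size, no timeout
        return directoryClient.listPaths(true, false, null, null)
                .stream()
                .filter(x -> !x.isDirectory()) // directory entries are skipped, only files are counted
                .mapToLong(PathItem::getContentLength)
                .sum();
    }
    throw new RuntimeException("Not a valid folder: " + folder);
}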

This lists every path under the folder recursively and sums the sizes of the files it finds.
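If you want to check the filter-and-sum step without a storage account, it can be pulled out into a plain function. The following is only a minimal sketch: `FolderSizeSketch`, `Entry`, and `totalBytes` are hypothetical names, and `Entry` is a stand-in that mimics just the two `PathItem` accessors used in the answer (`isDirectory()` and `getContentLength()`).

```java
import java.util.Arrays;
import java.util.List;

final class FolderSizeSketch {

    /** Hypothetical stand-in for the SDK's PathItem: only the two fields the sum needs. */
    static final class Entry {
        private final boolean directory;
        private final long contentLength;

        Entry(boolean directory, long contentLength) {
            this.directory = directory;
            this.contentLength = contentLength;
        }

        boolean isDirectory() { return directory; }
        long getContentLength() { return contentLength; }
    }

    /** Sums the content length, in bytes, of every entry that is not a directory. */
    static long totalBytes(List<Entry> entries) {
        return entries.stream()
                .filter(e -> !e.isDirectory())
                .mapToLong(Entry::getContentLength)
                .sum();
    }

    public static void main(String[] args) {
        List<Entry> listing = Arrays.asList(
                new Entry(true, 0L),       // a subfolder entry, ignored by the sum
                new Entry(false, 1024L),   // a file
                new Entry(false, 2048L));  // another file
        System.out.println(totalBytes(listing)); // prints 3072
    }
}
```

An empty listing gives 0, so an existing but empty folder reports a size of zero bytes rather than an error.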

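The default page size is 5,000 records. From the documentation:

recursive – Specifies whether the call should recursively include all paths.

userPrincipleNameReturned – If "true", the user identity values returned in the x-ms-owner, x-ms-group, and x-ms-acl response headers will be transformed from Azure Active Directory Object IDs to User Principal Names. If "false", the values will be returned as Azure Active Directory Object IDs. The default value is false. Note that group and application Object IDs are not translated because they do not have unique friendly names.

maxResults – Specifies the maximum number of blobs to return per page, including all BlobPrefix elements. If the request does not specify maxResults or specifies a value greater than 5,000, the server will return up to 5,000 items per page. If iterating by page, the page size passed to byPage methods such as PagedIterable.iterableByPage(int) will be preferred over this value.

timeout – An optional timeout value beyond which a RuntimeException will be raised.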
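One consequence of the paging notes above is that the page size only changes how the listing is batched, not the total you end up with. Here is a small sketch of that idea using plain lists instead of the SDK. `PagedSizeSketch`, `paginate`, and `sumByPage` are hypothetical names; against the real service the pages would come from `PagedIterable.iterableByPage(int)`, and I am assuming each page's items are read with `getValue()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PagedSizeSketch {

    /** Splits a flat list of file sizes into pages of at most pageSize items (stand-in for server paging). */
    static List<List<Long>> paginate(List<Long> sizes, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<List<Long>> pages = new ArrayList<>();
        for (int i = 0; i < sizes.size(); i += pageSize) {
            pages.add(sizes.subList(i, Math.min(i + pageSize, sizes.size())));
        }
        return pages;
    }

    /** Adds up the sizes one page at a time, the way a by-page loop over the listing would. */
    static long sumByPage(List<List<Long>> pages) {
        long total = 0L;
        for (List<Long> page : pages) {
            // With the SDK this inner list would be the items of one page (assumed: page.getValue()).
            for (long size : page) {
                total += size;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(10L, 20L, 30L, 40L, 50L);
        System.out.println(paginate(sizes, 2).size());         // prints 3
        System.out.println(sumByPage(paginate(sizes, 2)));     // prints 150
        System.out.println(sumByPage(paginate(sizes, 5000)));  // prints 150
    }
}
```

Iterating by page is mainly useful when you want to log progress or stop early on a very large folder; for a plain total, the stream version in the answer does the same work.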

Solution 1, 2022-02-05 15:26:29
