Obtain folder size in Azure Data Lake Gen2 using Java
There is some literature on the internet about computing folder size with C#, but I could not find anything for Java.
There are several examples on the internet for (2) with C# and PowerShell. Is there any way to do this with Java?
As far as I am aware, there is no API that directly provides the size of a folder in Azure Data Lake Gen2.
To compute it recursively:
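import com.azure.storage.common.StorageSharedKeyCredential;
import com.azure.storage.file.datalake.DataLakeDirectoryClient;
import com.azure.storage.file.datalake.DataLakeFileSystemClient;
import com.azure.storage.file.datalake.DataLakeServiceClient;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;
import com.azure.storage.file.datalake.models.PathItem;

// Build a service client authenticated with the storage account's shared key.
DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
        .credential(new StorageSharedKeyCredential(storageAccountName, secret))
        .endpoint(endpoint)
        .buildClient();
// The container (file system) that holds the folder.
DataLakeFileSystemClient container = dataLakeServiceClient.getFileSystemClient(containerName);

/**
 * Returns the total size, in bytes, of all files under the given folder.
 *
 * @param folder path of the folder within the container
 * @return the sum of the content lengths of every file below the folder
 */
@Beta // marker annotation kept from the original snippet (e.g. Guava's @Beta)
public Long getSize(String folder) {
    DataLakeDirectoryClient directoryClient = container.getDirectoryClient(folder);
    if (directoryClient.exists()) {
        // recursive = true, userPrincipleNameReturned = false, default page size, no timeout
        return directoryClient.listPaths(true, false, null, null)
                .stream()
                .filter(x -> !x.isDirectory()) // keep files only
                .mapToLong(PathItem::getContentLength)
                .sum();
    }
    throw new RuntimeException("Not a valid folder: " + folder);
}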
This recursively lists every path under the folder and sums the sizes of the files it finds.
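To try the filter-and-sum step without a storage account, here is a minimal sketch that runs on plain Java. The `Entry` class is a hypothetical stand-in for the SDK's `PathItem` and exposes only the two accessors the sum relies on; `FolderSizeSketch`, `totalSize`, and the sample paths are likewise made up for illustration.

```java
import java.util.List;

public class FolderSizeSketch {

    /** Hypothetical stand-in for PathItem: just the fields the sum needs. */
    public static final class Entry {
        private final String name;
        private final boolean directory;
        private final long contentLength;

        public Entry(String name, boolean directory, long contentLength) {
            this.name = name;
            this.directory = directory;
            this.contentLength = contentLength;
        }

        public String getName() { return name; }
        public boolean isDirectory() { return directory; }
        public long getContentLength() { return contentLength; }
    }

    /** Sums the content length of every non-directory entry, mirroring getSize. */
    public static long totalSize(List<Entry> entries) {
        return entries.stream()
                .filter(e -> !e.isDirectory())
                .mapToLong(Entry::getContentLength)
                .sum();
    }

    public static void main(String[] args) {
        // A flat listing, as a recursive listPaths call would return it.
        List<Entry> listing = List.of(
                new Entry("data", true, 0),
                new Entry("data/a.csv", false, 1_024),
                new Entry("data/sub", true, 0),
                new Entry("data/sub/b.csv", false, 2_048));
        System.out.println(totalSize(listing)); // prints 3072
    }
}
```

Because the recursive listing is already flat, no explicit recursion is needed on the client side: a single pass over the entries is enough.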
By default, each page holds up to 5,000 records. From the docs:
recursive – Specifies if the call should recursively include all paths.
userPrincipleNameReturned – If "true", the user identity values returned in the x-ms-owner, x-ms-group, and x-ms-acl response headers will be transformed from Azure Active Directory Object IDs to User Principal Names. If "false", the values will be returned as Azure Active Directory Object IDs. The default value is false. Note that group and application Object IDs are not translated because they do not have unique friendly names.
maxResults – Specifies the maximum number of blobs to return per page, including all BlobPrefix elements. If the request does not specify maxResults or specifies a value greater than 5,000, the server will return up to 5,000 items per page. If iterating by page, the page size passed to byPage methods such as PagedIterable.iterableByPage(int) will be preferred over this value.
timeout – An optional timeout value beyond which a RuntimeException will be raised.
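Note that maxResults only sets how many items arrive per page; it does not cap the total, since streaming the result keeps fetching pages until the listing is exhausted. The sketch below models that with plain Java collections rather than the SDK: `PagedSumSketch`, `paginate`, and `sumByPage` are hypothetical names, and the lists stand in for pages returned by the server.

```java
import java.util.ArrayList;
import java.util.List;

public class PagedSumSketch {

    /** Splits sizes into consecutive pages of at most pageSize items (stand-in for server paging). */
    public static List<List<Long>> paginate(List<Long> sizes, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<List<Long>> pages = new ArrayList<>();
        for (int start = 0; start < sizes.size(); start += pageSize) {
            int end = Math.min(start + pageSize, sizes.size());
            pages.add(new ArrayList<>(sizes.subList(start, end)));
        }
        return pages;
    }

    /** Adds up every page in turn, the way page-by-page iteration would. */
    public static long sumByPage(List<List<Long>> pages) {
        long total = 0;
        for (List<Long> page : pages) {
            for (long size : page) {
                total += size;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> fileSizes = List.of(100L, 200L, 300L, 400L, 500L);
        System.out.println(paginate(fileSizes, 2).size());          // prints 3
        System.out.println(sumByPage(paginate(fileSizes, 2)));      // prints 1500
        System.out.println(sumByPage(paginate(fileSizes, 5000)));   // prints 1500
    }
}
```

The total is the same for either page size; a smaller page only means more round trips for the same listing.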