Fetch the size of nested folders in Azure Data Lake Storage Gen1 from a Databricks notebook
I want to fetch folder size details from a Databricks notebook.
We can do the same via PuTTY by running hadoop fs -du -h {root-folder-path}. This command returns the human-readable size of every folder inside the root folder. PFB sample:
I tried running a similar hadoop command from the notebook, as below, but I believe Hadoop is not installed on the driver node:
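A hypothetical reconstruction of the attempt (not the original cell): shelling out to the Hadoop CLI from Python fails on a Databricks driver because no hadoop binary is installed there. The path is a placeholder.

```python
import subprocess

try:
    out = subprocess.run(
        ["hadoop", "fs", "-du", "-h", "/root-folder"],  # placeholder path
        capture_output=True, text=True, check=True,
    )
    print(out.stdout)
except FileNotFoundError:
    # raised because the `hadoop` executable is not on the driver's PATH
    print("hadoop: command not found on the driver node")
```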
When I tried dbutils.fs.ls({root-folder-path}), I got a folder size of 0. This is because dbutils provides size values for files only; folders are hardcoded to 0. PFB sample:
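A minimal illustration of this behavior, assuming a Databricks notebook where dbutils is predefined; the ADLS Gen1 path is a placeholder.

```python
for item in dbutils.fs.ls("adl://<account>.azuredatalakestore.net/root-folder"):
    # files show their byte size; folders are always listed with size 0
    print(item.name, item.size)
```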
Kindly guide me on the best way to fetch these details.
In Azure Databricks, this is expected behavior.
You can get more details using the Azure Databricks CLI:
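A hedged sketch using the legacy Databricks CLI (pip install databricks-cli), assuming the CLI is already configured with a workspace token and the ADLS folder is mounted at /mnt/root-folder (a placeholder mount point); -l prints the long listing with sizes.

```sh
databricks fs ls -l dbfs:/mnt/root-folder
```

Like dbutils.fs.ls, this lists directory entries with size 0, so for actual folder sizes the recursive approach from the article below still applies.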
The following article, "Computing total storage size of a folder in Azure Data Lake with Pyspark", explains how to recursively compute the storage size and the number of files and folders in ADLS Gen1 from Databricks.
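A minimal sketch of that recursive approach (not the article's exact code), assuming a Databricks notebook where dbutils is predefined; the ADLS Gen1 root path is a placeholder. Directory entries returned by dbutils.fs.ls have a path ending in "/" and a size of 0, so we recurse into them and sum only file sizes.

```python
def folder_size(path):
    """Total size in bytes of all files under `path`, recursing into subfolders."""
    total = 0
    for item in dbutils.fs.ls(path):
        if item.path.endswith("/"):        # directory: recurse into it
            total += folder_size(item.path)
        else:                              # file: add its byte size
            total += item.size
    return total

root = "adl://<account>.azuredatalakestore.net/root-folder"  # placeholder path
for child in dbutils.fs.ls(root):
    size = folder_size(child.path) if child.path.endswith("/") else child.size
    print(f"{child.name}\t{size / 1024 ** 2:.2f} MB")
```

Note that this issues one dbutils.fs.ls call per folder, so on trees with very many nested folders it can be slow; the cited article discusses parallelizing the walk with PySpark.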