
Fetch the size of a nested folder in Azure Data Lake Storage Gen1 from a Databricks notebook

Original post: 2020-11-19 08:23:13. Tags: linux, azure, hadoop, databricks, azure-databricks

I want to fetch folder size details from a Databricks notebook.

We can do the same via PuTTY by running hadoop fs -lh {root-folder-path}. This command returns the human-readable size of every folder inside the root folder. Please find a sample below: [screenshot omitted]

I tried running a similar hadoop command from the notebook, but I believe Hadoop is not installed on the driver node: [screenshot omitted]

When I tried ls {root-folder-path}, I got a folder size of 0. This is because dbutils provides a size value for files only; folders are hardcoded to 0. Please find a sample below: [screenshot omitted]

Kindly guide me on the best way to fetch these details.

1 Solution

In Azure Databricks, this is expected behavior.

For files, it displays the actual file size.
For directories, it displays size=0.
For corrupted files, it displays size=0.
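
Since the listing reports 0 for every directory, one workaround is to walk the tree and add up the file sizes yourself. The following is only a minimal sketch, not an official Databricks method. It assumes each listing entry exposes `.path`, `.size` and `.isDir()`, as the `FileInfo` objects returned by `dbutils.fs.ls` do in a notebook. The names `folder_size`, `FakeInfo`, `FAKE_TREE` and `fake_ls`, and the `adl://demo/...` paths, are hypothetical and exist only so the sketch can run outside Databricks.

```python
from collections import namedtuple


def folder_size(path, ls):
    """Return the total size in bytes of all files under `path`, recursively.

    `ls` is any callable that lists one directory level and returns entries
    with `.path`, `.size` and `.isDir()`. In a notebook this would be
    `dbutils.fs.ls` (assumption: its entries expose that interface).
    """
    total = 0
    for entry in ls(path):
        if entry.isDir():
            # Directories report size 0, so descend and sum their contents.
            total += folder_size(entry.path, ls)
        else:
            total += entry.size
    return total


# Hypothetical in-memory stand-in for dbutils.fs.ls, used only for illustration.
class FakeInfo(namedtuple("FakeInfo", ["path", "name", "size"])):
    def isDir(self):
        # Assumption: directory names end with a trailing slash.
        return self.name.endswith("/")


FAKE_TREE = {
    "adl://demo/root/": [
        FakeInfo("adl://demo/root/a/", "a/", 0),
        FakeInfo("adl://demo/root/x.csv", "x.csv", 100),
    ],
    "adl://demo/root/a/": [
        FakeInfo("adl://demo/root/a/y.csv", "y.csv", 250),
        FakeInfo("adl://demo/root/a/b/", "b/", 0),
    ],
    "adl://demo/root/a/b/": [
        FakeInfo("adl://demo/root/a/b/z.csv", "z.csv", 50),
    ],
}


def fake_ls(path):
    """List one level of the fake tree, mimicking a single ls call."""
    return FAKE_TREE[path]
```

In a notebook the call would look like `folder_size("{root-folder-path}", dbutils.fs.ls)`. Each directory costs one listing call, so very large trees may take a while to traverse.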

You can get more details using the Azure Databricks CLI: [screenshot omitted]

The article "Computing total storage size of a folder in Azure Data Lake with Pyspark" explains how to recursively compute the storage size and the number of files and folders in ADLS Gen1 from Databricks.
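
The kind of recursive computation described above (storage size plus file and folder counts) can be sketched as follows. This is not the article's code, just an illustrative version under stated assumptions: listing entries expose `.path`, `.size` and `.isDir()` like the `FileInfo` objects from `dbutils.fs.ls`. The names `summarize`, `human_readable`, `Summary`, `DemoInfo`, `DEMO_TREE` and `demo_ls` are hypothetical, and the demo listing stands in for a real ADLS Gen1 path.

```python
from collections import namedtuple

Summary = namedtuple("Summary", ["total_bytes", "files", "folders"])


def summarize(path, ls):
    """Walk `path` iteratively and return total bytes, file count, folder count.

    `ls` lists one directory level; in a notebook it would be `dbutils.fs.ls`
    (assumption: entries expose `.path`, `.size` and `.isDir()`).
    """
    total_bytes = files = folders = 0
    pending = [path]  # explicit stack, so deep trees do not hit recursion limits
    while pending:
        current = pending.pop()
        for entry in ls(current):
            if entry.isDir():
                folders += 1
                pending.append(entry.path)
            else:
                files += 1
                total_bytes += entry.size
    return Summary(total_bytes, files, folders)


def human_readable(num_bytes):
    """Format a byte count with binary (1024-based) units."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1024


# Hypothetical in-memory listing used only to exercise the functions.
class DemoInfo(namedtuple("DemoInfo", ["path", "name", "size"])):
    def isDir(self):
        # Assumption: directory names end with a trailing slash.
        return self.name.endswith("/")


DEMO_TREE = {
    "adl://demo/root/": [
        DemoInfo("adl://demo/root/a/", "a/", 0),
        DemoInfo("adl://demo/root/x.csv", "x.csv", 100),
    ],
    "adl://demo/root/a/": [
        DemoInfo("adl://demo/root/a/y.csv", "y.csv", 250),
        DemoInfo("adl://demo/root/a/b/", "b/", 0),
    ],
    "adl://demo/root/a/b/": [
        DemoInfo("adl://demo/root/a/b/z.csv", "z.csv", 50),
    ],
}


def demo_ls(path):
    """List one level of the demo tree, mimicking a single ls call."""
    return DEMO_TREE[path]


summary = summarize("adl://demo/root/", demo_ls)
print(summary, human_readable(summary.total_bytes))
```

To get per-folder sizes similar to the hadoop listing from the question, `summarize` can be called once for each first-level folder under the root.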


Notice: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address.
