
How to list sub-directories in a data lake file system on Azure Databricks using R

I am working in an R notebook in Databricks on Azure. Using the AzureStor package, I can list the names of objects in the data lake, or in a file system within it, as follows:

 # Connect to the ADLS Gen2 endpoint and list the file systems (containers) in it
 endPoint <- AzureStor::adls_endpoint(endpoint = "https://<myStorageName>.dfs.core.windows.net", key = <myStorageKey>)
 storage_containers <- AzureStor::list_storage_containers(endPoint)

 # Build the URL of the first file system and open it with the same account key
 path2fs <- paste0("https://<myStorageName>.dfs.core.windows.net/", names(storage_containers)[1])
 myFileSys <- AzureStor::adls_filesystem(path2fs, key = <myStorageKey>)

 # List the contents of the file system's root directory
 AzureStor::list_adls_files(myFileSys, "/")

That gives me an R data.frame containing a "name" column for each entry and a column "isDirectory".
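For illustration, a minimal sketch (assuming the listing is saved in a variable, here called `listing`) that extracts just the sub-directory names from that data.frame:

 listing <- AzureStor::list_adls_files(myFileSys, "/", info = "all")
 # isDirectory may be a logical or the strings "true"/"false" depending on
 # the package version; as.logical() handles both cases
 dirs <- listing$name[as.logical(listing$isDirectory)]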

If "isDirectory" is true, I would like to see the content of this directory. 如果“ isDirectory”为true,我想查看该目录的内容。 How does that work? 这是如何运作的? Trying to set a new endpoint as 尝试将新端点设置为

 endPoint <- AzureStor::adls_endpoint(endpoint = "https://<myStorageName>.dfs.core.windows.net/<myDirectoryName>" ,key = <myStorageKey>)

fails.

So, how can I let my code explore a directory and its contents further when the structure is DataLake -> FileSystem -> Directory -> Directory&Files -> Directory&Files -> ... and so on?

The answer to my question is simply to set recursive = TRUE, so:

 list_adls_files(myFileSys, dir = "/", info = "all", recursive = TRUE)

It can be so easy sometimes!
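For completeness: the endpoint is meant to address the storage account itself, while directories inside a file system are addressed through the dir argument of list_adls_files, which is why building a new endpoint per directory fails. A minimal sketch of listing a single sub-directory (the directory name is a placeholder):

 # List only the contents of one sub-directory, non-recursively
 AzureStor::list_adls_files(myFileSys, dir = "/<myDirectoryName>", info = "all")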
