
How to list sub-directories in a data lake file system on Azure Databricks using R

I am working in an R notebook in Databricks on Azure. Using the AzureStor package, I can list the names of objects in the data lake, or in a file system within it, as follows:

 # Connect to the ADLS Gen2 endpoint and list the file systems (containers) in it
 endPoint <- AzureStor::adls_endpoint(endpoint = "https://<myStorageName>.dfs.core.windows.net", key = <myStorageKey>)
 storage_containers <- AzureStor::list_storage_containers(endPoint)

 # Build the URL of the first file system and open it with the same account key
 path2fs <- paste0("https://<myStorageName>.dfs.core.windows.net/", names(storage_containers)[1])
 myFileSys <- AzureStor::adls_filesystem(path2fs, key = <myStorageKey>)

 # List the contents of the file system's root directory
 AzureStor::list_adls_files(myFileSys, "/")

That gives me an R data.frame containing a "name" column for each entry and a column "isDirectory".
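For illustration, a minimal sketch (assuming the listing is saved in a variable, here called `listing`) that extracts just the sub-directory names from that data.frame:

 listing <- AzureStor::list_adls_files(myFileSys, "/", info = "all")
 # isDirectory may be a logical or the strings "true"/"false" depending on
 # the package version; as.logical() handles both cases
 dirs <- listing$name[as.logical(listing$isDirectory)]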

If "isDirectory" is true, I would like to see the content of this directory. 如果“ isDirectory”为true,我想查看该目录的内容。 How does that work? 这是如何运作的? Trying to set a new endpoint as 尝试将新端点设置为

 endPoint <- AzureStor::adls_endpoint(endpoint = "https://<myStorageName>.dfs.core.windows.net/<myDirectoryName>" ,key = <myStorageKey>)

fails.

So, how can I let my code explore a directory and its contents further when the structure is DataLake -> FileSystem -> Directory -> Directory&Files -> Directory&Files -> ... and so on?

The answer to my question is simply to set recursive = TRUE, so:

 list_adls_files(myFileSys, dir = "/", info = "all", recursive = TRUE)

It can be so easy sometimes!
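For completeness: the endpoint is meant to address the storage account itself, while directories inside a file system are addressed through the dir argument of list_adls_files, which is why building a new endpoint per directory fails. A minimal sketch of listing a single sub-directory (the directory name is a placeholder):

 # List only the contents of one sub-directory, non-recursively
 AzureStor::list_adls_files(myFileSys, dir = "/<myDirectoryName>", info = "all")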
