How to list sub-directories in a data lake file system on Azure Databricks using R
I am working in an R notebook in Databricks on Azure. Using the AzureStor package, I can list the names of objects in the data lake, or in a file system therein, the following way:
# connect to the ADLS Gen2 endpoint of the storage account
endPoint <- AzureStor::adls_endpoint(endpoint = "https://<myStorageName>.dfs.core.windows.net", key = "<myStorageKey>")
# list the file systems (containers) in the data lake
storage_containers <- AzureStor::list_storage_containers(endPoint)
# build the URL of the first file system and open it with the same key
path2fs <- paste0("https://<myStorageName>.dfs.core.windows.net/", names(storage_containers)[1])
myFileSys <- AzureStor::adls_filesystem(path2fs, key = "<myStorageKey>")
# list the contents of the file system's root
AzureStor::list_adls_files(myFileSys, "/")
That gives me an R data.frame that contains information about the "name" of the content and also a column "isDirectory". If "isDirectory" is true, I would like to see the content of that directory. How does that work?
Trying to set a new endpoint as
endPoint <- AzureStor::adls_endpoint(endpoint = "https://<myStorageName>.dfs.core.windows.net/<myDirectoryName>", key = "<myStorageKey>")
fails.
So, how can I further let my code explore the directory and its content when the structure is like DataLake -> FileSystem -> Directory -> Directory&Files -> Directory&Files -> ... etc.?
The answer to my question is just to set recursive = TRUE, so:
AzureStor::list_adls_files(myFileSys, dir = "/", info = "all", recursive = TRUE)
It can be so easy sometimes!
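As a follow-up, a hedged sketch of what can be done with that call: the recursive listing can be filtered with the isDirectory column from above, and a single sub-directory can also be listed directly via the dir argument (<myDirectoryName> is just a placeholder here):

allFiles <- AzureStor::list_adls_files(myFileSys, dir = "/", info = "all", recursive = TRUE)
subset(allFiles, !isDirectory)$name                               # every file anywhere in the file system
AzureStor::list_adls_files(myFileSys, dir = "<myDirectoryName>")  # contents of one sub-directory only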