
How to upload dbfs files and folders to ADLS in databricks?

I plan to stop using DBFS and start using ADLS instead. I am trying to move my files and folders to ADLS, and then I will access those files in Databricks via ADLS paths.

How should I proceed with this requirement?

If you have mounted the container, you should be able to use the dbutils.fs.cp command.

Mount the container using the information here -> https://docs.databricks.com/data/data-sources/azure/azure-storage.html

spark.conf.set(
      "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
      dbutils.secrets.get(scope="<scope>", key="<storage-account-access-key>"))
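With the account key configured as above, ADLS Gen2 files are addressed by abfss:// URIs. A small helper that builds such a URI may make the path scheme clearer; the function name and example values below are illustrative, not from the original answer:

```python
def abfss_uri(container: str, storage_account: str, path: str = "") -> str:
    """Build an abfss:// URI for a path inside an ADLS Gen2 container.

    container, storage_account and path are placeholders to replace
    with your own values.
    """
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    return f"{base}/{path.lstrip('/')}" if path else base

# e.g. abfss_uri("raw", "mystorageacct", "sales/2023.csv")
```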

Once mounted, you can copy from one location to the other.

dbutils.fs.cp(local_filename, 'abfss://<container>@<storage-account>.dfs.core.windows.net/remote_filename')

Obviously, this depends on how many files and how much data you have. As described in the manual, you can recursively copy all files in a folder by adding True as an argument at the end of cp.
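The recursive copy described above can be sketched as a small wrapper. Since dbutils only exists inside a Databricks notebook, the sketch takes it as a parameter; the function name and default paths are hypothetical:

```python
def migrate_folder(dbutils, src: str, dest: str) -> bool:
    """Copy a whole DBFS folder tree to an ADLS location.

    The third positional argument of dbutils.fs.cp is `recurse`;
    passing True copies the directory and everything beneath it
    in a single call, e.g.:

        migrate_folder(dbutils,
                       "dbfs:/mnt/mydata",
                       "abfss://<container>@<storage-account>.dfs.core.windows.net/mydata")
    """
    return dbutils.fs.cp(src, dest, True)
```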

dbutils.fs provides utilities for working with FileSystems. Most methods in this package can take either a DBFS path (e.g., "/foo" or "dbfs:/foo"), or another FileSystem URI. For more info about a method, use dbutils.fs.help("methodName"). In notebooks, you can also use the %fs shorthand to access DBFS. The %fs shorthand maps straightforwardly onto dbutils calls. For example, "%fs head --maxBytes=10000 /file/path" translates into "dbutils.fs.head("/file/path", maxBytes = 10000)".

fsutils

cp(from: String, to: String, recurse: boolean = false): boolean -> Copies a file or directory, possibly across FileSystems
head(file: String, maxBytes: int = 65536): String -> Returns up to the first 'maxBytes' bytes of the given file as a String encoded in UTF-8
ls(dir: String): Seq -> Lists the contents of a directory
mkdirs(dir: String): boolean -> Creates the given directory if it does not exist, also creating any necessary parent directories
mv(from: String, to: String, recurse: boolean = false): boolean -> Moves a file or directory, possibly across FileSystems
put(file: String, contents: String, overwrite: boolean = false): boolean -> Writes the given String out to a file, encoded in UTF-8
rm(dir: String, recurse: boolean = false): boolean -> Removes a file or directory

mount

mount(source: String, mountPoint: String, encryptionType: String = "", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Mounts the given source directory into DBFS at the given mount point
mounts: Seq -> Displays information about what is mounted within DBFS
refreshMounts: boolean -> Forces all machines in this cluster to refresh their mount cache, ensuring they receive the most recent information
unmount(mountPoint: String): boolean -> Deletes a DBFS mount point
updateMount(source: String, mountPoint: String, encryptionType: String = "", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Similar to mount(), but updates an existing mount point instead of creating a new one

From https://docs.microsoft.com/en-us/azure/databricks/dev-tools/databricks-utils
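The mount() signature listed above can be wired together with the storage-account key from earlier in the answer. This is a hedged sketch only: the Python keyword names (mount_point, extra_configs) follow the Databricks Python API, all identifiers are placeholders, and note that the Databricks docs generally recommend OAuth with a service principal rather than an account key for mounting ADLS Gen2. The sketch takes dbutils as a parameter since it only exists inside a notebook:

```python
def mount_adls(dbutils, storage_account: str, container: str,
               mount_point: str, scope: str, key_name: str) -> bool:
    """Mount an ADLS Gen2 container into DBFS using account-key auth.

    All argument values (scope, key_name, names) are placeholders
    for your own workspace configuration.
    """
    configs = {
        f"fs.azure.account.key.{storage_account}.dfs.core.windows.net":
            dbutils.secrets.get(scope=scope, key=key_name)
    }
    return dbutils.fs.mount(
        source=f"abfss://{container}@{storage_account}.dfs.core.windows.net/",
        mount_point=mount_point,
        extra_configs=configs,
    )
```

After mounting, the container is reachable under the DBFS mount point (e.g. /mnt/mydata), and dbutils.fs.unmount(mount_point) removes it again.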
