
How to upload DBFS files and folders to ADLS in Databricks?

I am planning to stop using DBFS and start using ADLS instead. I am trying to move my files and folders to ADLS, and then I will use the ADLS path to access the files in Databricks.

How should I go about this requirement?

If you have the container mounted, then you should just be able to use the dbutils.fs.cp command.

Mount the container using the information here -> https://docs.databricks.com/data/data-sources/azure/azure-storage.html (note that the snippet below does not create a mount; it configures direct abfss:// access using the storage account key pulled from a secret scope):

spark.conf.set(
      "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
      dbutils.secrets.get(scope="<scope>", key="<storage-account-access-key>"))

Once access is set up, you can copy from one location to another.

dbutils.fs.cp(local_filename, 'abfss://<container>@<storage-account>.dfs.core.windows.net/remote_filename')

Obviously, how long this takes depends on how many files and how much data you have. You can recursively copy all files in a folder by passing True as the final argument to cp, as outlined in the manual.
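For illustration, a minimal sketch of building the destination URI and issuing the recursive copy. The abfss_path helper and the container/account/folder names are made up for this example, and dbutils itself is only available inside a Databricks notebook:

```python
def abfss_path(container: str, account: str, path: str) -> str:
    """Build an abfss:// URI for a path in an ADLS Gen2 container
    (hypothetical helper, not part of dbutils)."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"

# Inside a Databricks notebook you would then run, with the trailing True
# enabling recursive copy of the whole folder:
# dbutils.fs.cp("dbfs:/FileStore/my-folder",
#               abfss_path("raw", "mystorage", "my-folder"),
#               True)

print(abfss_path("raw", "mystorage", "my-folder"))
# -> abfss://raw@mystorage.dfs.core.windows.net/my-folder
```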

dbutils.fs provides utilities for working with FileSystems. Most methods in this package can take either a DBFS path (e.g., "/foo" or "dbfs:/foo"), or another FileSystem URI. For more info about a method, use dbutils.fs.help("methodName"). In notebooks, you can also use the %fs shorthand to access DBFS. The %fs shorthand maps straightforwardly onto dbutils calls. For example, "%fs head --maxBytes=10000 /file/path" translates into "dbutils.fs.head("/file/path", maxBytes = 10000)".

fsutils

cp(from: String, to: String, recurse: boolean = false): boolean -> Copies a file or directory, possibly across FileSystems
head(file: String, maxBytes: int = 65536): String -> Returns up to the first 'maxBytes' bytes of the given file as a String encoded in UTF-8
ls(dir: String): Seq -> Lists the contents of a directory
mkdirs(dir: String): boolean -> Creates the given directory if it does not exist, also creating any necessary parent directories
mv(from: String, to: String, recurse: boolean = false): boolean -> Moves a file or directory, possibly across FileSystems
put(file: String, contents: String, overwrite: boolean = false): boolean -> Writes the given String out to a file, encoded in UTF-8
rm(dir: String, recurse: boolean = false): boolean -> Removes a file or directory

mount

mount(source: String, mountPoint: String, encryptionType: String = "", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Mounts the given source directory into DBFS at the given mount point
mounts: Seq -> Displays information about what is mounted within DBFS
refreshMounts: boolean -> Forces all machines in this cluster to refresh their mount cache, ensuring they receive the most recent information
unmount(mountPoint: String): boolean -> Deletes a DBFS mount point
updateMount(source: String, mountPoint: String, encryptionType: String = "", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Similar to mount(), but updates an existing mount point instead of creating a new one
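As a concrete illustration of mount(), the following sketch assembles the extraConfigs for mounting an ADLS Gen2 container with a service principal. The adls_oauth_configs helper, the mount point, and all placeholder values are assumptions for this example; dbutils.fs.mount only runs inside a Databricks notebook:

```python
def adls_oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Build the Hadoop ABFS OAuth configs for a service-principal mount
    (hypothetical helper; the config keys come from the ABFS connector)."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# Inside a Databricks notebook, secrets would come from a secret scope:
# dbutils.fs.mount(
#     source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
#     mount_point="/mnt/adls",
#     extra_configs=adls_oauth_configs(
#         dbutils.secrets.get("<scope>", "<client-id-key>"),
#         dbutils.secrets.get("<scope>", "<client-secret-key>"),
#         "<tenant-id>"))
```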

From https://docs.microsoft.com/en-us/azure/databricks/dev-tools/databricks-utils
