
How to upload DBFS files and folders to ADLS in Databricks?

I am planning to stop using DBFS and start using ADLS instead. I want to move my files and folders to ADLS and then use the ADLS path to access the files in Databricks.

How should I go about this requirement?

If you have the container mounted, then you should just be able to use the dbutils.fs.cp command

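Mount the container, or configure access to it, using the information here -> https://docs.databricks.com/data/data-sources/azure/azure-storage.html. For example, the following sets the storage account access key for the Spark session, reading it from a secret scope: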

spark.conf.set(
      "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
      dbutils.secrets.get(scope="<scope>", key="<storage-account-access-key>"))
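The snippet above configures direct access with an account key rather than creating a mount. If an actual mount is preferred, a minimal sketch could look like the following. The helper name ensure_mounted, the mount point and the configs dict are hypothetical; the authentication keys that go into extra_configs depend on the method chosen in the linked documentation and are not filled in here.

```python
def ensure_mounted(fs, source, mount_point, extra_configs):
    """Mount `source` at `mount_point` unless something is already mounted there.

    `fs` is expected to be dbutils.fs; it is passed in so the helper can also
    be exercised outside a notebook with a stand-in object.
    Returns True if a new mount was created, False if one already existed.
    """
    # dbutils.fs.mounts() returns entries that expose a `mountPoint` attribute.
    if any(m.mountPoint == mount_point for m in fs.mounts()):
        return False
    fs.mount(source=source, mount_point=mount_point, extra_configs=extra_configs)
    return True


# In a Databricks notebook (placeholders, not real values):
# configs = {...}  # authentication settings from the linked documentation
# ensure_mounted(
#     dbutils.fs,
#     "abfss://<container>@<storage-account>.dfs.core.windows.net/",
#     "/mnt/<mount-name>",
#     configs,
# )
```

Checking mounts() first avoids the error raised when the mount point is already in use.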

Once the container is mounted, or access to it is configured, you can copy from one location to another.

dbutils.fs.cp(local_filename, 'abfss://<container>@<storage-account>.dfs.core.windows.net/remote_filename')

Obviously, this depends on how many files and how much data you have. You can recursively copy all files in a folder by passing True as the third argument (recurse) to cp, as described in the dbutils.fs help quoted further down.
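As a rough sketch of the recursive copy, assuming access to the storage account is already configured: the helper names adls_uri and copy_to_adls and the example paths are made up for illustration, and fs stands for dbutils.fs.

```python
def adls_uri(container, storage_account, path=""):
    """Build an abfss:// URI for a path inside an ADLS Gen2 container."""
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    return f"{base}/{path.lstrip('/')}"


def copy_to_adls(fs, dbfs_path, container, storage_account, dest_path=""):
    """Recursively copy a DBFS file or folder to ADLS and return the target URI.

    `fs` is expected to be dbutils.fs (or any object with a compatible cp method).
    """
    target = adls_uri(container, storage_account, dest_path)
    # The third positional argument is `recurse`; True copies a folder with its contents.
    fs.cp(dbfs_path, target, True)
    return target


# In a Databricks notebook (placeholder names):
# copy_to_adls(dbutils.fs, "dbfs:/FileStore/my_folder",
#              "<container>", "<storage-account>", "my_folder")
```

Passing dbutils.fs in as a parameter keeps the path-building logic testable outside a notebook.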

dbutils.fs provides utilities for working with FileSystems. Most methods in this package can take either a DBFS path (e.g., "/foo" or "dbfs:/foo"), or another FileSystem URI. For more info about a method, use dbutils.fs.help("methodName"). In notebooks, you can also use the %fs shorthand to access DBFS. The %fs shorthand maps straightforwardly onto dbutils calls. For example, "%fs head --maxBytes=10000 /file/path" translates into "dbutils.fs.head("/file/path", maxBytes = 10000)".

fsutils

cp(from: String, to: String, recurse: boolean = false): boolean -> Copies a file or directory, possibly across FileSystems
head(file: String, maxBytes: int = 65536): String -> Returns up to the first 'maxBytes' bytes of the given file as a String encoded in UTF-8
ls(dir: String): Seq -> Lists the contents of a directory
mkdirs(dir: String): boolean -> Creates the given directory if it does not exist, also creating any necessary parent directories
mv(from: String, to: String, recurse: boolean = false): boolean -> Moves a file or directory, possibly across FileSystems
put(file: String, contents: String, overwrite: boolean = false): boolean -> Writes the given String out to a file, encoded in UTF-8
rm(dir: String, recurse: boolean = false): boolean -> Removes a file or directory

mount

mount(source: String, mountPoint: String, encryptionType: String = "", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Mounts the given source directory into DBFS at the given mount point
mounts: Seq -> Displays information about what is mounted within DBFS
refreshMounts: boolean -> Forces all machines in this cluster to refresh their mount cache, ensuring they receive the most recent information
unmount(mountPoint: String): boolean -> Deletes a DBFS mount point
updateMount(source: String, mountPoint: String, encryptionType: String = "", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Similar to mount(), but updates an existing mount point instead of creating a new one

from https://docs.microsoft.com/en-us/azure/databricks/dev-tools/databricks-utils
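After a large copy it can be worth checking that everything arrived before switching over to the ADLS path. The sketch below uses the ls utility listed above to compare file names and sizes between the two trees. The helper names are hypothetical, and it assumes that entries returned by ls expose name and size attributes and that directory names end with "/".

```python
def list_files(fs, root):
    """Return {relative_path: size} for every file under `root`, recursively."""
    root = root.rstrip("/") + "/"
    files = {}
    stack = [""]  # relative directories still to visit
    while stack:
        rel_dir = stack.pop()
        for entry in fs.ls(root + rel_dir):
            rel = rel_dir + entry.name
            if entry.name.endswith("/"):
                # Assumption: directory entries carry a trailing slash in their name.
                stack.append(rel)
            else:
                files[rel] = entry.size
    return files


def missing_or_different(fs, source_root, target_root):
    """Return sorted relative paths that are absent from the target or differ in size."""
    src = list_files(fs, source_root)
    dst = list_files(fs, target_root)
    return sorted(p for p, size in src.items() if dst.get(p) != size)


# In a Databricks notebook (placeholder names):
# missing_or_different(dbutils.fs, "dbfs:/FileStore/my_folder",
#     "abfss://<container>@<storage-account>.dfs.core.windows.net/my_folder")
```

An empty result means every source file has a counterpart of the same size in the target; it does not compare file contents.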
