
How to upload dbfs files and folders to ADLS in databricks?

I'm planning to stop using DBFS and start using ADLS instead. I'm trying to move my files and folders over to ADLS, and then I'll access the files in Databricks using ADLS paths.

How do I go about doing this?

If you have mounted the container, you should be able to use the dbutils.fs.cp command.

Mount the container using the information here -> https://docs.databricks.com/data/data-sources/azure/azure-storage.html

# Configure access to the ADLS Gen2 account, reading the storage
# account access key from a Databricks secret scope.
spark.conf.set(
      "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
      dbutils.secrets.get(scope="<scope>", key="<storage-account-access-key>"))
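Note that the snippet above configures direct access to the storage account with an account key rather than creating a mount. If you want an actual mount point under /mnt, the linked page also covers dbutils.fs.mount. A minimal sketch, reusing the same secret scope and placeholders as above (<mount-name> is an additional placeholder):

configs = {
    "fs.azure.account.key.<storage-account>.dfs.core.windows.net":
        dbutils.secrets.get(scope="<scope>", key="<storage-account-access-key>")
}

# Mount the ADLS Gen2 container into DBFS at /mnt/<mount-name>.
dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs)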

安裝后,您可以從一個位置復制到另一個位置。

# Copy a single file from DBFS into the ADLS container.
dbutils.fs.cp(local_filename, 'abfss://<container>@<storage-account>.dfs.core.windows.net/remote_filename')

Obviously, this depends on how many files and how much data you have. As described in the manual, you can copy all the files in a folder recursively by adding True as a final argument to cp.
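A sketch, where dbfs:/source_folder and the target folder are placeholder paths:

# The third argument is recurse; True copies the folder contents recursively.
dbutils.fs.cp('dbfs:/source_folder', 'abfss://<container>@<storage-account>.dfs.core.windows.net/source_folder', True)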

dbutils.fs provides utilities for working with FileSystems. Most methods in this package can take either a DBFS path (e.g., "/foo" or "dbfs:/foo"), or another FileSystem URI. For more info about a method, use dbutils.fs.help("methodName"). In notebooks, you can also use the %fs shorthand to access DBFS. The %fs shorthand maps straightforwardly onto dbutils calls. For example, "%fs head --maxBytes=10000 /file/path" translates into "dbutils.fs.head("/file/path", maxBytes = 10000)".

fsutils

cp(from: String, to: String, recurse: boolean = false): boolean -> Copies a file or directory, possibly across FileSystems
head(file: String, maxBytes: int = 65536): String -> Returns up to the first 'maxBytes' bytes of the given file as a String encoded in UTF-8
ls(dir: String): Seq -> Lists the contents of a directory
mkdirs(dir: String): boolean -> Creates the given directory if it does not exist, also creating any necessary parent directories
mv(from: String, to: String, recurse: boolean = false): boolean -> Moves a file or directory, possibly across FileSystems
put(file: String, contents: String, overwrite: boolean = false): boolean -> Writes the given String out to a file, encoded in UTF-8
rm(dir: String, recurse: boolean = false): boolean -> Removes a file or directory

mount

mount(source: String, mountPoint: String, encryptionType: String = "", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Mounts the given source directory into DBFS at the given mount point
mounts: Seq -> Displays information about what is mounted within DBFS
refreshMounts: boolean -> Forces all machines in this cluster to refresh their mount cache, ensuring they receive the most recent information
unmount(mountPoint: String): boolean -> Deletes a DBFS mount point
updateMount(source: String, mountPoint: String, encryptionType: String = "", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Similar to mount(), but updates an existing mount point instead of creating a new one

From https://docs.microsoft.com/en-us/azure/databricks/dev-tools/databricks-utils
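Putting a few of those utilities together: a quick sanity check after the copy, and cleanup of a mount created earlier. A sketch reusing the placeholder names from above:

# List the target container to verify that the copied files arrived.
display(dbutils.fs.ls('abfss://<container>@<storage-account>.dfs.core.windows.net/'))

# Show all current mount points, then remove the one created earlier.
dbutils.fs.mounts()
dbutils.fs.unmount('/mnt/<mount-name>')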

