Databricks: Uploading a file to another location from Azure Blob Storage without copying it locally

I have a file in Azure Blob Storage that I would like to upload to another location without first copying it to Databricks' local storage.

Currently my code has to copy the file locally before uploading it:

# Set up connection to Azure Blob Storage
spark.conf.set("fs.azure.account.key.[some location]", "[account key]")

# Copies the file to Databricks local storage
dbutils.fs.cp("wasbs://[folder location]/some_file.csv", "temp_some_file.csv")

# Set up the upload of the data to the other system
uploader = client.create_dataset_from_upload('data', 'csv') # This is an external library call

# Read the local copy of the file and upload it to the other system
with open('/dbfs/temp_some_file.csv') as dataset:
    uploader.upload_file(dataset)

How can I change the open() call so that it points directly to the file in Azure Blob Storage?

You can mount the container in DBFS:

storage = ...
container = ...
sas = '...'

dbutils.fs.mount(
  source = f"wasbs://{container}@{storage}.blob.core.windows.net",
  mount_point = "/mnt/uploader",
  extra_configs = {f"fs.azure.sas.{container}.{storage}.blob.core.windows.net": sas}
)

It will then be accessible at dbfs:/mnt/uploader. And since DBFS itself is mounted on the driver/executor nodes at /dbfs, you will be able to open the file directly:

with open('/dbfs/mnt/uploader/some_file.csv', 'r') as dataset:
    uploader.upload_file(dataset)

Don't forget to unmount the container when you're done (unless you want the mount to be permanent).
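
For the cleanup step, a minimal sketch, assuming the mount point /mnt/uploader used in the example above:

# Unmount the container once the upload is finished.
# Check dbutils.fs.mounts() first so the call doesn't fail if it was already unmounted.
if any(m.mountPoint == "/mnt/uploader" for m in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/uploader")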
