Save Pandas or Pyspark dataframe from Databricks to Azure Blob Storage
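Is there a way to save a Pyspark or Pandas dataframe from Databricks to Blob Storage without mounting the storage or installing libraries?
I was able to do this after mounting the storage container to Databricks and using the library com.crealytics.spark.excel, but I wonder whether I can do the same without the library and without mounting, because I will be working on clusters where I don't have those 2 permissions.
Here is the code to save the dataframe locally to DBFS: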
# export
from os import path
folder = "export"
name = "export"
file_path_name_on_dbfs = path.join("/tmp", folder, name)
# Writing to DBFS
# .coalesce(1) is used to produce a single output file; if the dataframe is too big this may not work, so you'll end up with multiple files and need to copy them one by one
sampleDF \
.coalesce(1) \
.write \
.mode("overwrite") \
.option("header", "true") \
.option("delimiter", ";") \
.option("encoding", "UTF-8") \
.csv(file_path_name_on_dbfs)
# destination path of the single CSV file that will be sent to Azure Storage
dest = file_path_name_on_dbfs + ".csv"
# Rename the part-00000...csv file written by Spark to our own file name
target_file = list(filter(lambda file: file.name.startswith("part-00000"), dbutils.fs.ls(file_path_name_on_dbfs)))
if len(target_file) > 0:
    dbutils.fs.mv(target_file[0].path, dest)
    dbutils.fs.cp(dest, f"file://{dest}")  # needed on Community Edition only, because /dbfs is not available there, so we copy the file to the driver's local disk
    dbutils.fs.rm(file_path_name_on_dbfs, True)
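The rename step above relies on Spark naming its first output file `part-00000...`. Below is a minimal pure-Python sketch of that selection logic, assuming a plain list of file names (on Databricks you would presumably build it with `[f.name for f in dbutils.fs.ls(file_path_name_on_dbfs)]`). The helpers `csv_part_files` and `destination_name`, and the `export_0.csv`, `export_1.csv` naming for the multi-file case mentioned in the `coalesce(1)` comment, are hypothetical.

```python
from typing import List


def csv_part_files(names: List[str]) -> List[str]:
    """Return the CSV part files Spark wrote, in part-number order.

    Bookkeeping files such as _SUCCESS, _started_<id> and _committed_<id>
    are skipped because they do not start with "part-".
    """
    parts = [n for n in names if n.startswith("part-") and n.endswith(".csv")]
    # Part numbers are zero-padded, so plain string sorting keeps them in order.
    return sorted(parts)


def destination_name(base: str, index: int, total: int) -> str:
    """Hypothetical naming scheme: base.csv for one file, base_<i>.csv otherwise."""
    if total == 1:
        return f"{base}.csv"
    return f"{base}_{index}.csv"


listing = [
    "_SUCCESS",
    "_committed_123",
    "_started_123",
    "part-00001-tid-123-c000.csv",
    "part-00000-tid-123-c000.csv",
]
parts = csv_part_files(listing)
for i, name in enumerate(parts):
    print(name, "->", destination_name("/tmp/export/export", i, len(parts)))
```

With two part files this maps them to `/tmp/export/export_0.csv` and `/tmp/export/export_1.csv`; each could then be moved with `dbutils.fs.mv` and uploaded separately.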
Code to send the file to Azure Storage:
import requests
sas="YOUR_SAS_TOKEN_PREVIOUSLY_CREATED" # follow the link below to create SAS token (using sas is slightly more secure than raw key storage)
blob_account_name = "YOUR_BLOB_ACCOUNT_NAME"
container = "YOUR_CONTAINER_NAME"
destination_path_w_name = "export/export.csv"
url = f"https://{blob_account_name}.blob.core.windows.net/{container}/{destination_path_w_name}?{sas}"
# here we read the content of our previously exported df -> csv
# if you are not on community edition you might want to use /dbfs + dest
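# read as bytes so the UTF-8 CSV is sent unchanged, and close the file afterwards
with open(dest, "rb") as f:
    payload = f.read()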
headers = {
'x-ms-blob-type': 'BlockBlob',
'Content-Type': 'text/csv' # you can change the content type according to your needs
}
response = requests.request("PUT", url, headers=headers, data=payload)
# if response.status_code is 201 it means your file was created successfully
print(response.status_code)
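If `requests` is not available on the cluster, the same Put Blob call can be sketched with the standard library's `urllib.request`, which also needs no library installation. This is an illustrative sketch, not part of the original answer: `build_put_blob_request` is a hypothetical helper, it assumes `payload` is bytes, and it strips a leading `?` from the SAS token in case the token was copied with one.

```python
import urllib.parse
import urllib.request


def build_put_blob_request(account: str, container: str, blob_path: str,
                           sas: str, payload: bytes,
                           content_type: str = "text/csv") -> urllib.request.Request:
    """Build (but do not send) a Put Blob request authorised by a SAS token."""
    # Drop a leading '?' so the URL does not end up with '??'.
    token = sas.lstrip("?")
    # Percent-encode the blob path (spaces etc.) while keeping '/' separators.
    path = urllib.parse.quote(blob_path)
    url = f"https://{account}.blob.core.windows.net/{container}/{path}?{token}"
    headers = {
        "x-ms-blob-type": "BlockBlob",  # same headers as the requests version
        "Content-Type": content_type,
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="PUT")


req = build_put_blob_request("myaccount", "mycontainer", "export/export.csv",
                             "?sv=2021-06-08&sp=cw&sig=abc", b"a;b\n1;2\n")
print(req.get_method(), req.full_url)

# To actually upload (needs network access from the cluster):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)  # 201 means the blob was created
```

For a Pandas dataframe, `payload` could presumably come straight from `pdf.to_csv(sep=";", index=False).encode("utf-8")`, skipping the DBFS round trip; the original answer does not test that route.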
Keep in mind that anyone who obtains the SAS token can access your storage, within the permissions you set when creating the SAS token.
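Because the token's scope is what matters, it can help to check what a token grants before embedding it in a notebook. A SAS token is a query string whose `sp` field lists the permission letters and `se` the expiry time. The sketch below only reads those fields locally; it does not validate the signature or contact Azure, and `describe_sas` plus the partial letter-to-name table are illustrative assumptions.

```python
from urllib.parse import parse_qs

# A subset of the permission letters that can appear in the `sp` field.
PERMISSION_NAMES = {
    "r": "read",
    "a": "add",
    "c": "create",
    "w": "write",
    "d": "delete",
    "l": "list",
}


def describe_sas(token: str) -> dict:
    """Summarise what a SAS token claims to grant (no signature check)."""
    fields = parse_qs(token.lstrip("?"))
    perms = fields.get("sp", [""])[0]
    return {
        # Unknown letters are passed through unchanged.
        "permissions": [PERMISSION_NAMES.get(p, p) for p in perms],
        "expires": fields.get("se", ["<none>"])[0],
        "https_only": fields.get("spr", [""])[0] == "https",
    }


print(describe_sas("?sv=2021-06-08&sp=cw&se=2030-01-01T00:00:00Z&spr=https&sig=abc"))
```

For uploading a single export, a token limited to create and write permissions (`sp=cw`) with a short expiry keeps the exposure described above small.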