
Write/save a DataFrame to an Azure File Share from Azure Databricks

How do I write to an Azure File Share from Azure Databricks Spark jobs?

I configured the Hadoop storage key and value:

spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.STORAGEKEY.file.core.windows.net",
  "SECRETVALUE"
)


val wasbFileShare =
    s"wasbs://testfileshare@STORAGEKEY.file.core.windows.net/testPath"

df.coalesce(1).write.mode("overwrite").csv(wasbFileShare)

When I try to save the DataFrame to the Azure File Share, I see the following "resource not found" error, even though the URI exists:

 Exception in thread "main" org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: The requested URI does not represent any resource on the server.

Unfortunately, Azure Databricks does not support reading from or writing to Azure File Shares.

Azure Databricks supported data sources: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/

I would suggest you provide feedback on this:

https://feedback.azure.com/forums/909463-azure-databricks

All of the feedback you share in these forums will be monitored and reviewed by the Microsoft engineering teams responsible for building Azure.

You may check out this SO thread, which addresses a similar issue: Databricks and Azure Files

Below is a code snippet for writing CSV data directly to an Azure Blob storage container from an Azure Databricks notebook.

# Configure blob storage account access key globally
spark.conf.set("fs.azure.account.key.chepra.blob.core.windows.net", "gv7nVIXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXdlOiA==")
output_container_path = "wasbs://sampledata@chepra.blob.core.windows.net"
output_blob_folder = "%s/wrangled_data_folder" % output_container_path

# write the dataframe as a single file to blob storage
(dataframe
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .format("com.databricks.spark.csv")
 .save(output_blob_folder))

# Get the name of the wrangled-data CSV file that was just saved to Azure blob storage (it starts with 'part-')
files = dbutils.fs.ls(output_blob_folder)
output_file = [x for x in files if x.name.startswith("part-")]

# Move the wrangled-data CSV file from a sub-folder (wrangled_data_folder) to the root of the blob container
# While simultaneously changing the file name
dbutils.fs.mv(output_file[0].path, "%s/predict-transform-output.csv" % output_container_path)
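As a quick sanity check (a minimal sketch, assuming the same output_container_path and the file name used in the mv call above), you can read the renamed CSV back with Spark:

# Read the renamed CSV back from the container to verify the write
verify_df = (spark.read
             .option("header", "true")
             .csv("%s/predict-transform-output.csv" % output_container_path))
verify_df.show(5)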


Steps to connect to an Azure File Share from Databricks

First, install the Microsoft Azure Storage File Share client library for Python using pip install in Databricks: https://pypi.org/project/azure-storage-file-share/
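In a Databricks notebook this is typically a one-liner run in its own cell (a sketch; the %pip magic installs the package for the current notebook session):

%pip install azure-storage-file-share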

After installing it, create a storage account. Then you can create a file share from Databricks:

from azure.storage.fileshare import ShareClient

share = ShareClient.from_connection_string(conn_str="<connection_string consists of FileEndpoint=myFileEndpoint(https://storageaccountname.file.core.windows.net/);SharedAccessSignature=sasToken>", share_name="<file share name that you want to create>")

share.create_share()
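If the share might already exist, create_share() will raise an error; a small hedged variant (ResourceExistsError comes from the azure-core package) tolerates that case:

from azure.core.exceptions import ResourceExistsError

try:
    share.create_share()
except ResourceExistsError:
    # The share is already there; reuse it
    pass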

This code uploads a file to the file share from Databricks:

from azure.storage.fileshare import ShareFileClient
 
file_client = ShareFileClient.from_connection_string(conn_str="<connection_string consists of FileEndpoint=myFileEndpoint(https://storageaccountname.file.core.windows.net/);SharedAccessSignature=sasToken>", share_name="<your_fileshare_name>", file_path="my_file")
 
with open("./SampleSource.txt", "rb") as source_file:
    file_client.upload_file(source_file)
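To tie this back to the original question, one possible approach (a sketch, not from the answer above; df, the connection string, and the file name are placeholder assumptions) is to collapse a small Spark DataFrame to CSV bytes on the driver and upload them with ShareFileClient:

from azure.storage.fileshare import ShareFileClient

# Collapse the Spark DataFrame to CSV bytes on the driver (only suitable for small results)
csv_bytes = df.toPandas().to_csv(index=False).encode("utf-8")

# Upload the CSV as a single file at the root of the share
file_client = ShareFileClient.from_connection_string(
    conn_str="<connection_string>",
    share_name="<your_fileshare_name>",
    file_path="result.csv")
file_client.upload_file(csv_bytes)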

Refer to this link for further information: https://pypi.org/project/azure-storage-file-share/
