
Write/save a DataFrame to an Azure File Share from Azure Databricks

How do I write to an Azure File Share from Azure Databricks Spark jobs?

I configured the Hadoop storage key and value:

spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.STORAGEKEY.file.core.windows.net",
  "SECRETVALUE"
)


val wasbFileShare =
    s"wasbs://testfileshare@STORAGEKEY.file.core.windows.net/testPath"

df.coalesce(1).write.mode("overwrite").csv(wasbFileShare)

When I try to save the DataFrame to the Azure File Share, I see the following "resource not found" error, even though the URI exists:

 Exception in thread "main" org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: The requested URI does not represent any resource on the server.

Unfortunately, Azure Databricks does not support reading from or writing to Azure File Shares.

Azure Databricks supported data sources: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/

I would suggest you provide feedback on this:

https://feedback.azure.com/forums/909463-azure-databricks

All of the feedback you share in these forums will be monitored and reviewed by the Microsoft engineering teams responsible for building Azure.

You may check out this SO thread, which addresses a similar issue: Databricks and Azure Files

Below is a code snippet for writing CSV data directly to an Azure Blob storage container from an Azure Databricks notebook.

# Configure blob storage account access key globally
spark.conf.set("fs.azure.account.key.chepra.blob.core.windows.net", "gv7nVIXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXdlOiA==")
output_container_path = "wasbs://sampledata@chepra.blob.core.windows.net"
output_blob_folder = "%s/wrangled_data_folder" % output_container_path

# write the dataframe as a single file to blob storage
(dataframe
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .format("com.databricks.spark.csv")
 .save(output_blob_folder))

# Get the name of the wrangled-data CSV file that was just saved to Azure blob storage (it starts with 'part-')
files = dbutils.fs.ls(output_blob_folder)
output_file = [x for x in files if x.name.startswith("part-")]

# Move the wrangled-data CSV file from a sub-folder (wrangled_data_folder) to the root of the blob container
# While simultaneously changing the file name
dbutils.fs.mv(output_file[0].path, "%s/predict-transform-output.csv" % output_container_path)
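As a quick sanity check (a minimal sketch, assuming the same output_container_path and the file name used in the mv call above), you can read the renamed CSV back with Spark:

# Read the renamed CSV back from the container to verify the write
verify_df = (spark.read
             .option("header", "true")
             .csv("%s/predict-transform-output.csv" % output_container_path))
verify_df.show(5)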


Steps to connect to an Azure File Share from Databricks

First, install the Microsoft Azure Storage File Share client library for Python using pip install in Databricks: https://pypi.org/project/azure-storage-file-share/
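In a Databricks notebook this is typically a one-liner run in its own cell (a sketch; the %pip magic installs the package for the current notebook session):

%pip install azure-storage-file-share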

After installing it, create a storage account. Then you can create a file share from Databricks:

from azure.storage.fileshare import ShareClient

share = ShareClient.from_connection_string(conn_str="<connection_string consists of FileEndpoint=myFileEndpoint(https://storageaccountname.file.core.windows.net/);SharedAccessSignature=sasToken>", share_name="<file share name that you want to create>")

share.create_share()
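If the share might already exist, create_share() will raise an error; a small hedged variant (ResourceExistsError comes from the azure-core package) tolerates that case:

from azure.core.exceptions import ResourceExistsError

try:
    share.create_share()
except ResourceExistsError:
    # The share is already there; reuse it
    pass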

This code uploads a file to the file share from Databricks:

from azure.storage.fileshare import ShareFileClient
 
file_client = ShareFileClient.from_connection_string(conn_str="<connection_string consists of FileEndpoint=myFileEndpoint(https://storageaccountname.file.core.windows.net/);SharedAccessSignature=sasToken>", share_name="<your_fileshare_name>", file_path="my_file")
 
with open("./SampleSource.txt", "rb") as source_file:
    file_client.upload_file(source_file)
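To tie this back to the original question, one possible approach (a sketch, not from the answer above; df, the connection string, and the file name are placeholder assumptions) is to collapse a small Spark DataFrame to CSV bytes on the driver and upload them with ShareFileClient:

from azure.storage.fileshare import ShareFileClient

# Collapse the Spark DataFrame to CSV bytes on the driver (only suitable for small results)
csv_bytes = df.toPandas().to_csv(index=False).encode("utf-8")

# Upload the CSV as a single file at the root of the share
file_client = ShareFileClient.from_connection_string(
    conn_str="<connection_string>",
    share_name="<your_fileshare_name>",
    file_path="result.csv")
file_client.upload_file(csv_bytes)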

Refer to this link for further information: https://pypi.org/project/azure-storage-file-share/
