
How to write a JSON to Azure queue from Azure Databricks

I'm trying to read a JSON file from Blob storage and write that file to an Azure queue. The reading part works fine, but writing throws an error.

I've already tried passing the URL of the queue folder I'm trying to write to as the parameter for .save().

Here's my code:

storage_account_name = "mrktmabcdestaaue"
storage_account_access_key = "myurl=="
file_location = "wasbs://myfolder@mrktmabcdestaaue.blob.core.windows.net/input.json"
file_type = "json"

spark.conf.set(
    "fs.azure.account.key." + storage_account_name + ".blob.core.windows.net",
    storage_account_access_key)

df = spark.read.option("multiline", "true").format(file_type).load(file_location)

df.write.mode("overwrite").format("com.databricks.spark.json").save("wasbs://myqueue@mrktmabcdestaaue.queue.core.windows.net")

My input JSON:

{
"Name": "Abc",
"Age": 18,
"City": "def"
}

The error message I'm getting is:

"shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Unable to access container myqueue in account mrktmabcdestaaue.queue.core.windows.net using anonymous credentials, and no credentials found for them in the configuration." "shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Unable to access container myqueue in account mrktmabcdestaaue.queue.core.windows.net using匿名凭据,并且在配置中找不到他们的凭据。”

It sounds like your cluster is not attached to that storage account. Recreate your cluster and make sure that the account is attached to the cluster.

Your storage location should be wasbs://myfolder@mrktmabcdestaaue.blob.core.windows.net/input.json.

For more details, you could refer to this article.

This scenario is not supported. You can write to a Blob Storage container, but not to Storage Queues.

The Databricks Azure Queue (AQS) connector uses Azure Queue Storage (AQS) to provide an optimized file source that lets you find new files written to an Azure Blob Storage (ABS) container without repeatedly listing all of the files. See the documentation for more details. So it, too, can only be used for reading files.
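For illustration only, a read-side sketch of that source might look roughly like the following. The queue name, connection string, and schema are placeholders, and the option names are assumptions based on the Databricks ABS-AQS documentation; check the linked docs for the exact spelling on your runtime.

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Hypothetical schema matching the input JSON shown in the question.
input_schema = StructType([
    StructField("Name", StringType()),
    StructField("Age", IntegerType()),
    StructField("City", StringType()),
])

# Sketch only: the ABS-AQS source consumes queue notifications to discover
# new blobs. It reads files; it does not write messages to the queue.
df_stream = (spark.readStream
    .format("abs-aqs")
    .option("fileFormat", "json")
    .option("queueName", "my-notification-queue")              # hypothetical notification queue
    .option("connectionString", "<storage-connection-string>")  # placeholder
    .schema(input_schema)
    .load())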

If you want to pass the content you read to consumers, you could use Azure Event Hubs or Apache Kafka (on Azure HDInsight or Confluent) as a message broker. In this scenario you would make use of Structured Streaming, so you need a streaming DataFrame. Writing back the stream would look like this:

(df.writeStream
   .format("eventhubs")                       # azure-event-hubs-spark connector
   .options(**ehConf)                         # Event Hubs connection settings (dict)
   .option("checkpointLocation", checkploc)   # checkpoint directory for the streaming query
   .start())
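In the snippet above, ehConf and checkploc are assumed to be defined earlier. A minimal sketch of how they might be set up with the azure-event-hubs-spark connector follows; the namespace, policy, key, and checkpoint path are placeholders. Recent connector versions expect the connection string to be encrypted via EventHubsUtils, and the connector takes the payload from a column named body.

from pyspark.sql.functions import to_json, struct

# Placeholder connection string and checkpoint path; substitute your own values.
connection_string = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>;EntityPath=<eventhub>"
checkploc = "/mnt/checkpoints/json-to-eventhubs"

# Recent versions of the azure-event-hubs-spark connector expect the
# connection string to be encrypted with EventHubsUtils.
ehConf = {
    "eventhubs.connectionString":
        sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string)
}

# The connector reads the payload from a string (or binary) column named "body",
# so serialize each row of the streaming DataFrame to JSON first.
df = df.select(to_json(struct("*")).alias("body"))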

Another option could be to make use of Azure Event Grid. In the end it depends on the concrete scenario you would like to achieve.

I had the same scenario, tried the same thing, and ended up here as a result.

I also tried dbutils.fs.mount but was not able to mount the storage queue as I could with blob storage.

I ended up using a storage queue client:
https://pypi.org/project/azure-storage-queue/

Then I had to read each JSON message and call queue.send_message for each one. Not great, but I could not find a better solution.
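A minimal sketch of that approach with the azure-storage-queue v12 client might look like this. The connection string and queue name are placeholders, and collecting to the driver assumes the dataset is small.

from azure.storage.queue import QueueClient

# Hypothetical connection string; substitute your own account name and key.
connect_str = "DefaultEndpointsProtocol=https;AccountName=mrktmabcdestaaue;AccountKey=<key>;EndpointSuffix=core.windows.net"
queue = QueueClient.from_connection_string(conn_str=connect_str, queue_name="myqueue")

# Serialize each DataFrame row to a JSON string and enqueue it one by one.
# collect() pulls everything to the driver, so this only suits small datasets.
for message in df.toJSON().collect():
    queue.send_message(message)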

