

Trying to connect Azure Databricks with Azure Event Hubs to send some properties to Event Hubs and read them through Splunk

I am looking to connect Azure Databricks to Event Hubs and read the events through Splunk. Initially I was able to send a test message and receive the events in Splunk (this worked using Scala, per https://docs.microsoft.com/en-us/azure/databricks/scenarios/databricks-stream-from-eventhubs --> Send tweets to Event Hubs). Now I am trying to implement the same thing in Python, using https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-python-get-started-send --> Send events as a reference. When I try to pass the object parameters, it throws an error like "unable to import from Azure Event Hubs".
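
Roughly what I am attempting, based on that quickstart (a minimal sketch - the connection string, hub name, and payload below are placeholders, not my real values):

from azure.eventhub import EventHubProducerClient, EventData

# Placeholder connection string and hub name
producer = EventHubProducerClient.from_connection_string(
    conn_str="Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<key-name>;SharedAccessKey=<key>",
    eventhub_name="<event-hub-name>",
)

# Batch up one event; properties carries the key-value "object parameters"
batch = producer.create_batch()
event = EventData('{"field": "value"}')
event.properties = {"source": "databricks"}
batch.add(event)

producer.send_batch(batch)
producer.close()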

Can anyone help me understand how I can connect Azure Databricks with Azure Event Hubs, including sending the object parameters?

PS: I have added the necessary libraries to the cluster, as below:

  1. com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.18
  2. azure-eventhubs-spark_2.11
  3. com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.22
  4. org.apache.spark:spark-avro_2.12:3.1.1

I have also tried different versions of these libraries.

Can someone help me with the syntax for passing the object parameters as key-value pairs?

It's simple to connect to Event Hubs from Azure Databricks - just follow the official documentation, specifically the section Writing Data to Event Hubs (the example below is for a batch write):

# Connection string copied from the Azure Portal
writeConnectionString = "SharedAccessSignatureFromAzurePortal"
ehWriteConf = {
  # the connector requires the connection string to be encrypted
  'eventhubs.connectionString' :
     sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(writeConnectionString)
}

# Write body data from a DataFrame to EventHubs. 
# Events are distributed across partitions using round-robin model.
df \
  .select("body") \
  .write \
  .format("eventhubs") \
  .options(**ehWriteConf) \
  .save()

You need to construct the body column somehow - by encoding your data as JSON using to_json(struct("*")), or by encoding the data as Avro...
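
For example, a minimal sketch of the JSON variant (the input DataFrame and its columns are invented for illustration):

from pyspark.sql.functions import to_json, struct

# Hypothetical source data - any set of columns works
df = spark.createDataFrame(
    [("device-1", 23.5), ("device-2", 19.0)],
    ["deviceId", "temperature"],
)

# Pack every column into the single string column "body" that the
# eventhubs sink expects; each row becomes one JSON-encoded event
df_with_body = df.select(to_json(struct("*")).alias("body"))

The resulting df_with_body can then be written with the eventhubs snippet above; on the receiving side (e.g. Splunk) each event body arrives as a JSON object of your key-value pairs.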

But you also have a problem in your cluster configuration - specifically these two libraries: azure-eventhubs-spark_2.11 and com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.22. They are built for Spark 2 (Scala 2.11), while you are using Spark 3 (Scala 2.12). Uninstall them and restart the cluster.
