Consume events from Event Hub in Azure Databricks using PySpark

I found Spark connectors and guidelines for consuming events from Event Hub using Scala in Azure Databricks.

But how can we consume events from Event Hub in Azure Databricks using PySpark?

Any suggestions or documentation would help. Thanks.

Below is a snippet for reading events from Event Hub with PySpark on Azure Databricks.

# Connection string with an entity path (replace the placeholders with your own values)
connectionString = "Endpoint=sb://SAMPLE;SharedAccessKeyName=KEY_NAME;SharedAccessKey=KEY;EntityPath=EVENTHUB_NAME"

# Source with default settings
ehConf = {
  'eventhubs.connectionString' : connectionString
}

df = spark \
  .readStream \
  .format("eventhubs") \
  .options(**ehConf) \
  .load()

# The body column arrives as binary; cast it to a string to make it readable
readInStreamBody = df.withColumn("body", df["body"].cast("string"))
display(readInStreamBody)
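
The snippet above relies on a connection string that includes an EntityPath naming the event hub. A quick sanity check in plain Python can catch a missing field before the stream is started. This is only a sketch: the helper names parse_connection_string and missing_keys are hypothetical, and the list of required keys is an assumption based on the sample string above, not part of the connector's API.

```python
# Hypothetical helpers for illustration; not part of the Event Hubs connector.

# Assumed set of fields, taken from the sample connection string above.
REQUIRED_KEYS = ("Endpoint", "SharedAccessKeyName", "SharedAccessKey", "EntityPath")


def parse_connection_string(connection_string):
    """Split a connection string into a dict of its key/value parts."""
    parts = {}
    for segment in connection_string.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only, since key values may themselves end in '='
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    return parts


def missing_keys(connection_string):
    """Return the assumed-required keys that are absent or empty."""
    parts = parse_connection_string(connection_string)
    return [key for key in REQUIRED_KEYS if not parts.get(key)]
```

Calling missing_keys(connectionString) before building ehConf returns an empty list when every expected field is present, and otherwise the names of the fields to fix.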

I think a slight modification is required if you are using Spark 2.4.5 or later with version 2.3.15 or above of the Azure Event Hubs connector.

From version 2.3.15 onward, the configuration dictionary requires the connection string to be encrypted, so you need to pass it as shown in the code snippet below.

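# Replace the placeholders with your own values
connectionString = "Endpoint=sb://SAMPLE;SharedAccessKeyName=KEY_NAME;SharedAccessKey=KEY;EntityPath=EVENTHUB_NAME"
ehConf = {}
# Encrypt the connection string through the connector's JVM helper before passing it on
ehConf['eventhubs.connectionString'] = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString)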

df = spark \
  .readStream \
  .format("eventhubs") \
  .options(**ehConf) \
  .load()

readInStreamBody = df.withColumn("body", df["body"].cast("string"))
display(readInStreamBody)
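
If the same notebook has to run against clusters with different connector versions, the version rule described above can be wrapped in a small helper. The following is a minimal sketch under stated assumptions: needs_encryption and build_eh_conf are hypothetical names, the version is assumed to be a plain numeric string such as "2.3.15", and the encrypt function is passed in by the caller so the helper itself does not depend on Spark.

```python
# Hypothetical helpers for illustration; not part of the Event Hubs connector.

def needs_encryption(connector_version):
    """Return True for connector versions 2.3.15 and above.

    Assumes a plain numeric version string such as "2.3.15".
    """
    parts = tuple(int(p) for p in connector_version.split(".")[:3])
    return parts >= (2, 3, 15)


def build_eh_conf(connection_string, connector_version, encrypt):
    """Build the options dict, encrypting the connection string when required.

    `encrypt` is any callable taking and returning a string. On Databricks it
    would be sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt.
    """
    if needs_encryption(connector_version):
        value = encrypt(connection_string)
    else:
        value = connection_string
    return {"eventhubs.connectionString": value}
```

On a cluster this could be called as build_eh_conf(connectionString, "2.3.15", sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt), and the returned dict passed to .options(**ehConf) as in the snippet above.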
