
Strategy to handle webhooks on Databricks

Context:

I'm working on turning data from different apps into insights and visualizations.

We have around 250 third-party chat apps.

We have events for each message sent/received in those chat apps.

I would like to get all of that chat data into Databricks using webhooks, process and clean it with notebooks, and then make it available in a database where people can query it and build Power BI reports.
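A minimal sketch of the webhook step, assuming you host a small receiver yourself (outside Databricks) that hands each incoming event to a sink such as a Kafka producer or a cloud-storage writer. The URL layout `/webhooks/<app_id>` and the `publish` callable are hypothetical. The sketch uses only the Python standard library and deliberately skips authentication, payload signature checks, and retries, which a production receiver would need.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable

# Hypothetical sink: in a real setup this might produce to a Kafka topic
# or append to storage that Databricks reads from.
Publish = Callable[[dict], None]


def make_handler(publish: Publish) -> type[BaseHTTPRequestHandler]:
    class WebhookHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            # Read the body first so the connection is left in a clean state.
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            # Assumed URL layout: /webhooks/<app_id>
            parts = self.path.strip("/").split("/")
            if len(parts) != 2 or parts[0] != "webhooks":
                self.send_error(404)
                return
            try:
                payload = json.loads(body)
            except ValueError:  # covers JSONDecodeError and bad encodings
                self.send_error(400, "body must be JSON")
                return
            # Keep the raw payload untouched; cleaning happens later in notebooks.
            publish({"app_id": parts[1], "payload": payload})
            self.send_response(202)  # accepted; processing happens downstream
            self.end_headers()

        def log_message(self, format: str, *args) -> None:
            pass  # silence the default per-request stderr logging in this sketch

    return WebhookHandler


def serve(publish: Publish, host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Start the receiver on a background thread and return the server object."""
    server = ThreadingHTTPServer((host, port), make_handler(publish))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Returning `202 Accepted` right after handing the event off keeps the receiver fast, which matters when 250 apps are calling it; the heavy lifting stays in Databricks.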

My questions:

Q1) What is a good, scalable practice for handling webhooks within the Databricks ecosystem? Are there any HTTP endpoints available for that?

Q2) Is my plan (Stage 1: getting the data, Stage 2: transforming it with notebooks, Stage 3: inserting it into a database) an effective way to do this? Any suggestions?
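A minimal sketch of Stages 2 and 3, assuming Stage 1 lands raw records shaped like `{"app_id": ..., "payload": {...}}` and that payloads carry hypothetical `direction` and `timestamp` (Unix seconds) fields. In a Databricks notebook the same steps would normally be DataFrame transformations; plain Python is used here only to keep the logic visible.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class ChatMessage:
    app_id: str
    direction: str  # "sent" or "received"
    sent_at: datetime


def normalize(raw: dict) -> Optional[ChatMessage]:
    """Stage 2: turn one raw webhook record into a typed row, or None if unusable."""
    payload = raw.get("payload")
    if not isinstance(payload, dict):
        return None
    direction = payload.get("direction")
    ts = payload.get("timestamp")
    if direction not in ("sent", "received") or not isinstance(ts, (int, float)):
        return None  # in practice you might route these to a quarantine table
    return ChatMessage(
        app_id=raw["app_id"],
        direction=direction,
        sent_at=datetime.fromtimestamp(ts, tz=timezone.utc),
    )


def daily_counts(messages: Iterable[ChatMessage]) -> dict[tuple[str, str, str], int]:
    """Stage 3: aggregate to (app_id, day, direction) -> message count for reporting."""
    return dict(
        Counter((m.app_id, m.sent_at.date().isoformat(), m.direction) for m in messages)
    )
```

Keeping Stage 1 as an untouched copy of the raw events means Stages 2 and 3 can be rerun over history if the cleaning rules change; Databricks documentation calls this layering the medallion (bronze/silver/gold) architecture.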

Answer:

Since the chat apps already emit an event for each message sent or received, streaming is usually the best fit. For example, Structured Streaming in Spark/Databricks can read from multiple sources. It is easiest when the chat app already publishes to something like Kafka, but reading streams from other products is also possible.

Kafka works really well with Spark, so I would strongly recommend putting all events into Kafka and then having Databricks consume them as a stream: https://docs.databricks.com/spark/latest/structured-streaming/kafka.html
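A minimal PySpark sketch of the consuming side, assuming a Databricks notebook where `spark` is the active SparkSession, a Kafka broker reachable from the cluster, and a single hypothetical topic `chat-events` that the webhook receivers write to. The source options (`kafka.bootstrap.servers`, `subscribe`, `startingOffsets`) come from Spark's Structured Streaming Kafka integration; the table name and checkpoint path are placeholders, and `DataStreamWriter.toTable` needs Spark 3.1 or later.

```python
def kafka_options(bootstrap_servers: str, topic: str) -> dict[str, str]:
    """Options for Spark's Kafka source; broker and topic names are hypothetical."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Only used when the query starts without a checkpoint; later runs
        # resume from the offsets stored in the checkpoint.
        "startingOffsets": "earliest",
    }


def start_bronze_stream(spark, bootstrap_servers: str, topic: str,
                        table: str, checkpoint: str):
    """Stream raw Kafka records into a Delta table and return the StreamingQuery."""
    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_options(bootstrap_servers, topic))
        .load()
        # Kafka delivers key/value as binary; cast to strings for later JSON parsing.
        .selectExpr(
            "topic",
            "CAST(key AS STRING) AS key",
            "CAST(value AS STRING) AS value",
            "timestamp",
        )
    )
    return (
        raw.writeStream.format("delta")
        .option("checkpointLocation", checkpoint)
        .toTable(table)
    )
```

In a notebook you might call `start_bronze_stream(spark, "broker:9092", "chat-events", "bronze_chat_events", "/tmp/checkpoints/bronze_chat_events")` (all placeholder names); the Stage 2 notebooks can then read that table, for example with `spark.readStream.table("bronze_chat_events")`, and parse the `value` column.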

