
What is the best way to structure a Spark Structured Streaming pipeline?

I'm moving data from my Postgres database to Kafka, doing some transformations with Spark in the middle. I have 50 tables, and each table's transformations are completely different from the others'. So I want to know the best way to structure my Spark Structured Streaming code. I'm considering three options:

  1. Put all the read and write logic for these 50 tables in one object and call only that object.

  2. Create 50 different objects, one per table, plus a driver object whose main method calls each of the 50 objects and then calls spark.streams.awaitAnyTermination() (see the sketch after this list).

  3. Submit each of these 50 objects individually via spark-submit.
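
For clarity, here is a minimal sketch of what I mean by option 2. The source format, topic name, broker address, and checkpoint path are just placeholders for this question, not my real settings:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// One object per table, each defining its own transformations.
object OrdersPipeline {
  def start(spark: SparkSession): Unit = {
    val source = spark.readStream
      .format("rate")                  // placeholder source; the real Postgres/CDC source goes here
      .load()

    val transformed: DataFrame = source
      .selectExpr("CAST(value AS STRING) AS value")  // table-specific transformations go here

    transformed.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")    // placeholder broker
      .option("topic", "orders")                               // placeholder topic
      .option("checkpointLocation", "/tmp/checkpoints/orders") // placeholder path
      .start()
  }
}

// Driver object: start all 50 streams, then block until any of them terminates.
object AllPipelines {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("all-tables").getOrCreate()
    OrdersPipeline.start(spark)
    // ... start the other 49 table pipelines here ...
    spark.streams.awaitAnyTermination()
  }
}
```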

If there is a better option, please let me know.

Thank you

Creating a single object as per your approach 1 does not look good: it will be difficult to understand and maintain.

Between options 2 and 3, I would still prefer the 3rd. Having separate jobs is a bit of a hassle to maintain (managing deployments and factoring out the common code), but done well it gives you more flexibility. You can easily undeploy a single table if needed, and any subsequent change means deploying only the affected table's flow; the other existing table pipelines keep working as they are.
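
To illustrate how the common code could be factored out while still keeping one spark-submit per table, here is a rough sketch. The trait name, placeholder source, broker address, and checkpoint paths are my own assumptions for the example, not something from your setup:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Shared plumbing: reading the source and writing to Kafka looks the same for every
// table; only the transformation, topic, and checkpoint location differ.
trait TablePipeline {
  def tableName: String
  def transform(df: DataFrame): DataFrame

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName(s"stream-$tableName").getOrCreate()

    val source = spark.readStream
      .format("rate")                                        // placeholder; the real Postgres/CDC source goes here
      .load()

    transform(source).writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")   // placeholder broker
      .option("topic", tableName)
      .option("checkpointLocation", s"/tmp/checkpoints/$tableName")
      .start()
      .awaitTermination()
  }
}

// One thin object per table; each is its own spark-submit entry point, e.g.
//   spark-submit --class CustomersPipeline my-streams.jar
object CustomersPipeline extends TablePipeline {
  val tableName = "customers"
  def transform(df: DataFrame): DataFrame =
    df.selectExpr("CAST(value AS STRING) AS value")          // table-specific logic goes here
}
```

With this layout a change to one table only requires rebuilding and resubmitting that table's job, while the shared trait keeps the 50 objects thin.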
