
maxOffsetsPerTrigger vs number of cores in the spark cluster

For example, my Spark Structured Streaming application uses Kafka as the message source, with the configuration details below.

Kafka setup:

Message source: Kafka

Partitions: 40

Input parameters:

maxOffsetsPerTrigger: 1000

Cluster setup:

Number of workers = 5

Number of cores/worker = 8

Question:

With the above setup, does it read

(1000 * 5 * 8) = 40000 messages every time

or

(1000 * 5) = 5000 messages every time

or

read 1000 messages and distribute them across the 5 worker nodes?

Per the documentation:

Rate limit on maximum number of offsets processed per trigger interval. The specified total number of offsets will be proportionally split across topicPartitions of different volume.

So it's the last option in your list: Spark reads at most 1000 offsets per trigger in total. With 40 partitions of roughly equal volume, that is 25 offsets per partition, and since each Kafka partition maps to one Spark task, each executor processes at most 200 offsets per trigger (8 cores × 25 offsets/core). The actual count can be smaller if not enough data arrived during the trigger period.
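The arithmetic behind that split can be sketched in plain Python (this is illustrative only, not a Spark API, and it assumes all 40 partitions carry equal volume):

```python
# Numbers from the question: how maxOffsetsPerTrigger is split
# across partitions and mapped onto the cluster's cores.
max_offsets_per_trigger = 1000
partitions = 40
workers = 5
cores_per_worker = 8

# The total is split proportionally; with equal volume that's an even split.
offsets_per_partition = max_offsets_per_trigger // partitions  # 1000 / 40 = 25

# One Kafka partition maps to one Spark task; with 40 cores available,
# all 40 tasks can run in parallel, one partition per core.
total_cores = workers * cores_per_worker                       # 5 * 8 = 40

# Each executor (worker) handles 8 partitions' worth of offsets.
offsets_per_executor = offsets_per_partition * cores_per_worker  # 25 * 8 = 200

print(offsets_per_partition, total_cores, offsets_per_executor)  # → 25 40 200
```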

Also, newer versions of Spark add options such as minOffsetsPerTrigger, which lets Spark accumulate a bigger batch when a single trigger period didn't produce enough data to process.
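For reference, a PySpark sketch of how these rate-limit options are wired into the Kafka source (a config fragment, not runnable without a Kafka cluster; the broker address and topic name are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rate-limit-demo").getOrCreate()

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
      .option("subscribe", "my-topic")                   # placeholder
      .option("maxOffsetsPerTrigger", 1000)  # cap: at most 1000 offsets/trigger
      .option("minOffsetsPerTrigger", 500)   # Spark 3.3+: wait for at least 500
      .option("maxTriggerDelay", "15m")      # ...but never wait longer than this
      .load())
```

With both limits set, a micro-batch fires once minOffsetsPerTrigger offsets are available (or maxTriggerDelay elapses), and never reads more than maxOffsetsPerTrigger.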
