I keep seeing Apache Spark schedule series of stages with a fixed 200 tasks. Since this happens across a number of different jobs, I'm guessing it is related to one of Spark's configuration settings. Any suggestion what that configuration might be?
200 is the default number of partitions used during shuffles, and it is controlled by spark.sql.shuffle.partitions. Its value can be set at runtime using SQLContext.setConf:

sqlContext.setConf("spark.sql.shuffle.partitions", "42")

or RuntimeConfig.set:

spark.conf.set("spark.sql.shuffle.partitions", 42)
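To see the setting take effect, you can compare the partition count of a shuffled DataFrame before and after changing it. A minimal sketch (assumes a local Spark installation; on Spark 3.x, adaptive query execution can coalesce shuffle partitions, so it is disabled here to make the counts predictable):

```scala
import org.apache.spark.sql.SparkSession

object ShufflePartitionsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("shuffle-partitions-demo")
      // disable AQE so the shuffle keeps exactly spark.sql.shuffle.partitions partitions
      .config("spark.sql.adaptive.enabled", "false")
      .getOrCreate()
    import spark.implicits._

    // groupBy forces a shuffle, so the result has spark.sql.shuffle.partitions partitions
    val before = spark.range(1000).groupBy(($"id" % 10).as("k")).count()
    println(before.rdd.getNumPartitions)  // 200 by default

    // lower the setting; only shuffles planned after this point are affected
    spark.conf.set("spark.sql.shuffle.partitions", 42)
    val after = spark.range(1000).groupBy(($"id" % 10).as("k")).count()
    println(after.rdd.getNumPartitions)   // 42

    spark.stop()
  }
}
```

Note that the setting only applies to DataFrame/SQL shuffles; RDD operations use spark.default.parallelism instead.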