I keep seeing Apache Spark schedule series of stages with a fixed 200 tasks. Since this happens across a number of different jobs, I'm guessing it is related to one of Spark's configuration settings. Any suggestion what that configuration might be?
200 is the default number of partitions used during shuffles, and it is controlled by spark.sql.shuffle.partitions. Its value can be set at runtime using SQLContext.setConf:

sqlContext.setConf("spark.sql.shuffle.partitions", "42")

or RuntimeConfig.set:

spark.conf.set("spark.sql.shuffle.partitions", 42)
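To see the setting take effect, you can compare the partition count of a shuffled DataFrame before and after changing it. A minimal sketch (assumes a local Spark installation; on Spark 3.x, adaptive query execution can coalesce shuffle partitions, so it is disabled here to make the counts predictable):

```scala
import org.apache.spark.sql.SparkSession

object ShufflePartitionsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("shuffle-partitions-demo")
      // disable AQE so the shuffle keeps exactly spark.sql.shuffle.partitions partitions
      .config("spark.sql.adaptive.enabled", "false")
      .getOrCreate()
    import spark.implicits._

    // groupBy forces a shuffle, so the result has spark.sql.shuffle.partitions partitions
    val before = spark.range(1000).groupBy(($"id" % 10).as("k")).count()
    println(before.rdd.getNumPartitions)  // 200 by default

    // lower the setting; only shuffles planned after this point are affected
    spark.conf.set("spark.sql.shuffle.partitions", 42)
    val after = spark.range(1000).groupBy(($"id" % 10).as("k")).count()
    println(after.rdd.getNumPartitions)   // 42

    spark.stop()
  }
}
```

Note that the setting only applies to DataFrame/SQL shuffles; RDD operations use spark.default.parallelism instead.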