
Spark: fixed task number for Spark SQL jobs

I keep seeing that Apache Spark schedules a series of stages with a fixed 200 tasks. Since this happens across a number of different jobs, I am guessing it is related to one of the Spark configurations. Any suggestion what that configuration might be?

200 is the default number of partitions used during shuffles, and it is controlled by spark.sql.shuffle.partitions. Its value can be set at runtime using SQLContext.setConf:

sqlContext.setConf("spark.sql.shuffle.partitions", "42")

or with RuntimeConfig.set:

spark.conf.set("spark.sql.shuffle.partitions", 42)
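For illustration, here is a minimal Scala sketch (assuming Spark 2.x+ with a SparkSession named spark; adaptive query execution is disabled so the shuffle partition count is not coalesced) showing that a shuffle-producing operation such as groupBy then yields the configured number of partitions instead of the default 200:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shuffle-partitions-demo")
  .master("local[*]")
  .getOrCreate()

// Disable AQE so the configured value is used as-is (Spark 3.x coalesces by default)
spark.conf.set("spark.sql.adaptive.enabled", "false")
spark.conf.set("spark.sql.shuffle.partitions", 42)

// groupBy triggers a shuffle, so the result uses the configured partition count
val counts = spark.range(0, 1000).groupBy("id").count()
println(counts.rdd.getNumPartitions)  // prints 42 instead of 200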
